On older versions of glib (anything prior to glib commit 0f056ebe
from May 2019), the implementation of g_source_ref() and
g_source_unref() is not threadsafe for a GSource which is not
attached to a GMainContext.
QEMU's real iothread.c implementation always attaches its
iothread->ctx's GSource to a GMainContext created for that iothread,
so it is OK, but the simple test framework implementation in
tests/iothread.c was not doing this. This was causing intermittent
assertion failures in the test-aio-multithread subtest
"/aio/multi/mutex/contended" test on the BSD hosts. (It's unclear
why only BSD seems to have been affected -- perhaps a combination of
the specific glib version being used in the VMs and their happening
to run on a host with a lot of CPUs).
Borrow the iothread_init_gcontext() from the real iothread.c
and add the corresponding cleanup code and the calls to
g_main_context_push/pop_thread_default() so we actually use
the GMainContext we create.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-id: 20200106144552.7205-1-peter.maydell@linaro.org
tests/test-bdrv-drain can hang in tests/iothread.c:iothread_run():
while (!atomic_read(&iothread->stopping)) {
aio_poll(iothread->ctx, true);
}
The iothread_join() function works as follows:
void iothread_join(IOThread *iothread)
{
iothread->stopping = true;
aio_notify(iothread->ctx);
qemu_thread_join(&iothread->thread);
If iothread_run() checks iothread->stopping before the iothread_join()
thread sets stopping to true, then aio_notify() may be optimized away
and iothread_run() hangs forever in aio_poll().
The correct way to change iothread->stopping is from a BH that executes
within iothread_run(). This ensures that iothread->stopping is checked
after we set it to true.
This was already fixed for ./iothread.c (note this is a different source
file!) by commit 2362a28ea1 ("iothread:
fix iothread_stop() race condition"), but not for tests/iothread.c.
Fixes: 0c330a734b
("aio: introduce aio_co_schedule and aio_co_wake")
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20191003100103.331-1-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
aio_co_wake provides the infrastructure to start a coroutine on a "home"
AioContext. It will be used by CoMutex and CoQueue, so that coroutines
don't jump from one context to another when they go to sleep on a
mutex or waitqueue. However, it can also be used as a more efficient
alternative to one-shot bottom halves, and saves the effort of tracking
which AioContext a coroutine is running on.
aio_co_schedule is the part of aio_co_wake that starts a coroutine
on a remove AioContext, but it is also useful to implement e.g.
bdrv_set_aio_context callbacks.
The implementation of aio_co_schedule is based on a lock-free
multiple-producer, single-consumer queue. The multiple producers use
cmpxchg to add to a LIFO stack. The consumer (a per-AioContext bottom
half) grabs all items added so far, inverts the list to make it FIFO,
and goes through it one item at a time until it's empty. The data
structure was inspired by OSv, which uses it in the very code we'll
"port" to QEMU for the thread-safe CoMutex.
Most of the new code is really tests.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>