Commit Graph

40 Commits

Author SHA1 Message Date
Stefan Hajnoczi da5e1de95b Revert "iothread: release iothread around aio_poll"
This reverts commit a0710f7995.

In qemu-devel email message <556DBF87.2020908@de.ibm.com>, Christian
Borntraeger writes:

  Having many guests all with a kernel/ramdisk (via -kernel) and
  several null block devices will result in hangs. All hanging
  guests are in partition detection code waiting for an I/O to return
  so very early maybe even the first I/O.

  Reverting that commit "fixes" the hangs.

Reverting this commit for the 2.4 release.  More time is needed to
investigate and correct this patch.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-12 13:58:33 +01:00
Paolo Bonzini a0710f7995 iothread: release iothread around aio_poll
This is the first step towards having fine-grained critical sections in
dataplane threads, which resolves lock ordering problems between
address_space_* functions (which need the BQL when doing MMIO, even
after we complete RCU-based dispatch) and the AioContext.

Because AioContext does not use contention callbacks anymore, the
unit test has to be changed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1424449612-18215-4-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:08 +02:00
Paolo Bonzini e98ab09709 aio-posix: move pollfds to thread-local storage
By using thread-local storage, aio_poll can stop using global data during
g_poll_ns.  This will make it possible to drop callbacks from rfifolock.

[Moved npfd = 0 assignment to end of walking_handlers region as
suggested by Paolo.  This resolves the assert(npfd == 0) assertion
failure in pollfds_cleanup().
--Stefan]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1424449612-18215-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:08 +02:00
Paolo Bonzini e8d3b1a25f aio: strengthen memory barriers for bottom half scheduling
There are two problems with memory barriers in async.c.  The fix is
to use atomic_xchg in order to achieve sequential consistency between
the scheduling of a bottom half and the corresponding execution.

First, if bh->scheduled is already 1 in qemu_bh_schedule, QEMU does
not execute a memory barrier to order any writes needed by the callback
before the read of bh->scheduled.  If the other side sees req->state as
THREAD_ACTIVE, the callback is not invoked and you get deadlock.

Second, the memory barrier in aio_bh_poll is too weak.  Without this
patch, it is possible that bh->scheduled = 0 is not "published" until
after the callback has returned.  Another thread wants to schedule the
bottom half, but it sees bh->scheduled = 1 and does nothing.  This causes
a lost wakeup.  The memory barrier should have been changed to smp_mb()
in commit 924fe12 (aio: fix qemu_bh_schedule() bh->ctx race condition,
2014-06-03) together with qemu_bh_schedule()'s.  Guess who reviewed
that patch?

Both of these involve a store and a load, so they are reproducible on
x86_64 as well.  It is however much easier on aarch64, where the
libguestfs test suite triggers the bug fairly easily.  Even there the
failure can go away or appear depending on compiler optimization level,
tracing options, or even kernel debugging options.

Paul Leveille however reported how to trigger the problem within 15
minutes on x86_64 as well.  His (untested) recipe, reproduced here
for reference, is the following:

   1) Qcow2 (or 3) is critical – raw files alone seem to avoid the problem.

   2) Use “cache=directsync” rather than the default of
   “cache=none” to make it happen easier.

   3) Use a server with a write-back RAID controller to allow for rapid
   IO rates.

   4) Run a random-access load that (mostly) writes chunks to various
   files on the virtual block device.

      a. I use ‘diskload.exe c:25’, a Microsoft HCT load
         generator, on Windows VMs.

      b. Iometer can probably be configured to generate a similar load.

   5) Run multiple VMs in parallel, against the same storage device,
   to shake the failure out sooner.

   6) IvyBridge and Haswell processors for certain; not sure about others.

A similar patch survived over 12 hours of testing, where an unpatched
QEMU would fail within 15 minutes.

This bug is, most likely, also the cause of failures in the libguestfs
testsuite on AArch64.

Thanks to Laszlo Ersek for initially reporting this bug, to Stefan
Hajnoczi for suggesting closer examination of qemu_bh_schedule, and to
Paul for providing test input and a prototype patch.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Reported-by: Paul Leveille <Paul.Leveille@stratus.com>
Reported-by: John Snow <jsnow@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1428419779-26062-1-git-send-email-pbonzini@redhat.com
Suggested-by: Paul Leveille <Paul.Leveille@stratus.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-04-09 10:29:29 +01:00
Paolo Bonzini ee82310f8a block: replace g_new0 with g_new for bottom half allocation.
This saves about 15% of the clock cycles spent on allocation.  Using the
slice allocator does not add a visible improvement; allocation is faster
than malloc, while freeing seems to be slower.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Paolo Bonzini fcf5def1ab block: mark AioContext as recursive
AioContext can be accessed recursively, in fact that's what we do with
aio_poll.  Marking the GSource as recursive avoids that GLib blocks it
and unblocks it around every call to aio_dispatch, which is a pretty
expensive operation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:55 +00:00
Markus Armbruster 3ba235a022 block: Use g_new0() for a bit of extra type checking
g_new(T, 1) is safer than g_malloc(sizeof(T)), because it returns T *
rather than void *, which lets the compiler catch more type errors.

Missed in commit 02c4f26.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1417697709-13087-1-git-send-email-armbru@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:21 +01:00
Chrysostomos Nanakos 2f78e491d7 async: aio_context_new(): Handle event_notifier_init failure
On a system with a low limit of open files the initialization
of the event notifier could fail and QEMU exits without printing any
error information to the user.

The problem can be easily reproduced by enforcing a low limit of open
files and start QEMU with enough I/O threads to hit this limit.

The same problem raises, without the creation of I/O threads, while
QEMU initializes the main event loop by enforcing an even lower limit of
open files.

This commit adds an error message on failure:

 # qemu [...] -object iothread,id=iothread0 -object iothread,id=iothread1
 qemu: Failed to initialize event notifier: Too many open files in system

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:48 +01:00
Paolo Bonzini a3462c6561 AioContext: introduce aio_prepare
This will be used to implement socket polling on Windows.
On Windows, select() and g_poll() are completely different;
sockets are polled with select() before calling g_poll,
and the g_poll must be nonblocking if select() says a
socket is ready.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini e4c7e2d12d AioContext: export and use aio_dispatch
So far, aio_poll's scheme was dispatch/poll/dispatch, where
the first dispatch phase was used only in the GSource case in
order to avoid a blocking poll.  Earlier patches changed it to
dispatch/prepare/poll/dispatch, where prepare is aio_compute_timeout.

By making aio_dispatch public, we can remove the first dispatch
phase altogether, so that both aio_poll and the GSource use the same
prepare/poll/dispatch scheme.

This patch breaks the invariant that aio_poll(..., true) will not block
the first time it returns false.  This used to be fundamental for
qemu_aio_flush's implementation as "while (qemu_aio_wait()) {}" but
no code in QEMU relies on this invariant anymore.  The return value
of aio_poll() is now comparable with that of g_main_context_iteration.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini 845ca10dd0 AioContext: take bottom halves into account when computing aio_poll timeout
Right now, QEMU invokes aio_bh_poll before the "poll" phase
of aio_poll.  It is simpler to do it afterwards and skip the
"poll" phase altogether when the OS-dependent parts of AioContext
are invoked from GSource.  This way, AioContext behaves more
similarly when used as a GSource vs. when used as stand-alone.

As a start, take bottom halves into account when computing the
poll timeout.  If a bottom half is ready, do a non-blocking
poll.  As a side effect, this makes idle bottom halves work
with aio_poll; an improvement, but not really an important
one since they are deprecated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini 0ceb849bd3 AioContext: speed up aio_notify
In many cases, the call to event_notifier_set in aio_notify is unnecessary.
In particular, if we are executing aio_dispatch, or if aio_poll is not
blocking, we know that we will soon get to the next loop iteration (if
necessary); the thread that hosts the AioContext's event loop does not
need any nudging.

The patch includes a Promela formal model that shows that this really
works and does not need any further complication such as generation
counts.  It needs a memory barrier though.

The generation counts are not needed because any change to
ctx->dispatching after the memory barrier is okay for aio_notify.
If it changes from zero to one, it is the right thing to skip
event_notifier_set.  If it changes from one to zero, the
event_notifier_set is unnecessary but harmless.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-07-09 15:50:11 +02:00
Stefan Hajnoczi 924fe1293c aio: fix qemu_bh_schedule() bh->ctx race condition
qemu_bh_schedule() is supposed to be thread-safe at least the first time
it is called.  Unfortunately this is not quite true:

  bh->scheduled = 1;
  aio_notify(bh->ctx);

Since another thread may run the BH callback once it has been scheduled,
there is a race condition if the callback frees the BH before
aio_notify(bh->ctx) has a chance to run.

Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Stefan Priebe <s.priebe@profihost.ag>
2014-06-04 09:56:06 +02:00
Stefan Hajnoczi 98563fc3ec aio: add aio_context_acquire() and aio_context_release()
It can be useful to run an AioContext from a thread which normally does
not "own" the AioContext.  For example, request draining can be
implemented by acquiring the AioContext and looping aio_poll() until all
requests have been completed.

The following pattern should work:

  /* Event loop thread */
  while (running) {
      aio_context_acquire(ctx);
      aio_poll(ctx, true);
      aio_context_release(ctx);
  }

  /* Another thread */
  aio_context_acquire(ctx);
  bdrv_read(bs, 0x1000, buf, 1);
  aio_context_release(ctx);

This patch implements aio_context_acquire() and aio_context_release().

Note that existing aio_poll() callers do not need to worry about
acquiring and releasing - it is only needed when multiple threads will
call aio_poll() on the same AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:42:24 +01:00
Alex Bligh 533a8cf350 aio / timers: aio_ctx_prepare sets timeout from AioContext timers
Calculate the timeout in aio_ctx_prepare taking into account
the timers attached to the AioContext.

Alter aio_ctx_check similarly.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 19:10:28 +02:00
Alex Bligh d5541d8680 aio / timers: Add a notify callback to QEMUTimerList
Add a notify pointer to QEMUTimerList so it knows what to notify
on a timer change.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 19:10:28 +02:00
Alex Bligh dae21b98b9 aio / timers: Add QEMUTimerListGroup to AioContext
Add a QEMUTimerListGroup each AioContext (meaning a QEMUTimerList
associated with each clock is added) and delete it when the
AioContext is freed.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 19:10:27 +02:00
Stefan Hajnoczi f2e5dca46b aio: drop io_flush argument
The .io_flush() handler no longer exists and has no users.  Drop the
io_flush argument to aio_set_fd_handler() and related functions.

The AioFlushEventNotifierHandler and AioFlushHandler typedefs are no
longer used and are dropped too.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:52:19 +02:00
Liu Ping Fan dcc772e2f2 QEMUBH: make AioContext's bh re-entrant
BH will be used outside big lock, so introduce lock to protect
between the writers, ie, bh's adders and deleter. The lock only
affects the writers and bh's callback does not take this extra lock.
Note that for the same AioContext, aio_bh_poll() can not run in
parallel yet.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-19 12:29:21 +08:00
Stefan Hajnoczi 9b34277d23 aio: add a ThreadPool instance to AioContext
This patch adds a ThreadPool to AioContext.  It's possible that some
AioContext instances will never use the ThreadPool, so defer creation
until aio_get_thread_pool().

The reason why AioContext should have the ThreadPool is because the
ThreadPool is bound to a AioContext instance where the work item's
callback function is invoked.  It doesn't make sense to keep the
ThreadPool pointer anywhere other than AioContext.  For example,
block/raw-posix.c can get its AioContext's ThreadPool and submit work.

Special note about headers: I used struct ThreadPool in aio.h because
there is a circular dependency if aio.h includes thread-pool.h.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-15 16:07:50 +01:00
Stefan Hajnoczi 6b5f876252 aio: convert aio_poll() to g_poll(3)
AioHandler already has a GPollFD so we can directly use its
events/revents.

Add the int pollfds_idx field to AioContext so we can map g_poll(3)
results back to AioHandlers.

Reuse aio_dispatch() to invoke handlers after g_poll(3).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-id: 1361356113-11049-10-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-21 16:17:31 -06:00
Paolo Bonzini 1de7afc984 misc: move include files to include/qemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:39 +01:00
Paolo Bonzini 737e150e89 block: move include files to include/block/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Kevin Wolf c57b6656c3 aio: Get rid of qemu_aio_flush()
There are no remaining users, and new users should probably be
using bdrv_drain_all() in the first place.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11 11:04:25 +01:00
Paolo Bonzini f5022a135e aio: fix aio_ctx_prepare with idle bottom halves
Commit ed2aec4867f0d5f5de496bb765347b5d0cfe113d changed the return
value of aio_ctx_prepare from false to true when only idle bottom
halves are available.  This broke PC old-style DMA, which uses them.
Fix this by making aio_ctx_prepare return true only when non-idle
bottom halves are scheduled to run.

Reported-by: malc <av1474@comtv.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>
2012-11-12 20:02:09 +04:00
Paolo Bonzini 22bfa75eaf aio: clean up now-unused functions
Some cleanups can now be made, now that the main loop does not anymore need
hooks into the bottom half code.

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-30 09:30:54 +01:00
Paolo Bonzini 2f4dc3c1b2 aio: add aio_notify
With this change async.c does not rely anymore on any service from
main-loop.c, i.e. it is completely self-contained.

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-30 09:30:53 +01:00
Paolo Bonzini e3713e001f aio: make AioContexts GSources
This lets AioContexts be used (optionally) with a glib main loop.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-30 09:30:53 +01:00
Paolo Bonzini 7c0628b20e aio: add non-blocking variant of aio_wait
This will be used when polling the GSource attached to an AioContext.

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-30 09:30:53 +01:00
Paolo Bonzini a915f4bc97 aio: add I/O handlers to the AioContext interface
With this patch, I/O handlers (including event notifier handlers) can be
attached to a single AioContext.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-30 09:30:53 +01:00
Paolo Bonzini f627aab1cc aio: introduce AioContext, move bottom halves there
Start introducing AioContext, which will let us remove globals from
aio.c/async.c, and introduce multiple I/O threads.

The bottom half functions now take an additional AioContext argument.
A bottom half is created with a specific AioContext that remains the
same throughout the lifetime.  qemu_bh_new is just a wrapper that
uses a global context.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-30 09:30:53 +01:00
Stefan Weil 9b47b17e80 async: Use bool for boolean struct members and remove a hole
Using bool reduces the size of the structure and improves readability.
A hole in the structure was removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-05-01 10:13:25 +01:00
Stefano Stabellini 7c7db75576 main_loop_wait: block indefinitely
- remove qemu_calculate_timeout;

- explicitly size timeout to uint32_t;

- introduce slirp_update_timeout;

- pass NULL as timeout argument to select in case timeout is the maximum
value;

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Paul Brook <paul@codesourcery.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-04-26 13:14:58 -05:00
Paolo Bonzini 44a9b356ad main-loop: create main-loop.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2011-10-21 18:14:30 +02:00
Kevin Wolf 648fb0ea5e async: Allow nested qemu_bh_poll calls
qemu may segfault when a BH handler first deletes a BH and then (possibly
indirectly) calls a nested qemu_bh_poll(). This is because the inner instance
frees the BH and deletes it from the list that the outer one processes.

This patch deletes BHs only in the outermost qemu_bh_poll instance.

Commit 7887f620 already tried to achieve the same, but it assumed that the BH
handler would only delete its own BH. With a nested qemu_bh_poll(), this isn't
guaranteed, so that commit wasn't enough. Hope this one fixes it for real.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:23:51 +02:00
Anthony Liguori 7267c0947d Use glib memory allocation and free functions
qemu_malloc/qemu_free no longer exist after this commit.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-20 23:01:08 -05:00
Kevin Wolf 384acbf46b async: Remove AsyncContext
The purpose of AsyncContexts was to protect qcow and qcow2 against reentrancy
during an emulated bdrv_read/write (which includes a qemu_aio_wait() call and
can run AIO callbacks of different requests if it weren't for AsyncContexts).

Now both qcow and qcow2 are protected by CoMutexes and AsyncContexts can be
removed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:41 +02:00
Kevin Wolf 7887f6201f Allow nested qemu_bh_poll() after BH deletion
Without this, qemu segfaults when a BH handler first deletes its BH and
then calls another function which involves a nested qemu_bh_poll() call.

This can be reproduced by generating an I/O error (e.g. with blkdebug) on
an IDE device and using rerror/werror=stop to stop the VM. When continuing
the VM, qemu segfaults.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-06-15 15:43:20 +02:00
Kevin Wolf 9a1e948129 Introduce contexts for asynchronous callbacks
Add the possibility to use AIO and BHs without allowing foreign callbacks to be
run. Basically, you put your own AIOs and BHs in a separate context. For
details see the comments in the source.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:28:59 -05:00
Kevin Wolf 4f999d05f5 Split out bottom halves
Instead of putting more and more stuff into vl.c, let's have the generic
functions that deal with asynchronous callbacks in their own file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:28:59 -05:00