qemu-e2k

Author	SHA1	Message	Date
Stefan Hajnoczi	6d740fb01b	aio-posix: do not nest poll handlers QEMU's event loop supports nesting, which means that event handler functions may themselves call aio_poll(). The condition that triggered a handler must be reset before the nested aio_poll() call, otherwise the same handler will be called and immediately re-enter aio_poll. This leads to an infinite loop and stack exhaustion. Poll handlers are especially prone to this issue, because they typically reset their condition by finishing the processing of pending work. Unfortunately it is during the processing of pending work that nested aio_poll() calls typically occur and the condition has not yet been reset. Disable a poll handler during ->io_poll_ready() so that a nested aio_poll() call cannot invoke ->io_poll_ready() again. As a result, the disabled poll handler and its associated fd handler do not run during the nested aio_poll(). Calling aio_set_fd_handler() from inside nested aio_poll() could cause it to run again. If the fd handler is pending inside nested aio_poll(), then it will also run again. In theory fd handlers can be affected by the same issue, but they are more likely to reset the condition before calling nested aio_poll(). This is a special case and it's somewhat complex, but I don't see a way around it as long as nested aio_poll() is supported. Cc: qemu-stable@nongnu.org Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2186181 Fixes: `c382706925` ("block: Mark bdrv_co_io_(un)plug() and callers GRAPH_RDLOCK") Cc: Kevin Wolf <kwolf@redhat.com> Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230502184134.534703-2-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Marc-André Lureau	6eeef4477a	aio: make aio_set_fd_poll() static to aio-posix.c Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Message-Id: <20230221124802.4103554-9-marcandre.lureau@redhat.com>	2023-03-13 15:23:37 +04:00
Chao Gao	816a430c51	util/aio: Defer disabling poll mode as long as possible When we measure FIO read performance (cache=writethrough, bs=4k, iodepth=64) in VMs, ~80K/s notifications (e.g., EPT_MISCONFIG) are observed from guest to qemu. It turns out those frequent notificatons are caused by interference from worker threads. Worker threads queue bottom halves after completing IO requests. Pending bottom halves may lead to either aio_compute_timeout() zeros timeout and pass it to try_poll_mode() or run_poll_handlers() returns no progress after noticing pending aio_notify() events. Both cause run_poll_handlers() to call poll_set_started(false) to disable poll mode. However, for both cases, as timeout is already zeroed, the event loop (i.e., aio_poll()) just processes bottom halves and then starts the next event loop iteration. So, disabling poll mode has no value but leads to unnecessary notifications from guest. To minimize unnecessary notifications from guest, defer disabling poll mode to when the event loop is about to be blocked. With this patch applied, FIO seq-read performance (bs=4k, iodepth=64, cache=writethrough) in VMs increases from 330K/s to 413K/s IOPS. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Message-id: 20220710120849.63086-1-chao.gao@intel.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2023-01-23 15:01:22 -05:00
Nicolas Saenz Julienne	71ad4713cc	util/event-loop-base: Introduce options to set the thread pool size The thread pool regulates itself: when idle, it kills threads until empty, when in demand, it creates new threads until full. This behaviour doesn't play well with latency sensitive workloads where the price of creating a new thread is too high. For example, when paired with qemu's '-mlock', or using safety features like SafeStack, creating a new thread has been measured take multiple milliseconds. In order to mitigate this let's introduce a new 'EventLoopBase' property to set the thread pool size. The threads will be created during the pool's initialization or upon updating the property's value, remain available during its lifetime regardless of demand, and destroyed upon freeing it. A properly characterized workload will then be able to configure the pool to avoid any latency spikes. Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20220425075723.20019-4-nsaenzju@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-05-09 10:43:23 +01:00
Stefan Hajnoczi	fc8796465c	aio-posix: fix spurious ->poll_ready() callbacks in main loop When ->poll() succeeds the AioHandler is placed on the ready list with revents set to the magic value 0. This magic value causes aio_dispatch_handler() to invoke ->poll_ready() instead of ->io_read() for G_IO_IN or ->io_write() for G_IO_OUT. This magic value 0 hack works for the IOThread where AioHandlers are placed on ->ready_list and processed by aio_dispatch_ready_handlers(). It does not work for the main loop where all AioHandlers are processed by aio_dispatch_handlers(), even those that are not ready and have a revents value of 0. As a result the main loop invokes ->poll_ready() on AioHandlers that are not ready. These spurious ->poll_ready() calls waste CPU cycles and could lead to crashes if the code assumes ->poll() must have succeeded before ->poll_ready() is called (a reasonable asumption but I haven't seen it in practice). Stop using revents to track whether ->poll_ready() will be called on an AioHandler. Introduce a separate AioHandler->poll_ready field instead. This eliminates spurious ->poll_ready() calls in the main loop. Fixes: `826cc32423` ("aio-posix: split poll check from ready handler") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reported-by: Jason Wang <jasowang@redhat.com> Tested-by: Jason Wang <jasowang@redhat.com> Message-id: 20220223155703.136833-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-03-17 11:23:18 +00:00
Stefan Hajnoczi	826cc32423	aio-posix: split poll check from ready handler Adaptive polling measures the execution time of the polling check plus handlers called when a polled event becomes ready. Handlers can take a significant amount of time, making it look like polling was running for a long time when in fact the event handler was running for a long time. For example, on Linux the io_submit(2) syscall invoked when a virtio-blk device's virtqueue becomes ready can take 10s of microseconds. This can exceed the default polling interval (32 microseconds) and cause adaptive polling to stop polling. By excluding the handler's execution time from the polling check we make the adaptive polling calculation more accurate. As a result, the event loop now stays in polling mode where previously it would have fallen back to file descriptor monitoring. The following data was collected with virtio-blk num-queues=2 event_idx=off using an IOThread. Before: 168k IOPS, IOThread syscalls: 9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0) = 16 9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8) = 8 9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8) = 8 9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3 9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0) = 32 174k IOPS (+3.6%), IOThread syscalls: 9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0) = 32 9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8) = 8 9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8) = 8 9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50) = 32 Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because the IOThread stays in polling mode instead of falling back to file descriptor monitoring. As usual, polling is not implemented on Windows so this patch ignores the new io_poll_read() callback in aio-win32.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-2-stefanha@redhat.com [Fixed up aio_set_event_notifier() calls in tests/unit/test-fdmon-epoll.c added after this series was queued. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-01-12 17:09:39 +00:00
Stefano Garzarella	1793ad0247	iothread: add aio-max-batch parameter The `aio-max-batch` parameter will be propagated to AIO engines and it will be used to control the maximum number of queued requests. When there are in queue a number of requests equal to `aio-max-batch`, the engine invokes the system call to forward the requests to the kernel. This parameter allows us to control the maximum batch size to reduce the latency that requests might accumulate while queued in the AIO engine queue. If `aio-max-batch` is equal to 0 (default value), the AIO engine will use its default maximum batch size value. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20210721094211.69853-3-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-21 13:47:50 +01:00
Kevin Wolf	9ce44e2ce2	qmp: Move dispatcher to a coroutine This moves the QMP dispatcher to a coroutine and runs all QMP command handlers that declare 'coroutine': true in coroutine context so they can avoid blocking the main loop while doing I/O or waiting for other events. For commands that are not declared safe to run in a coroutine, the dispatcher drops out of coroutine context by calling the QMP command handler from a bottom half. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201005155855.256490-10-kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-10-09 07:08:20 +02:00
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int ' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic$64$\?_[a-z0-9_]\+' include/qemu/atomic.h \| \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	2020-09-23 16:07:44 +01:00
Stefan Hajnoczi	44277bf914	aio-posix: keep aio_notify_me disabled during polling Polling only monitors the ctx->notified field and does not need the ctx->notifier EventNotifier to be signalled. Keep ctx->aio_notify_me disabled while polling to avoid unnecessary EventNotifier syscalls. This optimization improves virtio-blk 4KB random read performance by 18%. The following results are with an IOThread and the null-co block driver: Test IOPS Error Before 244518.62 ± 1.20% After 290706.11 ± 0.44% Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200806131802.569478-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-08-13 13:34:14 +01:00
Stefan Hajnoczi	ba607ca8bf	aio-posix: disable fdmon-io_uring when GSource is used The glib event loop does not call fdmon_io_uring_wait() so fd handlers waiting to be submitted build up in the list. There is no benefit is using io_uring when the glib GSource is being used, so disable it instead of implementing a more complex fix. This fixes a memory leak where AioHandlers would build up and increasing amounts of CPU time were spent iterating them in aio_pending(). The symptom is that guests become slow when QEMU is built with io_uring support. Buglink: https://bugs.launchpad.net/qemu/+bug/1877716 Fixes: `73fd282e7b` ("aio-posix: add io_uring fd monitoring implementation") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Message-id: 20200511183630.279750-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-05-18 18:16:00 +01:00
Stefan Hajnoczi	de137e44f7	aio-posix: don't duplicate fd handler deletion in fdmon_io_uring_destroy() The io_uring file descriptor monitoring implementation has an internal list of fd handlers that are pending submission to io_uring. fdmon_io_uring_destroy() deletes all fd handlers on the list. Don't delete fd handlers directly in fdmon_io_uring_destroy() for two reasons: 1. This duplicates the aio-posix.c AioHandler deletion code and could become outdated if the struct changes. 2. Only handlers with the FDMON_IO_URING_REMOVE flag set are safe to remove. If the flag is not set then something still has a pointer to the fd handler. Let aio-posix.c and its user worry about that. In practice this isn't an issue because fdmon_io_uring_destroy() is only called when shutting down so all users have removed their fd handlers, but the next patch will need this! Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Message-id: 20200511183630.279750-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-05-18 18:16:00 +01:00
Paolo Bonzini	5710a3e09f	async: use explicit memory barriers When using C11 atomics, non-seqcst reads and writes do not participate in the total order of seqcst operations. In util/async.c and util/aio-posix.c, in particular, the pattern that we use write ctx->notify_me write bh->scheduled read bh->scheduled read ctx->notify_me if !bh->scheduled, sleep if ctx->notify_me, notify needs to use seqcst operations for both the write and the read. In general this is something that we do not want, because there can be many sources that are polled in addition to bottom halves. The alternative is to place a seqcst memory barrier between the write and the read. This also comes with a disadvantage, in that the memory barrier is implicit on strongly-ordered architectures and it wastes a few dozen clock cycles. Fortunately, ctx->notify_me is never written concurrently by two threads, so we can assert that and relax the writes to ctx->notify_me. The resulting solution works and performs well on both aarch64 and x86. Note that the atomic_set/atomic_read combination is not an atomic read-modify-write, and therefore it is even weaker than C11 ATOMIC_RELAXED; on x86, ATOMIC_RELAXED compiles to a locked operation. Analyzed-by: Ying Fang <fangying1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Ying Fang <fangying1@huawei.com> Message-Id: <20200407140746.8041-6-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-04-09 16:17:14 +01:00
Stefan Hajnoczi	d37d0e365a	aio-posix: remove idle poll handlers to improve scalability When there are many poll handlers it's likely that some of them are idle most of the time. Remove handlers that haven't had activity recently so that the polling loop scales better for guests with a large number of devices. This feature only takes effect for the Linux io_uring fd monitoring implementation because it is capable of combining fd monitoring with userspace polling. The other implementations can't do that and risk starving fds in favor of poll handlers, so don't try this optimization when they are in use. IOPS improves from 10k to 105k when the guest has 100 virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1 device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe. [Clarified aio_poll_handlers locking discipline explanation in comment after discussion with Paolo Bonzini <pbonzini@redhat.com>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>	2020-03-09 16:45:16 +00:00
Stefan Hajnoczi	aa38e19f05	aio-posix: support userspace polling of fd monitoring Unlike ppoll(2) and epoll(7), Linux io_uring completions can be polled from userspace. Previously userspace polling was only allowed when all AioHandler's had an ->io_poll() callback. This prevented starvation of fds by userspace pollable handlers. Add the FDMonOps->need_wait() callback that enables userspace polling even when some AioHandlers lack ->io_poll(). For example, it's now possible to do userspace polling when a TCP/IP socket is monitored thanks to Linux io_uring. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-7-stefanha@redhat.com Message-Id: <20200305170806.1313245-7-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	73fd282e7b	aio-posix: add io_uring fd monitoring implementation The recent Linux io_uring API has several advantages over ppoll(2) and epoll(2). Details are given in the source code. Add an io_uring implementation and make it the default on Linux. Performance is the same as with epoll(7) but later patches add optimizations that take advantage of io_uring. It is necessary to change how aio_set_fd_handler() deals with deleting AioHandlers since removing monitored file descriptors is asynchronous in io_uring. fdmon_io_uring_remove() marks the AioHandler deleted and aio_set_fd_handler() will let it handle deletion in that case. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-6-stefanha@redhat.com Message-Id: <20200305170806.1313245-6-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	b321051cf4	aio-posix: simplify FDMonOps->update() prototype The AioHandler node, bool is_new arguments are more complicated to think about than simply being given AioHandler old_node, AioHandler *new_node. Furthermore, the new Linux io_uring file descriptor monitoring mechanism added by the new patch requires access to both the old and the new nodes. Make this change now in preparation. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-5-stefanha@redhat.com Message-Id: <20200305170806.1313245-5-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	1f050a4690	aio-posix: extract ppoll(2) and epoll(7) fd monitoring The ppoll(2) and epoll(7) file descriptor monitoring implementations are mixed with the core util/aio-posix.c code. Before adding another implementation for Linux io_uring, extract out the existing ones so there is a clear interface and the core code is simpler. The new interface is AioContext->fdmon_ops, a pointer to a FDMonOps struct. See the patch for details. Semantic changes: 1. ppoll(2) now reflects events from pollfds[] back into AioHandlers while we're still on the clock for adaptive polling. This was already happening for epoll(7), so if it's really an issue then we'll need to fix both in the future. 2. epoll(7)'s fallback to ppoll(2) while external events are disabled was broken when the number of fds exceeded the epoll(7) upgrade threshold. I guess this code path simply wasn't tested and no one noticed the bug. I didn't go out of my way to fix it but the correct code is simpler than preserving the bug. I also took some liberties in removing the unnecessary AioContext->epoll_available (just check AioContext->epollfd != -1 instead) and AioContext->epoll_enabled (it's implicit if our AioContext->fdmon_ops callbacks are being invoked) fields. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-4-stefanha@redhat.com Message-Id: <20200305170806.1313245-4-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	3aa221b382	aio-posix: move RCU_READ_LOCK() into run_poll_handlers() Now that run_poll_handlers_once() is only called by run_poll_handlers() we can improve the CPU time profile by moving the expensive RCU_READ_LOCK() out of the polling loop. This reduces the run_poll_handlers() from 40% CPU to 10% CPU in perf's sampling profiler output. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-3-stefanha@redhat.com Message-Id: <20200305170806.1313245-3-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	e4346192f1	aio-posix: completely stop polling when disabled One iteration of polling is always performed even when polling is disabled. This is done because: 1. Userspace polling is cheaper than making a syscall. We might get lucky. 2. We must poll once more after polling has stopped in case an event occurred while stopping polling. However, there are downsides: 1. Polling becomes a bottleneck when the number of event sources is very high. It's more efficient to monitor fds in that case. 2. A high-frequency polling event source can starve non-polling event sources because ppoll(2)/epoll(7) is never invoked. This patch removes the forced polling iteration so that poll_ns=0 really means no polling. IOPS increases from 10k to 60k when the guest has 100 virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1 device because the large number of event sources being polled slows down the event loop. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-2-stefanha@redhat.com Message-Id: <20200305170806.1313245-2-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	c39cbedb54	aio-posix: remove confusing QLIST_SAFE_REMOVE() QLIST_SAFE_REMOVE() is confusing here because the node must be on the list. We actually just wanted to clear the linked list pointers when removing it from the list. QLIST_REMOVE() now does this, so switch to it. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200224103406.1894923-3-stefanha@redhat.com Message-Id: <20200224103406.1894923-3-stefanha@redhat.com>	2020-03-09 16:39:20 +00:00
Stefan Hajnoczi	7391d34c3c	aio-posix: make AioHandler dispatch O(1) with epoll File descriptor monitoring is O(1) with epoll(7), but aio_dispatch_handlers() still scans all AioHandlers instead of dispatching just those that are ready. This makes aio_poll() O(n) with respect to the total number of registered handlers. Add a local ready_list to aio_poll() so that each nested aio_poll() builds a list of handlers ready to be dispatched. Since file descriptor polling is level-triggered, nested aio_poll() calls also see fds that were ready in the parent but not yet dispatched. This guarantees that nested aio_poll() invocations will dispatch all fds, even those that became ready before the nested invocation. Since only handlers ready to be dispatched are placed onto the ready_list, the new aio_dispatch_ready_handlers() function provides O(1) dispatch. Note that AioContext polling is still O(n) and currently cannot be fully disabled. This still needs to be fixed before aio_poll() is fully O(1). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-6-stefanha@redhat.com [Fix compilation error on macOS where there is no epoll(87). The aio_epoll() prototype was out of date and aio_add_ready_list() needed to be moved outside the ifdef. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Stefan Hajnoczi	4749079ce0	aio-posix: make AioHandler deletion O(1) It is not necessary to scan all AioHandlers for deletion. Keep a list of deleted handlers instead of scanning the full list of all handlers. The AioHandler->deleted field can be dropped. Let's check if the handler has been inserted into the deleted list instead. Add a new QLIST_IS_INSERTED() API for this check. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Stefan Hajnoczi	ca8c6b2275	aio-posix: don't pass ns timeout to epoll_wait() Don't pass the nanosecond timeout into epoll_wait(), which expects milliseconds. The epoll_wait() timeout value does not matter if qemu_poll_ns() determined that the poll fd is ready, but passing a value in the wrong units is still ugly. Pass a 0 timeout to epoll_wait() instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Stefan Hajnoczi	ff29ed3a33	aio-posix: fix use after leaving scope in aio_poll() epoll_handler is a stack variable and must not be accessed after it goes out of scope: if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) { AioHandler epoll_handler; ... add_pollfd(&epoll_handler); ret = aio_epoll(ctx, pollfds, npfd, timeout); } ... ... /* if we have any readable fds, dispatch event */ if (ret > 0) { for (i = 0; i < npfd; i++) { nodes[i]->pfd.revents = pollfds[i].revents; } } nodes[0] is &epoll_handler, which has already gone out of scope. There is no need to use pollfds[] for epoll. We don't need an AioHandler for the epoll fd. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Stefan Hajnoczi	f25c0b5479	aio-posix: avoid reacquiring rcu_read_lock() when polling The first rcu_read_lock/unlock() is expensive. Nested calls are cheap. This optimization increases IOPS from 73k to 162k with a Linux guest that has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32 devices. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20200218182708.914552-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Markus Armbruster	a8d2532645	Include qemu-common.h exactly where needed No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]	2019-06-12 13:20:20 +02:00
Paolo Bonzini	993ed89f35	aio-posix: ensure poll mode is left when aio_notify is called With aio=thread, adaptive polling makes latency worse rather than better, because it delays the execution of the ThreadPool's completion bottom half. event_notifier_poll() does run while polling, detecting that a bottom half was scheduled by a worker thread, but because ctx->notifier is explicitly ignored in run_poll_handlers_once(), scheduling the BH does not count as making progress and run_poll_handlers() keeps running. Fix this by recomputing the deadline after *timeout could have changed. With this change, ThreadPool still cannot participate in polling but at least it does not suffer from extra latency. Reported-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20190409122823.12416-1-pbonzini@redhat.com Cc: Stefan Hajnoczi <stefanha@gmail.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1553692145-86728-1-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20190409122823.12416-1-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-05-10 10:53:21 +01:00
Kevin Wolf	0dc165c1bd	aio-posix: Assert that aio_poll() is always called in home thread aio_poll() has an existing assertion that the function is only called from the AioContext's home thread if blocking is allowed. This is not enough, some handlers make assumptions about the thread they run in. Extend the assertion to non-blocking calls, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-02-25 15:03:19 +01:00
Remy Noel	fef1660132	aio-posix: Fix concurrent aio_poll/set_fd_handler. It is possible for an io_poll callback to be concurrently executed along with an aio_set_fd_handlers. This can cause all sorts of problems, like a NULL callback or a bad opaque pointer. This changes set_fd_handlers so that it no longer modify existing handlers entries and instead, always insert those after having proper initialisation. Tested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Remy Noel <remy.noel@blade-group.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20181220152030.28035-3-remy.noel@blade-group.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-01-14 14:09:41 +00:00
Remy Noel	8821b34a73	aio-posix: Unregister fd from ctx epoll when removing fd_handler. Cleaning the events will cause aio_epoll_update to unregister the fd. Otherwise, the fd is kept registered until it is destroyed. Signed-off-by: Remy Noel <remy.noel@blade-group.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20181220152030.28035-2-remy.noel@blade-group.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-01-14 14:09:41 +00:00
Li Qiang	82dfee5a68	util: aio-posix: fix a typo Cc: qemu-trivial@nongnu.org Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1538964972-3223-1-git-send-email-liq3ea@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-10-29 13:35:22 +00:00
Paolo Bonzini	cfeb35d677	aio-posix: do skip system call if ctx->notifier polling succeeds Commit `70232b5253` ("aio-posix: Don't count ctx->notifier as progress when 2018-08-15), by not reporting progress, causes aio_poll to execute the system call when polling succeeds because of ctx->notifier. This introduces latency before the call to aio_bh_poll() and negates the advantages of polling, unfortunately. The fix builds on the previous patch, separating the effect of polling on the timeout from the progress reported to aio_poll(). ctx->notifier does zero the timeout, causing the caller to skip the system call, but it does not report progress, so that the bug fix of commit `70232b5253` still stands. Fixes: `70232b5253` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-4-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-09-26 10:46:21 +08:00
Paolo Bonzini	e30cffa04d	aio-posix: compute timeout before polling This is a preparation for the next patch, and also a very small optimization. Compute the timeout only once, before invoking try_poll_mode, and adjust it in run_poll_handlers. The adjustment is the polling time when polling fails, or zero (non-blocking) if polling succeeds. Fixes: `70232b5253` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-3-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-09-26 10:46:21 +08:00
Paolo Bonzini	d7be5dd19c	aio-posix: fix concurrent access to poll_disable_cnt It is valid for an aio_set_fd_handler to happen concurrently with aio_poll. In that case, poll_disable_cnt can change under the heels of aio_poll, and the assertion on poll_disable_cnt can fail in run_poll_handlers. Therefore, this patch simply checks the counter on every polling iteration. There are no particular needs for ordering, since the polling loop is terminated anyway by aio_notify at the end of aio_set_fd_handler. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-2-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-09-26 10:46:21 +08:00
Fam Zheng	37a81812f7	aio-posix: Improve comment around marking node deleted The counter is for qemu_lockcnt_inc/dec sections (read side), qemu_lockcnt_lock/unlock is for the write side. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180803063917.30292-1-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-08-15 10:12:35 +08:00
Fam Zheng	b37548fcd1	aio: Do aio_notify_accept only during blocking aio_poll An aio_notify() pairs with an aio_notify_accept(). The former should happen in the main thread or a vCPU thread, and the latter should be done in the IOThread. There is one rare case that the main thread or vCPU thread may "steal" the aio_notify() event just raised by itself, in bdrv_set_aio_context() [1]. The sequence is like this: main thread IO Thread =============================================================== bdrv_drained_begin() aio_disable_external(ctx) aio_poll(ctx, true) ctx->notify_me += 2 ... bdrv_drained_end() ... aio_notify() ... bdrv_set_aio_context() aio_poll(ctx, false) [1] aio_notify_accept(ctx) ppoll() /* Hang! */ [1] is problematic. It will clear the ctx->notifier event so that the blocked ppoll() will not return. (For the curious, this bug was noticed when booting a number of VMs simultaneously in RHV. One or two of the VMs will hit this race condition, making the VIRTIO device unresponsive to I/O commands. When it hangs, Seabios is busy waiting for a read request to complete (read MBR), right after initializing the virtio-blk-pci device, using 100% guest CPU. See also https://bugzilla.redhat.com/show_bug.cgi?id=1562750 for the original bug analysis.) aio_notify() only injects an event when ctx->notify_me is set, correspondingly aio_notify_accept() is only useful when ctx->notify_me _was_ set. Move the call to it into the "blocking" branch. This will effectively skip [1] and fix the hang. Furthermore, blocking aio_poll is only allowed on home thread (in_aio_context_home_thread), because otherwise two blocking aio_poll()'s can steal each other's ctx->notifier event and cause hanging just like described above. Cc: qemu-stable@nongnu.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180809132259.18402-3-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-08-15 10:12:35 +08:00
Fam Zheng	70232b5253	aio-posix: Don't count ctx->notifier as progress when polling The same logic exists in fd polling. This change is especially important to avoid busy loop once we limit aio_notify_accept() to blocking aio_poll(). Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180809132259.18402-2-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-08-15 10:12:35 +08:00
Jie Wang	cd0a6d2b2c	iothread: fix epollfd leak in the process of delIOThread When we call addIOThread, the epollfd created in aio_context_setup, but not close it in the process of delIOThread, so the epollfd will leak. Reorder the code in aio_epoll_disable and reuse it. Signed-off-by: Jie Wang <wangjie88@huawei.com> Message-Id: <1526517763-11108-1-git-send-email-wangjie88@huawei.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> [Mention change to aio_epoll_disable in commit message. - Fam] Signed-off-by: Fam Zheng <famz@redhat.com>	2018-05-18 17:09:54 +08:00
Philippe Mathieu-Daudé	8f801baf3a	async: use ARRAY_SIZE macro Applied using the Coccinelle semantic patch scripts/coccinelle/use_osdep.cocci Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2018-02-10 10:43:18 +03:00
Stefan Hajnoczi	ef9115dd7c	aio-posix: drop QEMU_AIO_POLL_MAX_NS env var This hunk should not have been merged but I forgot to remove it. Let's remove it before it slips into a QEMU release. ¯\_(ツ)_/¯ Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20171103154041.12617-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-06 11:04:38 +00:00
Stefan Hajnoczi	f708a5e71c	aio: fix assert when remove poll during destroy After iothread is enabled internally inside QEMU with GMainContext, we may encounter this warning when destroying the iothread: (qemu-system-x86_64:19925): GLib-CRITICAL **: g_source_remove_poll: assertion '!SOURCE_DESTROYED (source)' failed The problem is that g_source_remove_poll() does not allow to remove one source from array if the source is detached from its owner context. (peterx: which IMHO does not make much sense) Fix it on QEMU side by avoid calling g_source_remove_poll() if we know the object is during destruction, and we won't leak anything after all since the array will be gone soon cleanly even with that fd. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-id: 20170928025958.1420-6-peterx@redhat.com [peterx: write the commit message] Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-10-03 14:36:19 -04:00
Paolo Bonzini	bd451435c0	async: remove unnecessary inc/dec pairs Pull the increment/decrement pair out of aio_bh_poll and into the callers. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-18-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:40 +00:00
Paolo Bonzini	a153bf52b3	aio-posix: partially inline aio_dispatch into aio_poll This patch prepares for the removal of unnecessary lockcnt inc/dec pairs. Extract the dispatching loop for file descriptor handlers into a new function aio_dispatch_handlers, and then inline aio_dispatch into aio_poll. aio_dispatch can now become void. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-17-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:39 +00:00
Paolo Bonzini	9d45665448	block: explicitly acquire aiocontext in callbacks that need it This covers both file descriptor callbacks and polling callbacks, since they execute related code. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-14-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:36 +00:00
Paolo Bonzini	2f47da5f7f	block: explicitly acquire aiocontext in timers that need it Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-13-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:08 +00:00
Paolo Bonzini	0836c72f70	aio: push aio_context_acquire/release down to dispatching The AioContext data structures are now protected by list_lock and/or they are walked with FOREACH_RCU primitives. There is no need anymore to acquire the AioContext for the entire duration of aio_dispatch. Instead, just acquire it before and after invoking the callbacks. The next step is then to push it further down. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-12-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:08 +00:00
Paolo Bonzini	c2b38b277a	block: move AioContext, QEMUTimer, main-loop to libqemuutil AioContext is fairly self contained, the only dependency is QEMUTimer but that in turn doesn't need anything else. So move them out of block-obj-y to avoid introducing a dependency from io/ to block-obj-y. main-loop and its dependency iohandler also need to be moved, because later in this series io/ will call iohandler_get_aio_context. [Changed copyright "the QEMU team" to "other QEMU contributors" as suggested by Daniel Berrange and agreed by Paolo. --Stefan] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-2-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:07 +00:00

48 Commits