When there are many poll handlers it's likely that some of them are idle
most of the time. Remove handlers that haven't had activity recently so
that the polling loop scales better for guests with a large number of
devices.
This feature only takes effect for the Linux io_uring fd monitoring
implementation because it is capable of combining fd monitoring with
userspace polling. The other implementations can't do that and risk
starving fds in favor of poll handlers, so don't try this optimization
when they are in use.
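
A minimal sketch of the idle-removal idea, not the code in this patch. All
names here (PollHandler, poll_once, POLL_IDLE_TIMEOUT_NS) and the 10 ms
threshold are assumptions made for illustration. A handler that makes no
progress before its deadline is unlinked from the polled list so later
passes skip it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the QEMU implementation. */
#define POLL_IDLE_TIMEOUT_NS 10000000LL /* assumed idle threshold: 10 ms */

typedef struct PollHandler {
    struct PollHandler *next;
    bool (*poll)(void *opaque);  /* returns true if progress was made */
    void *opaque;
    int64_t idle_deadline_ns;    /* 0 means "deadline not armed yet" */
} PollHandler;

/*
 * Run one pass over the polled list. Handlers that stay idle past their
 * deadline are unlinked and counted in *removed so the caller can hand
 * them back to fd monitoring. Returns true if any handler made progress.
 */
static bool poll_once(PollHandler **list, int64_t now_ns, size_t *removed)
{
    bool progress = false;
    PollHandler **pp = list;

    while (*pp) {
        PollHandler *h = *pp;

        if (h->poll(h->opaque)) {
            /* Activity: push the deadline into the future. */
            h->idle_deadline_ns = now_ns + POLL_IDLE_TIMEOUT_NS;
            progress = true;
        } else if (h->idle_deadline_ns == 0) {
            /* First idle observation: arm the deadline. */
            h->idle_deadline_ns = now_ns + POLL_IDLE_TIMEOUT_NS;
        } else if (now_ns >= h->idle_deadline_ns) {
            /* Idle for too long: drop it from the polled list. */
            *pp = h->next;
            h->next = NULL;
            (*removed)++;
            continue;
        }
        pp = &h->next;
    }
    return progress;
}

static bool poll_busy(void *opaque) { (void)opaque; return true; }
static bool poll_idle(void *opaque) { (void)opaque; return false; }

/* Usage sketch: one busy and one idle handler; returns how many stay polled. */
static size_t demo_remaining(void)
{
    PollHandler idle = { NULL, poll_idle, NULL, 0 };
    PollHandler busy = { &idle, poll_busy, NULL, 0 };
    PollHandler *list = &busy;
    size_t removed = 0;
    size_t remaining = 0;

    poll_once(&list, 1, &removed);                        /* arms deadlines */
    poll_once(&list, 1 + POLL_IDLE_TIMEOUT_NS, &removed); /* idle one expires */

    for (PollHandler *h = list; h; h = h->next) {
        remaining++;
    }
    assert(removed + remaining == 2);
    return remaining;
}
```

In this sketch the removed handler is simply unlinked. A real implementation
would still need fd monitoring to notice when it becomes active again, which
is why the optimization is limited to io_uring as described above.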
IOPS improves from 10k to 105k when the guest has 100
virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1
device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe.
[Clarified aio_poll_handlers locking discipline explanation in comment
after discussion with Paolo Bonzini <pbonzini@redhat.com>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com
Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>
The recent Linux io_uring API has several advantages over ppoll(2) and
epoll(7). Details are given in the source code.
Add an io_uring implementation and make it the default on Linux.
Performance is the same as with epoll(7) but later patches add
optimizations that take advantage of io_uring.
It is necessary to change how aio_set_fd_handler() deals with deleting
AioHandlers since removing monitored file descriptors is asynchronous in
io_uring. fdmon_io_uring_remove() marks the AioHandler deleted and
aio_set_fd_handler() will let it handle deletion in that case.
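
The ownership handoff described above can be sketched as follows. This is an
illustration under assumptions, not the patch itself: Handler, RemoveFn,
handler_delete() and async_remove_complete() are hypothetical names, and the
"asynchronous" backend is simulated by a direct call rather than a kernel
completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Handler {
    int fd;
    bool deleted;  /* set while an asynchronous removal is pending */
} Handler;

/* Hypothetical backend hook: returns true if the backend takes over deletion. */
typedef bool (*RemoveFn)(Handler *h);

/* Asynchronous backend: mark the handler and keep it alive until completion. */
static bool async_remove(Handler *h)
{
    h->deleted = true;  /* a real backend would also queue a removal request */
    return true;
}

/* Synchronous backend: nothing is pending, so the caller may free right away. */
static bool sync_remove(Handler *h)
{
    (void)h;
    return false;
}

/* Core code: free immediately unless the backend claimed the handler. */
static bool handler_delete(Handler *h, RemoveFn remove_fn)
{
    if (remove_fn(h)) {
        return false;  /* deferred: the backend frees it later */
    }
    free(h);
    return true;
}

/* Invoked by the asynchronous backend once the removal has completed. */
static void async_remove_complete(Handler *h)
{
    assert(h->deleted);
    free(h);
}

/* Usage sketch: returns how many handlers were freed immediately. */
static int demo_deferred(void)
{
    Handler *a = calloc(1, sizeof(*a));
    Handler *b = calloc(1, sizeof(*b));
    int freed_now = 0;

    if (!a || !b) {
        free(a);
        free(b);
        return -1;
    }
    freed_now += handler_delete(a, sync_remove);   /* freed on the spot */
    freed_now += handler_delete(b, async_remove);  /* deferred */
    async_remove_complete(b);                      /* simulated completion */
    return freed_now;
}
```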
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-6-stefanha@redhat.com
Message-Id: <20200305170806.1313245-6-stefanha@redhat.com>
The ppoll(2) and epoll(7) file descriptor monitoring implementations are
mixed with the core util/aio-posix.c code. Before adding another
implementation for Linux io_uring, extract out the existing
ones so there is a clear interface and the core code is simpler.
The new interface is AioContext->fdmon_ops, a pointer to an FDMonOps
struct. See the patch for details.
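
For readers unfamiliar with the pattern, here is a minimal sketch of an ops
table that hides the fd monitoring backend from the core loop. It does not
reproduce the real FDMonOps fields, which are defined in the patch. Monitor,
MonitorOps and monitor_wait() are hypothetical names, and plain poll(2)
stands in for ppoll(2) to keep the example portable.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

typedef struct Monitor Monitor;

/* Hypothetical ops table; one instance per fd monitoring backend. */
typedef struct {
    /* Block for up to timeout_ms; return the number of ready fds or -1. */
    int (*wait)(Monitor *m, int timeout_ms);
} MonitorOps;

struct Monitor {
    const MonitorOps *ops;  /* selected backend */
    struct pollfd fds[8];
    nfds_t nfds;
};

/* poll(2)-based backend. */
static int poll_wait(Monitor *m, int timeout_ms)
{
    return poll(m->fds, m->nfds, timeout_ms);
}

static const MonitorOps poll_ops = { .wait = poll_wait };

/* Core code only ever calls through the ops pointer. */
static int monitor_wait(Monitor *m, int timeout_ms)
{
    return m->ops->wait(m, timeout_ms);
}

/* Usage sketch: a pipe with one byte written reports one ready fd. */
static int demo_pipe_ready(void)
{
    int p[2];
    Monitor m = { .ops = &poll_ops };
    int ready;

    if (pipe(p) != 0) {
        return -1;
    }
    m.fds[0].fd = p[0];
    m.fds[0].events = POLLIN;
    m.nfds = 1;

    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    ready = monitor_wait(&m, 0);
    close(p[0]);
    close(p[1]);
    return ready;
}
```

Adding another backend in this sketch means defining one more MonitorOps
instance and pointing Monitor.ops at it; monitor_wait() stays unchanged.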
Semantic changes:
1. ppoll(2) now reflects events from pollfds[] back into AioHandlers
while we're still on the clock for adaptive polling. This was
already happening for epoll(7), so if it's really an issue then we'll
need to fix both in the future.
2. epoll(7)'s fallback to ppoll(2) while external events are disabled
was broken when the number of fds exceeded the epoll(7) upgrade
threshold. I guess this code path simply wasn't tested and no one
noticed the bug. I didn't go out of my way to fix it but the correct
code is simpler than preserving the bug.
I also took some liberties in removing the unnecessary
AioContext->epoll_available (just check AioContext->epollfd != -1
instead) and AioContext->epoll_enabled (it's implicit if our
AioContext->fdmon_ops callbacks are being invoked) fields.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-4-stefanha@redhat.com
Message-Id: <20200305170806.1313245-4-stefanha@redhat.com>