Now that iothread is always compiled sending a signal seems only an
additional step. This patch also avoid writing to two pipe (one from signal
and one in qemu_service_io).
Work with kvm enabled or disabled. strace output is more readable (less syscalls).
[ kwolf: Merged build fix by Paolo Bonzini ]
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In certain circumstances, posix-aio-compat can incur a lot of latency:
- threads are created by vcpu threads, so if vcpu affinity is set,
aio threads inherit vcpu affinity. This can cause many aio threads
to compete for one cpu.
- we can create up to max_threads (64) aio threads in one go; since a
pthread_create can take around 30μs, we have up to 2ms of cpu time
under a global lock.
Fix by:
- moving thread creation to the main thread, so we inherit the main
thread's affinity instead of the vcpu thread's affinity.
- if a thread is currently being created, and we need to create yet
another thread, let thread being born create the new thread, reducing
the amount of time we spend under the main thread.
- drop the local lock while creating a thread (we may still hold the
global mutex, though)
Note this doesn't eliminate latency completely; scheduler artifacts or
lack of host cpu resources can still cause it. We may want pre-allocated
threads when this cannot be tolerated.
Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In order to be able to transparently replace bdrv_read calls by bdrv_co_read,
reading beyond EOF must produce zeros instead of short reads for AIO, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The purpose of AsyncContexts was to protect qcow and qcow2 against reentrancy
during an emulated bdrv_read/write (which includes a qemu_aio_wait() call and
can run AIO callbacks of different requests if it weren't for AsyncContexts).
Now both qcow and qcow2 are protected by CoMutexes and AsyncContexts can be
removed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch removes all references to signal.h when qemu-common.h is included
as they become redundant.
Signed-off-by: Alexandre Raymond <cerbere@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
A thread should only be counted as idle when it really is waiting for new
requests. Without this patch, sometimes too few threads are started as busy
threads are counted as idle.
Not sure if it makes a difference in practice outside some artificial
qemu-io/qemu-img tests, but I think the change makes sense in any case.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds paio_complete() and paio_cancel() trace events to
complement the paio_submit() event.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Use qemu_blockalign for all allocations in the block layer. This allows
increasing the required alignment, which is need to support O_DIRECT on
devices with large block sizes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Set the async_context_id field when queuing an async ioctl call
Signed-off-by: Andrew de Quincey <adq@lidskialf.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch expands tabs on a few lines so the code formats nicely and
follows the QEMU coding style.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
CC posix-aio-compat.o
cc1: warnings being treated as errors
posix-aio-compat.c: In function 'aio_signal_handler':
posix-aio-compat.c:505: error: ignoring return value of 'write', declared with attribute warn_unused_result
make: *** [posix-aio-compat.o] Error 1
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Checking for nbytes < 0 is pointless as long as it's a size_t. If we want to
use negative numbers for error codes, we should use signed types.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
descriptors that don't need to be passed to children to stop this misbehaviour.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The context parameter in paio_submit isn't used anyway, so there is no reason
why block drivers should need to remember it. This also avoids passing a Linux
AIO context to paio_submit (which doesn't do any harm as long as the parameter
is unused, but it is highly confusing).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Don't call callbacks that don't belong to the active AsyncContext.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We'll leave some AIO completions unhandled when we can't call the callback.
qemu_aio_process_queue() is used later to run any callbacks that are left and
can be run then.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We need to process the request queue and run callbacks separately from reading
out the queue in a later patch, so split it out.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.
Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Instead stalling the VCPU while serving a cache flush try to do it
asynchronously. Use our good old helper thread pool to issue an
asynchronous fdatasync for raw-posix. Note that while Linux AIO
implements a fdatasync operation it is not useful for us because
it isn't actually implement in asynchronous fashion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently the raw-posix.c code contains a lot of knowledge about the
asynchronous I/O scheme that is mostly implemented in posix-aio-compat.c.
All this code does not really belong here and is getting a bit in the
way of implementing native AIO on Linux.
So instead move all the guts of the AIO implementation into
posix-aio-compat.c (which might need a better name, btw).
There's now a very small interface between the AIO providers and raw-posix.c:
- an init routine is called from raw_open_common to return an AIO context
for this drive. An AIO implementation may either re-use one context
for all drives, or use a different one for each as the Linux native
AIO support will do.
- an submit routine is called from the aio_reav/writev methods to submit
an AIO request
There are no indirect calls involved in this interface as we need to
decide which one to call manually. We will only call the Linux AIO native
init function if we were requested to by vl.c, and we will only call
the native submit function if we are asked to and the request is properly
aligned. That's also the reason why the alignment check actually does
the inverse move and now goes into raw-posix.c.
The old posix-aio-compat.h headers is removed now that most of it's
content is private to posix-aio-compat.c, and instead we add a new
block/raw-posix-aio.h headers is created containing only the tiny interface
between raw-posix.c and the AIO implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
posix_aio_read expect aio requests to return the number of bytes
requests to be successfull, so we need to fake this up for ioctls.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This ties up the preadv/pwritev syscalls to qemu if they are declared in
unistd.h. This is the case currently on at least NetBSD and OpenBSD and
will hopefully soon be the case on Linux.
Thanks to Blue Swirl and Gerd Hoffmann for the configure autodetection
of preadv/pwritev.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7021 c046a42c-6fe2-441c-8c8c-71466251a162
Make all AIO requests vectored and defer linearization until the actual
I/O thread. This prepares for using native preadv/pwritev.
Also enables asynchronous direct I/O by handling that case in the I/O thread.
Qcow and qcow2 propably want to be adopted to directly deal with multi-segment
requests, but that can be implemented later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7020 c046a42c-6fe2-441c-8c8c-71466251a162
Okay, I started looking into how to handle scsi-generic I/O in the
new world order.
I think the best is to use the SG_IO ioctl instead of the read/write
interface as that allows us to support scsi passthrough on disk/cdrom
devices, too. See Hannes patch on the kvm list from August for an
example.
Now that we always do ioctls we don't need another abstraction than
bdrv_ioctl for the synchronous requests for now, and for asynchronous
requests I've added a aio_ioctl abstraction keeping it simple.
Long-term we might want to move the ops to a higher-level abstraction
and let the low-level code fill out the request header, but I'm lazy
enough to leave that to the people trying to support scsi-passthrough
on a non-Linux OS.
Tested lightly by issuing various sg_ commands from sg3-utils in a guest
to a host CDROM device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6895 c046a42c-6fe2-441c-8c8c-71466251a162
pthread_cond_timedwait is allowed to both consume the signal and
return with the value indicating the timeout, hence predicate should
always be (re)checked before taking an action
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6634 c046a42c-6fe2-441c-8c8c-71466251a162
Avoid repeated creation/initalization/destruction of attr and calls to
getpid
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6633 c046a42c-6fe2-441c-8c8c-71466251a162
Broadcast was used so that the I/O threads would wakeup, reset their
ts values and all but one go to sleep, in other words an optimization
to prevent threads from exiting in presence of continuing I/O
activity. Spurious wakeups make the looping around cond_timedwait with
ever reinitialized ts potentially unsafe and as such ts in no longer
reinitilized inside the loop, hence switch to signal is warranted and
this benefits of this particlaur optimization are lost.
(It's worth noting that timed variants of pthread calls use realtime
clock by default, and therefore can hang "forever" should the host
time be changed. Unfortunatelly not all host systems QEMU runs on
support CLOCK_MONOTONIC and/or pthread_condattr_setclock with this
value)
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6632 c046a42c-6fe2-441c-8c8c-71466251a162
When we cancel an AIO request that is already being processed by
aio_thread, qemu_paio_cancel should return QEMU_PAIO_NOTCANCELED as long
as aio_thread isn't done with this request. But as the latter currently
updates aiocb->ret after every block of the request, we may report
QEMU_PAIO_ALLDONE too early.
Futhermore, in case some zero-length request should have been queued,
aiocb->ret is never set to != -EINPROGRESS and callers like
raw_aio_cancel could get stuck in an endless loop.
Fix those issues by updating aiocb->ret _after_ the request has been
fully processed. This also simplifies the locking.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6278 c046a42c-6fe2-441c-8c8c-71466251a162
glibc implements posix-aio as a thread pool and imposes a number of limitations.
1) it limits one request per-file descriptor. we hack around this by dup()'ing
file descriptors which is hideously ugly
2) it's impossible to add new interfaces and we need a vectored read/write
operation to properly support a zero-copy API.
What has been suggested to me by glibc folks, is to implement whatever new
interfaces we want and then it can eventually be proposed for standardization.
This requires that we implement our own posix-aio implementation though.
This patch implements posix-aio using pthreads. It immediately eliminates the
need for fd pooling.
It performs at least as well as the current posix-aio code (in some
circumstances, even better).
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5996 c046a42c-6fe2-441c-8c8c-71466251a162