Coroutine pool size was 64 from long ago, and the basis was organized in the commit message in 4d68e86b.
At that time, virtio-blk queue-size and num-queue were not configuable, and equivalent values were 128 and 1.
Coroutine pool size 64 was fine then.
Later queue-size and num-queue got configuable, and default values were increased.
Coroutine pool with size 64 exhausts frequently with random disk IO in new size, and slows down.
This commit adjusts coroutine pool size adaptively with new values.
This commit adds 64 by default, but now coroutine is not only for block devices,
and is not too much burdon comparing with new default.
pool size of 128 * vCPUs.
Signed-off-by: Hiroki Narukawa <hnarukaw@yahoo-corp.jp>
Message-id: 20220214115302.13294-2-hnarukaw@yahoo-corp.jp
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qemu_co_queue_next() and qemu_co_queue_restart_all() just call
aio_co_wake() which works well in non-coroutine context. So these
functions can be called from non-coroutine context as well. And
actually qemu_co_queue_restart_all() is called from
nbd_cancel_in_flight(), which is called from non-coroutine context.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Allow using QemuCoSleep to sleep forever until woken by qemu_co_sleep_wake.
This makes the logic of qemu_co_sleep_ns_wakeable easy to understand.
In the future we will introduce an API that can work even if the
sleep and wake happen from different threads. For now, initializing
w->to_wake after timer_mod is fine because the timer can only fire in
the same AioContext.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210517100548.28806-7-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Right now, users of qemu_co_sleep_ns_wakeable are simply passing
a pointer to QemuCoSleepState by reference to the function. But
QemuCoSleepState really is just a Coroutine*; making the
content of the struct public is just as efficient and lets us
skip the user_state_pointer indirection.
Since the usage is changed, take the occasion to rename the
struct to QemuCoSleep.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210517100548.28806-6-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Simplify the code by removing conditionals. qemu_co_sleep_ns
can simply point the argument to an on-stack temporary.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210517100548.28806-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
An invariant of the current rwlock is that if multiple coroutines hold a
reader lock, all must be runnable. The unlock implementation relies on
this, choosing to wake a single coroutine when the final read lock
holder exits the critical section, assuming that it will wake a
coroutine attempting to acquire a write lock.
The downgrade implementation violates this assumption by creating a
read lock owning coroutine that is exclusively runnable - any other
coroutines that are waiting to acquire a read lock are *not* made
runnable when the write lock holder converts its ownership to read
only.
More in general, the old implementation had lots of other fairness bugs.
The root cause of the bugs was that CoQueue would wake up readers even
if there were pending writers, and would wake up writers even if there
were readers. In that case, the coroutine would go back to sleep *at
the end* of the CoQueue, losing its place at the head of the line.
To fix this, keep the queue of waiters explicitly in the CoRwlock
instead of using CoQueue, and store for each whether it is a
potential reader or a writer. This way, downgrade can look at the
first queued coroutines and wake it only if it is a reader, causing
all other readers in line to be released in turn.
Reported-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210325112941.365238-5-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when a QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:
$ CC=clang CXX=clang++ ./configure ... && make
../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)
Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.
This patch was generated using:
$ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
sort -u >/tmp/changed_identifiers
$ for identifier in $(</tmp/changed_identifiers); do
sed -i "s%\<$identifier\>%q$identifier%g" \
$(git grep -I -l "\<$identifier\>")
done
I manually fixed line-wrap issues and misaligned rST tables.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
Some functions require that the caller holds a certain CoMutex for them
to operate correctly. Add a function so that they can assert the lock is
really held.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Introduce a function to gracefully wake a coroutine sleeping in
qemu_co_sleep_ns().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20191009084158.15614-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
bdrv_do_drained_begin/end() assume that they are called with the
AioContext lock of bs held. If we call drain functions from a coroutine
with the AioContext lock held, we yield and schedule a BH to move out of
coroutine context. This means that the lock for the home context of the
coroutine is released and must be re-acquired in the bottom half.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
qemu_co_queue_next does not need to release and re-acquire the mutex,
because the queued coroutine does not run immediately. However, this
does not hold for qemu_co_enter_next. Now that qemu_co_queue_wait
can synchronize (via QemuLockable) with code that is not running in
coroutine context, it's important that code using qemu_co_enter_next
can easily use a standardized locking idiom.
First of all, qemu_co_enter_next must use aio_co_wake to restart the
coroutine. Second, the function gains a second argument, a QemuLockable*,
and the comments of qemu_co_queue_next and qemu_co_queue_restart_all
are adjusted to clarify the difference.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180203153935.8056-5-pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
There are cases in which a queued coroutine must be restarted from
non-coroutine context (with qemu_co_enter_next). In this cases,
qemu_co_enter_next also needs to be thread-safe, but it cannot use
a CoMutex and so cannot qemu_co_queue_wait. Use QemuLockable so
that the CoQueue can interchangeably use CoMutex or QemuMutex.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180203153935.8056-4-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
QemuLockable is a polymorphic lock type that takes an object and
knows which function to use for locking and unlocking. The
implementation could use C11 _Generic, but since the support is
not very widespread I am instead using __builtin_choose_expr and
__builtin_types_compatible_p, which are already used by
include/qemu/atomic.h.
QemuLockable can be used to implement lock guards, or to pass around
a lock in such a way that a function can release it and re-acquire it.
The next patch will do this for CoQueue.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180203153935.8056-3-pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
The AioContext pointer argument to co_aio_sleep_ns() is only used for
the sleep timer. It does not affect where the caller coroutine is
resumed.
Due to changes to coroutine and AIO APIs it is now possible to drop the
AioContext pointer argument. This is safe to do since no caller has
specific requirements for which AioContext the timer must run in.
This patch drops the AioContext pointer argument and renames the
function to simplify the API.
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171109102652.6360-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
These functions are more efficient in the presence of contention.
qemu_co_rwlock_downgrade also guarantees not to block, which may
be useful in some algorithms too.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170629132749.997-3-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
It's a variant of qemu_coroutine_enter with an explicit AioContext
parameter.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
This adds a CoMutex around the existing CoQueue. Because the write-side
can just take CoMutex, the old "writer" field is not necessary anymore.
Instead of removing it altogether, count the number of pending writers
during a read-side critical section and forbid further readers from
entering.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-7-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
All that CoQueue needs in order to become thread-safe is help
from an external mutex. Add this to the API.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-6-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This will avoid forward references in the next patch. It is also
more logical because CoQueue is not anymore the basic primitive.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-5-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Running a very small critical section on pthread_mutex_t and CoMutex
shows that pthread_mutex_t is much faster because it doesn't actually
go to sleep. What happens is that the critical section is shorter
than the latency of entering the kernel and thus FUTEX_WAIT always
fails. With CoMutex there is no such latency but you still want to
avoid wait and wakeup. So introduce it artificially.
This only works with one waiters; because CoMutex is fair, it will
always have more waits and wakeups than a pthread_mutex_t.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This uses the lock-free mutex described in the paper '"Blocking without
Locking", or LFTHREADS: A lock-free thread library' by Gidenstam and
Papatriantafilou. The same technique is used in OSv, and in fact
the code is essentially a conversion to C of OSv's code.
[Added missing coroutine_fn in tests/test-aio-multithread.c.
--Stefan]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-2-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In the context of asynchronous work, if we have a worker coroutine that
didn't yield, the parent coroutine cannot be reentered because it hasn't
yielded yet. In this case we don't even have to reenter the parent
because it will see that the work is already done and won't even yield.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
See the doc comments for a description of this new coroutine API.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1474989516-18255-2-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In cases of deadlocks, knowing who holds a given CoMutex is really
helpful for debugging. Keeping the information around doesn't cost much
and allows us to add another assertion to keep the code correct, so
let's just add it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
In practice the entry argument is always known at creation time, and
it is confusing that sometimes qemu_coroutine_enter is used with a
non-NULL argument to re-enter a coroutine (this happens in
block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value
at creation time, for consistency with e.g. aio_bh_new.
Mostly done with the following semantic patch:
@ entry1 @
expression entry, arg, co;
@@
- co = qemu_coroutine_create(entry);
+ co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry2 @
expression entry, arg;
identifier co;
@@
- Coroutine *co = qemu_coroutine_create(entry);
+ Coroutine *co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry3 @
expression entry, arg;
@@
- qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
+ qemu_coroutine_enter(qemu_coroutine_create(entry, arg));
@ reentry @
expression co;
@@
- qemu_coroutine_enter(co, NULL);
+ qemu_coroutine_enter(co);
except for the aforementioned few places where the semantic patch
stumbled (as expected) and for test_co_queue, which would otherwise
produce an uninitialized variable warning.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
CoQueue do not need to remove any element but the head of the list;
processing is always strictly FIFO. Therefore, the simpler singly-linked
QSIMPLEQ can be used instead.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Re-run scripts/clean-includes to apply the previous commit's
corrections and updates. Besides redundant qemu/typedefs.h, this only
finds a redundant config-host.h include in ui/egl-helpers.c. No idea
how that escaped the previous runs.
Some manual whitespace trimming around dropped includes squashed in.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
NB: If this commit breaks compilation for your out-of-tree
patchseries or fork, then you need to make sure you add
#include "qemu/osdep.h" to any new .c files that you have.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
The coroutine files are currently referenced by the block-obj-y
variable. The coroutine functionality though is already used by
more than just the block code. eg migration code uses coroutine
yield. In the future the I/O channel code will also use the
coroutine yield functionality. Since the coroutine code is nicely
self-contained it can be easily built as part of the libqemuutil.a
library, making it widely available.
The headers are also moved into include/qemu, instead of the
include/block directory, since they are now part of the util
codebase, and the impl was never in the block/ directory
either.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>