ELF thread local storage is about 10% faster on tests/test-coroutine's
perf/cost test. The timing on my machine is 190ns per iteration with
pthread TLS, 170 with ELF TLS.
Based on a patch by Kevin Wolf and Peter Lieven, but redone to follow
the model of coroutine-win32.c (including the important "noinline"
attribute!).
Platforms without thread-local storage (OpenBSD probably?) will need
a new-enough GCC for this to compile, in order to use the same emutls
support that Windows already relies on.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417518350-6167-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
# By Paolo Bonzini (7) and others
# Via Kevin Wolf
* kwolf/for-anthony: (22 commits)
pc: add compatibility machine types for 1.4
blockdev: enable discard by default
qemu-nbd: add --discard option
blockdev: add discard suboption to -drive
block: implement BDRV_O_UNMAP
block: complete all IOs before .bdrv_truncate
coroutine: trim down nesting level in perf_nesting test
coroutine: move pooling to common code
qemu-iotests: Test qcow2 image creation options
qemu-iotests: Add qemu-img compare test
qemu-img: Add compare subcommand
qemu-img: Add "Quiet mode" option
block: Add synchronous wrapper for bdrv_co_is_allocated_above
block: refuse negative iops and bps values
block: use Error in do_check_io_limits()
qcow2: support compressed clusters in BlockFragInfo
qemu-img: add compressed clusters to BlockFragInfo
qemu-img: fix missing space in qemu-img check output
qcow2: record fragmentation statistics during check
qcow2: introduce check_refcounts_l1/l2() flags
...
The setjmp() function doesn't specify whether signal masks are saved and
restored; on Linux they are not, but on BSD (including MacOSX) they are.
We want to have consistent behaviour across platforms, so we should
always use "don't save/restore signal mask" (this is also generally
going to be faster). This also works around a bug in MacOSX where the
signal-restoration on longjmp() affects the signal mask for a completely
different thread, not just the mask for the thread which did the longjmp.
The most visible effect of this was that ctrl-C was ignored on MacOSX
because the CPU thread did a longjmp which resulted in its signal mask
being applied to every thread, so that all threads had SIGINT and SIGTERM
blocked.
The POSIX-sanctioned portable way to do a jump without affecting signal
masks is to siglongjmp() to a sigjmp_buf which was created by calling
sigsetjmp() with a zero savemask parameter, so change all uses of
setjmp()/longjmp() accordingly. [Technically POSIX allows sigsetjmp(buf, 0)
to save the signal mask; however the following siglongjmp() must not
restore the signal mask, so the pair can be effectively considered as
"sigjmp/longjmp which don't touch the mask".]
For Windows we provide a trivial sigsetjmp/siglongjmp in terms of
setjmp/longjmp -- this is OK because no user will ever pass a non-zero
savemask.
The setjmp() uses in tests/tcg/test-i386.c and tests/tcg/linux-test.c
are left untouched because these are self-contained singlethreaded
test programs intended to be run under QEMU's Linux emulation, so they
have neither the portability nor the multithreading issues to deal with.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Tested-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The coroutine pool code is duplicated between the ucontext and
sigaltstack backends, and absent from the win32 backend. But the
code can be shared easily by moving it to qemu-coroutine.c.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Split the configure test that checks for valgrind into two, one
part checking whether we have the gcc pragma to disable unused-but-set
variables, and the other part checking for the existence of valgrind.h.
The first of these has to be compiled with -Werror and the second
does not and shouldn't generate any warnings.
This (a) allows us to enable "make errors in configure tests be
build failures" and (b) enables use of valgrind on systems with
a gcc which doesn't know about -Wunused-but-set-varibale, like
Debian squeeze.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
valgrind tends to get confused and report false positives when you
switch stacks and don't tell it about it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
ucontext-based coroutines use a free pool to reduce allocations and
deallocations of coroutine objects. The pool is per-thread, presumably
to improve locality. However, as coroutines are usually allocated in
a vcpu thread and freed in the I/O thread, the pool accounting gets
screwed up and we end allocating and freeing a coroutine for every I/O
request. This is expensive since large objects are allocated via the
kernel, and are not cached by the C runtime.
Fix by switching to a global pool. This is safe since we're protected
by the global mutex.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Asynchronous code is becoming very complex. At the same time
synchronous code is growing because it is convenient to write.
Sometimes duplicate code paths are even added, one synchronous and the
other asynchronous. This patch introduces coroutines which allow code
that looks synchronous but is asynchronous under the covers.
A coroutine has its own stack and is therefore able to preserve state
across blocking operations, which traditionally require callback
functions and manual marshalling of parameters.
Creating and starting a coroutine is easy:
coroutine = qemu_coroutine_create(my_coroutine);
qemu_coroutine_enter(coroutine, my_data);
The coroutine then executes until it returns or yields:
void coroutine_fn my_coroutine(void *opaque) {
MyData *my_data = opaque;
/* do some work */
qemu_coroutine_yield();
/* do some more work */
}
Yielding switches control back to the caller of qemu_coroutine_enter().
This is typically used to switch back to the main thread's event loop
after issuing an asynchronous I/O request. The request callback will
then invoke qemu_coroutine_enter() once more to switch back to the
coroutine.
Note that if coroutines are used only from threads which hold the global
mutex they will never execute concurrently. This makes programming with
coroutines easier than with threads. Race conditions cannot occur since
only one coroutine may be active at any time. Other coroutines can only
run across yield.
This coroutines implementation is based on the gtk-vnc implementation
written by Anthony Liguori <anthony@codemonkey.ws> but it has been
significantly rewritten by Kevin Wolf <kwolf@redhat.com> to use
setjmp()/longjmp() instead of the more expensive swapcontext() and by
Paolo Bonzini <pbonzini@redhat.com> for Windows Fibers support.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>