These functions do not modify their @ht or @bucket arguments.
Constify those arguments.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
seqlock_read_begin takes a const param since c04649eeea
("seqlock: constify seqlock_read_begin", 2018-08-23), so
we can constify the entire lookup.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Accessing the HT from an iterator results almost always
in a deadlock. Given that only one qht-internal function
uses this argument, drop it from the interface.
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This currently has no users, but the use case is so common that I
think we must support it.
Note that without the appended we cannot safely remove a set of
elements; a 2-step approach (i.e. qht_iter first, keep track of
the to-be-deleted elements, and then a bunch of qht_remove calls)
would be racy, since between the iteration and the removals other
threads might insert additional elements.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit 70232b5253 ("aio-posix: Don't count ctx->notifier as progress when
2018-08-15), by not reporting progress, causes aio_poll to execute the
system call when polling succeeds because of ctx->notifier. This introduces
latency before the call to aio_bh_poll() and negates the advantages of
polling, unfortunately.
The fix builds on the previous patch, separating the effect of polling on
the timeout from the progress reported to aio_poll(). ctx->notifier
does zero the timeout, causing the caller to skip the system call,
but it does not report progress, so that the bug fix of commit 70232b5253
still stands.
Fixes: 70232b5253
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-4-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
This is a preparation for the next patch, and also a very small
optimization. Compute the timeout only once, before invoking
try_poll_mode, and adjust it in run_poll_handlers. The adjustment
is the polling time when polling fails, or zero (non-blocking) if
polling succeeds.
Fixes: 70232b5253
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-3-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
It is valid for an aio_set_fd_handler to happen concurrently with
aio_poll. In that case, poll_disable_cnt can change under the heels
of aio_poll, and the assertion on poll_disable_cnt can fail in
run_poll_handlers.
Therefore, this patch simply checks the counter on every polling
iteration. There are no particular needs for ordering, since the
polling loop is terminated anyway by aio_notify at the end of
aio_set_fd_handler.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-2-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
- Deprecate the "enforce-config-section" machine parameter
- Re-enable the wdt_ib700, endianness and vmxnet3 qtests
- Some trivial fixes and doc update patches that crossed my way
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJbqlsyAAoJEC7Z13T+cC21RbAP/3IvGfBxuRm6rBWoghjQgbl8
KU8nPnlZUtqjxmfUTILO/h+pJ3na5MQ8hh7v8JHi+xlQ2DPkECW21DtnfdxntVjw
+b+N5Ap6J22GHyEq4HJXPWAk2rDInqkU966DvL40RiMvOTfXdg9EO0TDX0VsVgZv
BR1r7/t3T0P7hiQ0XWb9U2JchRIC+Zgk34gXZPSTpoIv89fUhzNoK5LvAA6yV1FQ
TvE8VTKJm4wkqThH1ShtbJCBKjHjW/W8LYZr3YMothcs8vGjEdEcDL4BoJZDn3bF
h4VTkU+k8lp7W9LmlnPnu1WH/5ezhzdwJTeFaPJt4U10WKJptAS4vbK03DXlds9O
9d2BOXKrima2kSr1ejSe1f0kcE8fis1XFmSuhF61Nbw6ngT5+pP2JSc1XwFazd2K
zQwV4GXBLzAGnd4F2Ec+5TKzbGFVfczxeBDiBkkVmG+XdX/UXJpkpPYGAaw7DDiK
JwKVVYIPk1ll6MAbR6qEGsvE/adHNEm8lUdjXqwgbQlIeUZ2H0hCu9lJ0X81mtoQ
WZP+nMa/87COnlPX6VPVgxM2TXQOH/UbGz/WmYzZ6/gPKTX+gfwrHQGdp7Tjl33U
KxFKWioFnoqGuyWasvTtKEK67/IlrY+w1nXuuqKJg8J2/qx1SVtx45FHkRkxkIDx
4boRpx0XUqpDVdf8VhRB
=dXgp
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2018-09-25' into staging
- Deprecate the usage of a network backend via "name" instead of "id"
- Deprecate the "enforce-config-section" machine parameter
- Re-enable the wdt_ib700, endianness and vmxnet3 qtests
- Some trivial fixes and doc update patches that crossed my way
# gpg: Signature made Tue 25 Sep 2018 16:58:42 BST
# gpg: using RSA key 2ED9D774FE702DB5
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>"
# gpg: aka "Thomas Huth <thuth@redhat.com>"
# gpg: aka "Thomas Huth <huth@tuxfamily.org>"
# gpg: aka "Thomas Huth <th.huth@posteo.de>"
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* remotes/huth-gitlab/tags/pull-request-2018-09-25:
Revert "check: Move VMXNET3 test to common"
Revert "check: Move endianess test to common"
Revert "check: Move wdt_ib700 test to common"
tests/migration: Speed up the test on ppc64
hw/qdev-core: Fix description of instance_init
qdev: fix a typo in comment
docs: Fix some typos (most found by codespell)
trivial: Make bios files and source files non-executable
memfd: fix possible usage of the uninitialized file descriptor
hw/core/machine: Officially deprecate the enforce-config-section parameter
net/slirp: Deprecate the [hub_id name] parameter tuple
net: Deprecate the "name" parameter of -net
Makefile: Add missing dependency for qemu-deprecated.texi
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The qemu_memfd_alloc_check() routine allocates the fd variable on stack.
This variable is initialized inside the qemu_memfd_alloc() function.
There are several cases when *fd will be left unintialized which can
lead to the unexpected close() in the qemu_memfd_free() call.
Set file descriptor to -1 before calling the qemu_memfd_alloc routine.
Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When draining a block node, we recurse to its parent and for subtree
drains also to its children. A single AIO_WAIT_WHILE() is then used to
wait for bdrv_drain_poll() to become true, which depends on all of the
nodes we recursed to. However, if the respective child or parent becomes
quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent
is checked, while AIO_WAIT_WHILE() depends on the AioWait of the
original node.
Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE().
This may mean that the draining thread gets a few more unnecessary
wakeups because an unrelated operation got completed, but we already
wake it up when something _could_ have changed rather than only if it
has certainly changed.
Apart from that, drain is a slow path anyway. In theory it would be
possible to use wakeups more selectively and still correctly, but the
gains are likely not worth the additional complexity. In fact, this
patch is a nice simplification for some places in the code.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
bdrv_do_drained_begin/end() assume that they are called with the
AioContext lock of bs held. If we call drain functions from a coroutine
with the AioContext lock held, we yield and schedule a BH to move out of
coroutine context. This means that the lock for the home context of the
coroutine is released and must be re-acquired in the bottom half.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
AIO Coroutines shouldn't by managed by an AioContext different than the
one assigned when they are created. aio_co_enter avoids entering a
coroutine from a different AioContext, calling aio_co_schedule instead.
Scheduled coroutines are then entered by co_schedule_bh_cb using
qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
current AioContext obtained with qemu_get_current_aio_context.
Eventually, co->ctx will be set to the AioContext passed as an argument
to qemu_aio_coroutine_enter.
This means that, if an IO Thread's AioConext is being processed by the
Main Thread (due to aio_poll being called with a BDS AioContext, as it
happens in AIO_WAIT_WHILE among other places), the AioContext from some
coroutines may be wrongly replaced with the one from the Main Thread.
This is the root cause behind some crashes, mainly triggered by the
drain code at block/io.c. The most common are these abort and failed
assertion:
util/async.c:aio_co_schedule
456 if (scheduled) {
457 fprintf(stderr,
458 "%s: Co-routine was already scheduled in '%s'\n",
459 __func__, scheduled);
460 abort();
461 }
util/qemu-coroutine-lock.c:
286 assert(mutex->holder == self);
But it's also known to cause random errors at different locations, and
even SIGSEGV with broken coroutine backtraces.
By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
pass the correct AioContext as an argument, making sure co->ctx is not
wrongly altered.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add two functions to print an error/warning report once depending
on a passed-in condition variable and flip it if printed. This is
useful if you want to print a message not once-globally, but e.g.
once-per-device.
Inspired by warn_once() in hw/vfio/ccw.c, which has been replaced
with warn_report_once_cond().
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180830145902.27376-2-cohuck@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Function comments reworded]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
vhost-user-gpu will share the same code to open a DRM node.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180713130916.4153-20-marcandre.lureau@redhat.com>
[ kraxel: buildfix: util/drm.o must be CONFIG_OPENGL not CONFIG_LINUX ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
\xF5..\xFF in the lexer. That's insufficient; there's plenty of
invalid UTF-8 not containing these bytes, as demonstrated by
check-qjson:
* Malformed sequences
- Unexpected continuation bytes
- Missing continuation bytes after start bytes other than
\xC0..\xC1, \xF5..\xFD.
* Overlong sequences with start bytes other than \xC0..\xC1,
\xF5..\xFD.
* Invalid code points
Fixing this in the lexer would be bothersome. Fixing it in the parser
is straightforward, so do that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-23-armbru@redhat.com>
Let's set the alignment just like for the posix variant. This will
implicitly set the alignment of the underlying memory region and
therefore make memory_region_get_alignment(mr) return something > 0 for
all memory backends applicable to PCDIMM/NVDIMM.
The allocation granularity is ususally 64k, while the page size is 4k.
The documentation of VirtualAlloc is not really comprehensible in case
only MEM_COMMIT is specified without an address. We'll detect the actual
values and then go for the bigger one. The expection is, that it will
always be 64k aligned. (The assumption is that MEM_COMMIT does an
implicit MEM_RESERVE, so the address will always be aligned to the
allocation granularity. And the allocation granularity is always bigger
than the page size).
This will allow us to drop special handling in pc.c for
memory_region_get_alignment(mr) == 0, as we can then assume that it is
always set (and AFAICS >= getpagesize()).
For pc in pc_memory_plug(), under Windows TARGET_PAGE_SIZE == getpagesize(),
therefore alignment of DIMMs will not change, and therefore also not the
guest physical memory layout.
For spapr in spapr_memory_plug(), an alignment of 0 would have been used
until now. As QEMU_ALIGN_UP will crash with the alignment being 0, this
never worked, so we don't have to care about compatibility handling.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180801133444.11269-3-david@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The current paths for modules are CONFIG_QEMU_MODDIR and paths relative
to the executable. Qemu and its modules can be installed and executed in
paths that are different from these search paths. This change allows
a search path to be specified by environment variable.
An example usage for this is postmarketOS[1]. This is a build environment
for Alpine Linux. It sets up Alpine Linux in a chroot environment.
Alpine's Qemu packages are installed in the chroot. The Alpine Linux Qemu
package is used to test compiled Alpine Linux system images. This way there
isn't a reliance on the which ever version of Qemu the host system / distro
provides.
postmarketOS executes Qemu on host system outside of the chroot
The Qemu module search path needs to point to the location of the
chroot relative to the host system.
e.g.
The root of the Alpine Linux chroot is:
~/.local/var/pmbootstrap/chroot_native/
Alpine's Qemu is installed at
~/.local/var/pmbootstrap/chroot_native/usr/bin/
The Qemu module search path needs to be:
QEMU_MODULE_DIR=~/.local/var/pmbootstrap/chroot_native/usr/lib/qemu/
[1] https://postmarketos.org/
Signed-off-by: ryang <decatf@gmail.com>
Message-Id: <20180704181010.GA918@computer>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The BQL is acquired via qemu_mutex_lock_iothread(), which makes
the profiler assign the associated wait time (i.e. most of
BQL wait time) entirely to that function. This loses the original
call site information, which does not help diagnose BQL contention.
Fix it by tracking the callers explicitly.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I first implemented this by deleting all entries in the global
hash table. But doing that safely slows down profiling, since
we'd need to introduce rcu_read_lock/unlock in the fast path.
What's implemented here avoids messing with the thread-local
data in the global hash table. It achieves this by taking a snapshot
of the current state, so that subsequent reports present the delta
wrt to the snapshot.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The goal of this module is to profile synchronization primitives (i.e.
mutexes, recursive mutexes and condition variables) so that scalability
issues can be quickly diagnosed.
Sync primitives are profiled by QSP based on the vaddr of the object accessed
as well as the call site (file:line_nr). That means the same object called
from two different call sites will be tracked in separate entries, which
might be reported together or separately (see subsequent commit on
call site coalescing).
Some perf numbers:
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Command: taskset -c 0 tests/atomic_add-bench -d 5 -m
- Before: 54.80 Mops/s
- After: 54.75 Mops/s
That is, a negligible slowdown due to the now indirect call to
qemu_mutex_lock. Note that using a branch instead of an indirect
call introduces a more severe slowdown (53.65 Mops/s, i.e. 2% slowdown).
Enabling the profiler (with -p, added in this series) is more interesting:
- No profiling: 54.75 Mops/s
- W/ profiling: 12.53 Mops/s
That is, a 4.36X slowdown.
We can break down this slowdown by removing the get_clock calls or
the entry lookup:
- No profiling: 54.75 Mops/s
- W/o get_clock: 25.37 Mops/s
- W/o entry lookup: 19.30 Mops/s
- W/ profiling: 12.53 Mops/s
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The counter is for qemu_lockcnt_inc/dec sections (read side),
qemu_lockcnt_lock/unlock is for the write side.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20180803063917.30292-1-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
An aio_notify() pairs with an aio_notify_accept(). The former should
happen in the main thread or a vCPU thread, and the latter should be
done in the IOThread.
There is one rare case that the main thread or vCPU thread may "steal"
the aio_notify() event just raised by itself, in bdrv_set_aio_context()
[1]. The sequence is like this:
main thread IO Thread
===============================================================
bdrv_drained_begin()
aio_disable_external(ctx)
aio_poll(ctx, true)
ctx->notify_me += 2
...
bdrv_drained_end()
...
aio_notify()
...
bdrv_set_aio_context()
aio_poll(ctx, false)
[1] aio_notify_accept(ctx)
ppoll() /* Hang! */
[1] is problematic. It will clear the ctx->notifier event so that
the blocked ppoll() will not return.
(For the curious, this bug was noticed when booting a number of VMs
simultaneously in RHV. One or two of the VMs will hit this race
condition, making the VIRTIO device unresponsive to I/O commands. When
it hangs, Seabios is busy waiting for a read request to complete (read
MBR), right after initializing the virtio-blk-pci device, using 100%
guest CPU. See also https://bugzilla.redhat.com/show_bug.cgi?id=1562750
for the original bug analysis.)
aio_notify() only injects an event when ctx->notify_me is set,
correspondingly aio_notify_accept() is only useful when ctx->notify_me
_was_ set. Move the call to it into the "blocking" branch. This will
effectively skip [1] and fix the hang.
Furthermore, blocking aio_poll is only allowed on home thread
(in_aio_context_home_thread), because otherwise two blocking
aio_poll()'s can steal each other's ctx->notifier event and cause
hanging just like described above.
Cc: qemu-stable@nongnu.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20180809132259.18402-3-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
The same logic exists in fd polling. This change is especially important
to avoid busy loop once we limit aio_notify_accept() to blocking
aio_poll().
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20180809132259.18402-2-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Ciro Santilli reported that commit a5ed352596
breaks the execution replay. It happens due to the probing the clock
for the new instances of iothread.
However, this probing was made in replay mode for the timer lists that
are empty.
This patch removes clock probing in replay mode.
It is an artifact of the old version with another thread model.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180725121526.12867.17866.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No callers of get_opt_value() pass in a NULL for the "value" parameter,
so the check is redundant.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180514171913.17664-4-berrange@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The logic for parsing the multiboot initrd modules was messed up in
commit 950c4e6c94
Author: Daniel P. Berrangé <berrange@redhat.com>
Date: Mon Apr 16 12:17:43 2018 +0100
opts: don't silently truncate long option values
Causing the length to be undercounter, and the number of modules over
counted. It also passes NULL to get_opt_value() which was not robust
at accepting a NULL value.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180514171913.17664-2-berrange@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
soft-freeze, but I'd like these preparatory patches to be merged anyway.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEtIKLr5QxQM7yo0kQcdTV5YIvc9YFAls2DEgACgkQcdTV5YIv
c9ZShw/6AuDMeWpzFHAdf4lBuUdDFyAC7QFln8xdb+MDus5whAvtt7vDeY0OKbgo
w5VDTkstO7h4jQJDHkjzHK91ZdUbgu0Tj7C09x4oQpUueNWsTZGcPBvNGadjjOBt
70LdwyV2nSER3+QNjTNznrh0faxay4xuSTIY/mW6iudeWGobwXmseEeOE8gGM+w0
s1GwxMVKIfllKUmW2vx0mGfn02pKtTnan+Si+sp/AnY9xSquFfHWpZhXZlkZrfYd
mgtJOTY9IpSekr9jBBKgUlZ/QVYiliDzuh3ePDYKtsuHZZ7z2ype3DkXqYOnblOs
C+2gWUE/TC5BStjRX3RmPv21dpfkEdlxOZpgXbpP1VgKqbtnbnvcgTL89IPv9afl
Aj+q5uYR494kOL5rSDynVRdWhnUmMnkqHCZpKG+IRMHv6GlrXpxOQWenwCS/vYWK
swKqRwGj0CFugdt7qVZ+4XjXbbWEI21dHHG7nAXinfakKVOfJYIeGIQC7WfpIrxy
ApV0mHSceK0AMBJvlf1Zf0Qm0lJ7Ay7MRT/5XWDFV9Bogf+wxtGvf9Ukc2qQhwd8
mR9iN7rlWz3VSu5vS3bEdsiBXKibxIRfv7HhF5fa+mwkZA9gMbj33vVds1zA4ta1
Qw4doRq4xWui3uNO9jvtcXtW5Bq7N4p6wVFK76dLVHk5axLCIec=
=vVJr
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
The Darwin host support still needs some more work. It won't make it for
soft-freeze, but I'd like these preparatory patches to be merged anyway.
# gpg: Signature made Fri 29 Jun 2018 11:39:04 BST
# gpg: using RSA key 71D4D5E5822F73D6
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Gregory Kurz <gregory.kurz@free.fr>"
# gpg: aka "[jpeg image of size 3330]"
# Primary key fingerprint: B482 8BAF 9431 40CE F2A3 4910 71D4 D5E5 822F 73D6
* remotes/gkurz/tags/for-upstream:
9p: darwin: Explicitly cast comparisons of mode_t with -1
cutils: Provide strchrnul
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This updates the minimum required glib version to 2.40
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJbNhcKAAoJEL6G67QVEE/f1gMP+wV4J+V9WJOxcXbNjP9bb1K4
gsvZrv2pKCVbcJW2o3Mc3POXExX1E2GsrAl639zpDHbNVlVFvQC51GaRO1Fc6lwh
7NRFHen0Ee6I7TvWlVXMy1YPXPZ/8EJ/1KuZerYUUyKO/R8ojEt/TbELwfl3P1LF
VZGWgs+GpwUYwiG4ZmYPQ0VsT/munZPlaK1mRQSDbGqX0KG1Vy8Q+mgbGAm+S6xh
39HtM7ecdfXbVEAnoMsp9+kRi3zXpD5zSAyVBZN2RDktMt/EHxF0pPJCqX7oq0Rn
DehXDkzaqF2ghzrSAIT3rbKBhzYIY/ny/vxnyQZ/Px0GrBdKwuUJ+pYcF2akS2hV
s98VWRw/tqt8wQKYLF/wmeL7eE/tKAHSUqFc3Ta0Rvgfq0z9Mp3b73vuswBpfjWL
+GiASRvg96n/1yqmywtCd7KtF5riOo1iyvfMFCdQ+nBHQok/6K/oWfZPyoZsCAIa
2GJBfmHzkkc98QeNO0Dp/+ckSyKIBuyTV1bDyq/8Yz7IwEd1PfRWooEgVFSHO7MU
6ddCECvKVvP0S03r5dlw4Imio39gPpeZBMmSE4ZOh/hRa89T4UqIUvSLwzat7OhA
DH1NhJsj7G1jaIxENfzBhpggg++rHCVcfc2cv28Tl7i7twyRsb7CveYA6TwD+UIg
7JT+KswUo7W9MrkKBHC+
=8tKK
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/berrange/tags/min-glib-pull-request' into staging
glib: update the min required version
This updates the minimum required glib version to 2.40
# gpg: Signature made Fri 29 Jun 2018 12:24:58 BST
# gpg: using RSA key BE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg: aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* remotes/berrange/tags/min-glib-pull-request:
glib: enforce the minimum required version and warn about old APIs
glib: bump min required glib library version to 2.40
util: remove redundant include of glib.h and add osdep.h
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Per supported platforms doc[1], the various min glib on relevant distros is:
RHEL-7: 2.50.3
Debian (Stretch): 2.50.3
Debian (Jessie): 2.42.1
OpenBSD (Ports): 2.54.3
FreeBSD (Ports): 2.50.3
OpenSUSE Leap 15: 2.54.3
SLE12-SP2: 2.48.2
Ubuntu (Xenial): 2.48.0
macOS (Homebrew): 2.56.0
This suggests that a minimum glib of 2.42 is a reasonable target.
The GLibC compile farm, however, uses Ubuntu 14.04 (Trusty) which only
has glib 2.40.0, and this is needed for testing during merge. Thus an
exception is made to the documented platform support policy to allow for
all three current LTS releases to be supported.
Docker jobs that not longer satisfy this new min version are removed.
[1] https://qemu.weilnetz.de/doc/qemu-doc.html#Supported-build-platforms
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Code must only ever include glib.h indirectly via the glib-compat.h
header file, because we will need some macros set before glib.h is
pulled in. Adding extra includes of glib.h will (soon) cause compile
failures such as:
In file included from /home/berrange/src/virt/qemu/include/qemu/osdep.h:107,
from /home/berrange/src/virt/qemu/include/qemu/iova-tree.h:26,
from util/iova-tree.c:13:
/home/berrange/src/virt/qemu/include/glib-compat.h:22: error: "GLIB_VERSION_MIN_REQUIRED" redefined [-Werror]
#define GLIB_VERSION_MIN_REQUIRED GLIB_VERSION_2_40
In file included from /usr/include/glib-2.0/glib/gtypes.h:34,
from /usr/include/glib-2.0/glib/galloca.h:32,
from /usr/include/glib-2.0/glib.h:30,
from util/iova-tree.c:12:
/usr/include/glib-2.0/glib/gversionmacros.h:237: note: this is the location of the previous definition
# define GLIB_VERSION_MIN_REQUIRED (GLIB_VERSION_CUR_STABLE)
Furthermore, the osdep.h include should always be done directly from the
.c file rather than indirectly via any .h file.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
strchrnul is a GNU extension and thus unavailable on a number of targets.
In the review for a commit removing strchrnul from 9p, I was asked to
create a qemu_strchrnul helper to factor out this functionality.
Do so, and use it in a number of other places in the code base that inlined
the replacement pattern in a place where strchrnul could be used.
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
We have had some tracing tools for mutex but it's not easy to use them
for e.g. dead locks. Let's provide "--enable-debug-mutex" parameter
when configure to allow QemuMutex to store the last owner that took
specific lock. It will be easy to use this tool to debug deadlocks
since we can directly know who took the lock then as long as we can have
a debugger attached to the process.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180425025459.5258-4-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce some hooks for the shared part of qemu thread between POSIX
and Windows implementations. Note that in qemu_mutex_unlock_impl() we
moved the call before unlock operation which should make more sense.
And we don't need qemu_mutex_post_unlock() hook.
Put all these shared hooks into the header files. It should be internal
to qemu-thread but not for qemu-thread users, hence put into util/
directory.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180425025459.5258-3-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
laio_init() can fail for a couple of reasons, which will lead to a NULL
pointer dereference in laio_attach_aio_context().
To solve this, add a aio_setup_linux_aio() function which is called
early in raw_open_common. If this fails, propagate the error up. The
signature of aio_get_linux_aio() was not modified, because it seems
preferable to return the actual errno from the possible failing
initialization calls.
Additionally, when the AioContext changes, we need to associate a
LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context
callback and call the new aio_setup_linux_aio(), which will allocate a
new AioContext if needed, and return errors on failures. If it fails for
any reason, fallback to threaded AIO with an error message, as the
device is already in-use by the guest.
Add an assert that aio_get_linux_aio() cannot return NULL.
Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com>
Message-id: 20180622193700.6523-1-naravamudan@digitalocean.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This new parameter allows the caller to just query the next dirty
position without moving the iterator.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180613181823.13618-8-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Introduce a new global big lock for mon_fdsets. Take it where needed.
The monitor_fdset_get_fd() handling is a bit tricky: now we need to call
qemu_mutex_unlock() which might pollute errno, so we need to make sure
the correct errno be passed up to the callers. To make things simpler,
we let monitor_fdset_get_fd() return the -errno directly when error
happens, then in qemu_open() we move it back into errno.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180608035511.7439-8-peterx@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The meaning of "existing" is now changed to "matches in hash and
ht->cmp result". This is saner than just checking the pointer value.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
qht_lookup now uses the default cmp function. qht_lookup_custom is defined
to retain the old behaviour, that is a cmp function is explicitly provided.
qht_insert will gain use of the default cmp in the next patch.
Note that we move qht_lookup_custom's @func to be the last argument,
which makes the new qht_lookup as simple as possible.
Instead of this (i.e. keeping @func 2nd):
0000000000010750 <qht_lookup>:
10750: 89 d1 mov %edx,%ecx
10752: 48 89 f2 mov %rsi,%rdx
10755: 48 8b 77 08 mov 0x8(%rdi),%rsi
10759: e9 22 ff ff ff jmpq 10680 <qht_lookup_custom>
1075e: 66 90 xchg %ax,%ax
We get:
0000000000010740 <qht_lookup>:
10740: 48 8b 4f 08 mov 0x8(%rdi),%rcx
10744: e9 37 ff ff ff jmpq 10680 <qht_lookup_custom>
10749: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There are numerous QDict functions that have been introduced for and are
used only by the block layer. Move their declarations into an own
header file to reflect that.
While qdict_extract_subqdict() is in fact used outside of the block
layer (in util/qemu-config.c), it is still a function related very
closely to how the block layer works with nested QDicts, namely by
sometimes flattening them. Therefore, its declaration is put into this
header as well and util/qemu-config.c includes it with a comment stating
exactly which function it needs.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20180509165530.29561-7-mreitz@redhat.com>
[Copyright note tweaked, superfluous includes dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We banned use of certain g_assert_FOO() functions outside tests, and
made checkpatch.pl flag them (commit 6e9389563e). We neglected to
purge existing uses. Do that now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180608170231.27912-1-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: John Snow <jsnow@redhat.com>
It really is up to the caller to decide what this list of options means.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20180509210023.20283-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
* Copy offloading for qemu-img convert (iSCSI, raw, and qcow2)
If the underlying storage supports copy offloading, qemu-img convert will
use it instead of performing reads and writes. This avoids data transfers
and thus frees up storage bandwidth for other purposes. SCSI EXTENDED COPY
and Linux copy_file_range(2) are used to implement this optimization.
* Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJbFSBoAAoJEJykq7OBq3PISpEIAIcMao4/rzinAWXzS+ncK9LO
6FtRVpgutpHaWX2ayySaz5n2CdR3cNMrpCI7sjY2Kw0lrdkqxPgl5n0SWD+VCl4W
7+JLz/uF0iUV8X+99e7WGAjZbm9LSlxgn5AQKfrrwyPf0ZfzoYQ5nBMcQ6xjEeQP
48j2WqJqN9/u8RBD07o11yn0+CE5g56/f12xVjR5ASVodzsAmcZ2OQRMQbM01isU
1mBekJQkDxJkt5l13Rql8+t+vWz8/9BEW2c/eIDKvoayMqYJpdfKv4DqLloIuHnc
3RkquA0zUuKtl7xEnEkH/We7fi4QPGW/vyBN7ychS/zKzZFQrXmwqrAuFSw3dKU=
=vZp+
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Pull request
* Copy offloading for qemu-img convert (iSCSI, raw, and qcow2)
If the underlying storage supports copy offloading, qemu-img convert will
use it instead of performing reads and writes. This avoids data transfers
and thus frees up storage bandwidth for other purposes. SCSI EXTENDED COPY
and Linux copy_file_range(2) are used to implement this optimization.
* Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning
# gpg: Signature made Mon 04 Jun 2018 12:20:08 BST
# gpg: using RSA key 9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request:
main-loop: drop spin_counter
qemu-img: Convert with copy offloading
block-backend: Add blk_co_copy_range
iscsi: Implement copy offloading
iscsi: Create and use iscsi_co_wait_for_task
iscsi: Query and save device designator when opening
file-posix: Implement bdrv_co_copy_range
qcow2: Implement copy offloading
raw: Implement copy offloading
raw: Check byte range uniformly
block: Introduce API for copy offloading
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit d759c951f3 ("replay: push
replay_mutex_lock up the call tree") removed the !timeout lock
optimization in the main loop.
The idea of the optimization was to avoid ping-pongs between threads by
keeping the Big QEMU Lock held across non-blocking (!timeout) main loop
iterations.
A warning is printed when the main loop spins without releasing BQL for
long periods of time. These warnings were supposed to aid debugging but
in practice they just alarm users. They are considered noise because
the cause of spinning is not shown and is hard to find.
Now that the lock optimization has been removed, there is no danger of
hogging the BQL. Drop the spin counter and the infamous warning.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Coverity complains about qemu_memfd_create() (CID 1385858) because
we calculate a bit position htsize which could be up to 63, but
then use it in "1 << htsize" which is a 32-bit integer calculation
and could push the 1 off the top of the value.
Silence the complaint bu using "1ULL"; this isn't a bug in
practice since a hugetlbsize of 4GB is not very plausible.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180515172729.24564-1-peter.maydell@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a simplest iova tree implementation based on GTree.
CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently the minimal supported version of glib is 2.22.
Since testing is done with a glib that claims to be 2.22, but in fact
has APIs from newer version of glib, this bug was not caught during
submit of the patch referenced below.
Replace g_realloc_n, which is available only since 2.24, with g_renew.
Fixes commit 418026ca43 ("util: Introduce vfio helpers")
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
CC: qemu-stable@nongnu.org
When we call addIOThread, the epollfd created in aio_context_setup,
but not close it in the process of delIOThread, so the epollfd will leak.
Reorder the code in aio_epoll_disable and reuse it.
Signed-off-by: Jie Wang <wangjie88@huawei.com>
Message-Id: <1526517763-11108-1-git-send-email-wangjie88@huawei.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
[Mention change to aio_epoll_disable in commit message. - Fam]
Signed-off-by: Fam Zheng <famz@redhat.com>
Usually the logging of the CPU state produced by -d cpu is sufficient
to diagnose problems, but sometimes you want to see the state of
the floating point registers as well. We don't want to enable that
by default as it adds a lot of extra data to the log; instead,
allow it to be optionally enabled via -d fpu.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180510130024.31678-1-peter.maydell@linaro.org
We will conditionally have a wrapper layer depending on whether the host
has the PTHREAD_SETNAME capability. It complicates stuff. Let's keep
the wrapper there; we opt out the pthread_setname_np() call only.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180412053444.17801-1-peterx@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that we can safely call QOBJECT() on QObject * as well as its
subtypes, we can have macros qobject_ref() / qobject_unref() that work
everywhere instead of having to use QINCREF() / QDECREF() for QObject
and qobject_incref() / qobject_decref() for its subtypes.
The replacement is mechanical, except I broke a long line, and added a
cast in monitor_qmp_cleanup_req_queue_locked(). Unlike
qobject_decref(), qobject_unref() doesn't accept void *.
Note that the new macros evaluate their argument exactly once, thus no
need to shout them.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180419150145.24795-4-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Rebased, semantic conflict resolved, commit message improved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
qemu_mempath_getpagesize() gets the effective (host side) page size for
a block of memory backed by an mmap()ed file on the host. It requires
the mem_path parameter to be non-NULL.
This ends up meaning all the callers need a different case for handling
anonymous memory (for memory-backend-ram or default memory with -mem-path
is not specified).
We can make all those callers a little simpler by having
qemu_mempath_getpagesize() accept NULL, and treat that as the anonymous
memory case.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Our rule right now is to use <> for external headers only.
util/sys_membarrier.c violates that. Fix it up.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Message-Id: <20180329151018.15319-1-brogers@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu_aio_coroutine_enter() is (indirectly) called recursively when
processing co_queue_wakeup. This can lead to stack exhaustion.
This patch rewrites co_queue_wakeup in an iterative fashion (instead of
recursive) with bounded memory usage to prevent stack exhaustion.
qemu_co_queue_run_restart() is inlined into qemu_aio_coroutine_enter()
and the qemu_coroutine_enter() call is turned into a loop to avoid
recursion.
There is one change that is worth mentioning: Previously, when
coroutine A queued coroutine B, qemu_co_queue_run_restart() entered
coroutine B from coroutine A. If A was terminating then it would still
stay alive until B yielded. After this patch B is entered by A's parent
so that a A can be deleted immediately if it is terminating.
It is safe to make this change since B could never interact with A if it
was terminating anyway.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20180322152834.12656-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
OOB can enable iothread for parsing even on Windows. We need some tunes
to enable that on Windows otherwise it'll break Windows users. This
patch fixes the breakage on Windows with qemu-system-ppc.exe.
Reported-by: Howard Spoelstra <hsp.cat7@gmail.com>
Tested-by: Howard Spoelstra <hsp.cat7@gmail.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180322085630.23654-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch was generated using the following Coccinelle script:
@@
expression Obj;
@@
(
- qobject_to_qnum(Obj)
+ qobject_to(QNum, Obj)
|
- qobject_to_qstring(Obj)
+ qobject_to(QString, Obj)
|
- qobject_to_qdict(Obj)
+ qobject_to(QDict, Obj)
|
- qobject_to_qlist(Obj)
+ qobject_to(QList, Obj)
|
- qobject_to_qbool(Obj)
+ qobject_to(QBool, Obj)
)
and a bit of manual fix-up for overly long lines and three places in
tests/check-qjson.c that Coccinelle did not find.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20180224154033.29559-4-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: swap order from qobject_to(o, X), rebase to master, also a fix
to latent false-positive compiler complaint about hw/i386/acpi-build.c]
Signed-off-by: Eric Blake <eblake@redhat.com>
* SCSI fix to pass maximum transfer size (Daniel Barboza)
* chardev fixes and improved iothread support (Daniel Berrangé, Peter)
* checkpatch tweak (Eric)
* make help tweak (Marc-André)
* make more PCI NICs available with -net or -nic (myself)
* change default q35 NIC to e1000e (myself)
* SCSI support for NDOB bit (myself)
* membarrier system call support (myself)
* SuperIO refactoring (Philippe)
* miscellaneous cleanups and fixes (Thomas)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJapqaMAAoJEL/70l94x66DQoUH/Rvg+a8giz/SrEA4P8D3Cb2z
4GNbNUUoy4oU0ltD5IAMskMwpOsvl1batE0D+pKIlfO9NV4+Cj2kpgo0p9TxoYqM
VCby3wRtx27zb5nVytC6M++iIKXmeEMqXmFw61I6umddNPSl4IR3hiHEE0DM+7dV
UPIOvJeEiazyQaw3Iw+ZctNn8dDBKc/+6oxP9xRcYTaZ6hB4G9RZkqGNNSLcJkk7
R0UotdjzIZhyWMOkjIwlpTF4sWv8gsYUV4bPYKMYho5B0Obda2dBM3I1kpA8yDa/
xZ5lheOaAVBZvM5aMIcaQPa65MO9hLyXFmhMOgyfpJhLBBz6Qpa4OLLI6DeTN+0=
=UAgA
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Record-replay lockstep execution, log dumper and fixes (Alex, Pavel)
* SCSI fix to pass maximum transfer size (Daniel Barboza)
* chardev fixes and improved iothread support (Daniel Berrangé, Peter)
* checkpatch tweak (Eric)
* make help tweak (Marc-André)
* make more PCI NICs available with -net or -nic (myself)
* change default q35 NIC to e1000e (myself)
* SCSI support for NDOB bit (myself)
* membarrier system call support (myself)
* SuperIO refactoring (Philippe)
* miscellaneous cleanups and fixes (Thomas)
# gpg: Signature made Mon 12 Mar 2018 16:10:52 GMT
# gpg: using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream: (69 commits)
tcg: fix cpu_io_recompile
replay: update documentation
replay: save vmstate of the asynchronous events
replay: don't process async events when warping the clock
scripts/replay-dump.py: replay log dumper
replay: avoid recursive call of checkpoints
replay: check return values of fwrite
replay: push replay_mutex_lock up the call tree
replay: don't destroy mutex at exit
replay: make locking visible outside replay code
replay/replay-internal.c: track holding of replay_lock
replay/replay.c: bump REPLAY_VERSION again
replay: save prior value of the host clock
replay: added replay log format description
replay: fix save/load vm for non-empty queue
replay: fixed replay_enable_events
replay: fix processing async events
cpu-exec: fix exception_index handling
hw/i386/pc: Factor out the superio code
hw/alpha/dp264: Use the TYPE_SMC37C669_SUPERIO
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# default-configs/i386-softmmu.mak
# default-configs/x86_64-softmmu.mak
The SocketAddress 'fd' kind accepts the name of a file descriptor passed
to the monitor with the 'getfd' command. This makes it impossible to use
the 'fd' kind in cases where a monitor is not available. This can apply in
handling command line argv at startup, or simply if internal code wants to
use SocketAddress and pass a numeric FD it has acquired from elsewhere.
Fortunately the 'getfd' command mandated that the FD names must not start
with a leading digit. We can thus safely extend semantics of the
SocketAddress 'fd' kind, to allow a purely numeric name to reference an
file descriptor that QEMU already has open. There will be restrictions on
when each kind can be used.
In codepaths where we are handling a monitor command (ie cur_mon != NULL),
we will only support use of named file descriptors as before. Use of FD
numbers is still not permitted for monitor commands.
In codepaths where we are not handling a monitor command (ie cur_mon ==
NULL), we will not support named file descriptors. Instead we can reference
FD numers explicitly. This allows the app spawning QEMU to intentionally
"leak" a pre-opened socket to QEMU and reference that in a SocketAddress
definition, or for code inside QEMU to pass pre-opened FDs around.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The SocketAddress struct has an "fd" type, which references the name of a
file descriptor passed over the monitor using the "getfd" command. We
currently blindly assume the FD is a socket, which can lead to hard to
diagnose errors later. This adds an explicit check that the FD is actually
a socket to improve the error diagnosis.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The fd_is_socket() helper method is useful in a few places, so put it in
the common sockets code. Make the code more compact while moving it.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are qemu_strtoNN functions for various sized integers. This adds two
more for plain int & unsigned int types, with suitable range checking.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch adds saving/restoring of the host clock field 'last'.
It is used in host clock calculation and therefore clock may
become incorrect when using restored vmstate.
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180227095226.1060.50975.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Actually enable the global memory barriers if supported by the OS.
Because only recent versions of Linux include the support, they
are disabled by default. Note that it also has to be disabled
for QEMU to run under Wine.
Before this patch, rcutorture reports 85 ns/read for my machine,
after the patch it reports 12.5 ns/read. On the other hand updates
go from 50 *micro*seconds to 20 *milli*seconds.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This new header file provides heavy-weight "global" memory barriers that
enforce memory ordering on each running thread belonging to the current
process. For now, use a dummy implementation that issues memory barriers
on both sides (matching what QEMU has been doing so far).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Prepare for introducing smp_mb_placeholder() and smp_mb_global().
The new smp_mb() in synchronize_rcu() is not strictly necessary, since
the first atomic_mb_set for rcu_gp_ctr provides the required ordering.
However, synchronize_rcu is not performance critical, and it *will* be
necessary to introduce a smp_mb_global before calling wait_for_readers().
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nested BDRV_POLL_WHILE() calls can occur. Currently
assert(!wait_->wakeup) fails in AIO_WAIT_WHILE() when this happens.
This patch converts the bool wait_->need_kick flag to an unsigned
wait_->num_waiters counter.
Nesting works correctly because outer AIO_WAIT_WHILE() callers evaluate
the condition again after the inner caller completes (invoking the inner
caller counts as aio_poll() progress).
Reported-by: "fuweiwei (C)" <fuweiwei2@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180307124619.6218-1-stefanha@redhat.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaoqOgAAoJEH8JsnLIjy/W2MMP/1gj7CJgtSG9wIzyBHjSWQMy
ofXEgRJO9t/smfUMlH2NdrW8P2LvYcmqOsEBkLJzCtl48fPexwtI/cunzVjutXcf
VlpqKz/8uN4C9D6m8FN/5kKf65l+tnVqnCoJgwafY5uT7jmoC8LF1xO2jo8a+lJd
0Dv6RxJUQq/tDR6OvO6aW4EzbOUcD4wkLvi/uz8+ZjV1BLSLlpdudejr6W9TnJY/
EGFedbxqjPV7fIvMbodbFp0Ie8Aw0WEL8ttERboeR4jbA/o+PZVGpPtHsr/4V6QO
Pgh6vH2rGavxFzwuCWEGhlLKGx66CGqqdTknm6lNJchepCvcfoYxjOPZv9FCaMUs
enC/x43xSkCmkwBwKKxpXqu1vS5nGdMebAwRjstSIplypjv2YOwS1AiU5snaDwuk
t9Gjkw0Wka5nySuYi43H2RPXmlWbh4T8DfQ6pOyJGvXGjm8t+f5BTaMtSWn6Iq2W
F6r1UezQJBDnUbpFgsRg4AP+htPGDHgsOg7KzCCd/lBHwbjX7dkQlAYbBZZ2OBF+
wQN5olDR6jsKIy2IlARNgNweZHW5UQa1cc+7HlVNNE5tqtkjo7aWPk/LhEzBCIHg
sWG3VH2y3lQlaMzYh1v+jnGrFoq1ZJU4sbjaxvQX8czjmaQvPtbzKuZAovQ4pGwa
g0SrWP6p9yLo0LXLuXBP
=WDF4
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Fri 09 Mar 2018 15:09:20 GMT
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (56 commits)
qemu-iotests: fix 203 migration completion race
iotests: Tweak 030 in order to trigger a race condition with parallel jobs
iotests: Skip test for ENOMEM error
iotests: Mark all tests executable
iotests: Test creating overlay when guest running
qemu-iotests: Test ssh image creation over QMP
qemu-iotests: Test qcow2 over file image creation with QMP
block: Fail bdrv_truncate() with negative size
file-posix: Fix no-op bdrv_truncate() with falloc preallocation
ssh: Support .bdrv_co_create
ssh: Pass BlockdevOptionsSsh to connect_to_ssh()
ssh: QAPIfy host-key-check option
ssh: Use QAPI BlockdevOptionsSsh object
sheepdog: Support .bdrv_co_create
sheepdog: QAPIfy "redundancy" create option
nfs: Support .bdrv_co_create
nfs: Use QAPI options in nfs_client_open()
rbd: Use qemu_rbd_connect() in qemu_rbd_do_create()
rbd: Assign s->snap/image_name in qemu_rbd_open()
rbd: Support .bdrv_co_create
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch disables the pragma diagnostic -Wunused-but-set-variable for
clang in util/coroutine-ucontext.c.
This in turn allows us to remove it from the configure check, so the
CONFIG_PRAGMA_DIAGNOSTIC_AVAILABLE will succeed for clang.
With that in place clang builds (linux) will use -Werror by default,
which breaks the build due to warning about unaligned struct members.
Just turning off this warning isn't a good idea as it indicates
portability problems. So make it a warning again, using
-Wno-error=address-of-packed-member. That way it doesn't break the
build but still shows up in the logs.
Now clang builds qemu without errors. Well, almost. There are some
left in the rdma code. Leaving that to the rdma people. All others can
use --disable-rdma to workarounds this.
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20180309135945.20436-1-kraxel@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This allows, given a QemuOpts for a QemuOptsList that was merged from
multiple QemuOptsList, to only consider those options that exist in one
specific list. Block drivers need this to separate format-layer create
options from protocol-level options.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Sometimes it's necessary for the main loop thread to run a BH in an
IOThread and wait for its completion. This primitive is useful during
startup/shutdown to synchronize and avoid race conditions.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20180307144205.20619-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJanYJPAAoJEH8JsnLIjy/WxjUQAJA+DTOmGXvaNpMs65BrU79K
/r/iGVrzHv/RMLmrWMnqj96W9SnpMuiAP9hVLNsekqClY9q4ME4DpGcXhWfhSvF5
FC51ehvFJdfo8cPorsevcqNj60iWebjcx3lFfUq2606UOyYih3oijYxr6gSwWbRc
GAgdGMqsvGYpzgqAQVEWHUhaX0La49/OzY42aR+E+LCBNfTYvlydvyoc+tUTdIpW
1eM/ASGndGsN0Cf2vxlbKgJ0/P6v+cRZuuIDhKZqre+YG+yM+pq7yZb+o7nf/P36
TPR93BsT7FSVAizRK7VFRuPIynHpiaxYygrJERCXF0sxsV4OlKjpmt/uUPamWFh+
46Jx2NK1AuAx87BdErgmA119ObO3oAPxK0+2p981obb6SphTbbPxDj6SOlYCt4mJ
mhff4JtIiwCmDSckAwd2mkBI1Tvl9qqcELrpyd2t2eU4ec2vf7fPd85EsK/Mq6Kr
dbfqFvjNaaMxChoqFgkHAveYJ7zYqRFI2IY5o9c1QyZehCGPWjScxHXZZYdpDl59
YF9DkYQDOyvEX2jmMECaO1r/0nnO+BqQHu5ItJuTte9rjP9Q0do3iBISiIefewtf
yji6/QNn2hFrnr1HPAwLFFC3kPgc8Mq8mIUb53j8vG/01KhVRCcnJm2K6D4IUwLZ
S6ZnQJB97eE4y7YR5dNt
=2axz
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Mon 05 Mar 2018 17:45:51 GMT
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (38 commits)
block: Fix NULL dereference on empty drive error
qcow2: Replace align_offset() with ROUND_UP()
block/ssh: Add basic .bdrv_truncate()
block/ssh: Make ssh_grow_file() blocking
block/ssh: Pull ssh_grow_file() from ssh_create()
qemu-img: Make resize error message more general
qcow2: make qcow2_co_create2() a coroutine_fn
block: rename .bdrv_create() to .bdrv_co_create_opts()
Revert "IDE: Do not flush empty CDROM drives"
block: test blk_aio_flush() with blk->root == NULL
block: add BlockBackend->in_flight counter
block: extract AIO_WAIT_WHILE() from BlockDriverState
aio: rename aio_context_in_iothread() to in_aio_context_home_thread()
docs: document how to use the l2-cache-entry-size parameter
specs/qcow2: Fix documentation of the compressed cluster descriptor
iotest 033: add misaligned write-zeroes test via truncate
block: fix write with zero flag set and iovector provided
block: Drop unused .bdrv_co_get_block_status()
vvfat: Switch to .bdrv_co_block_status()
vpc: Switch to .bdrv_co_block_status()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# include/block/block.h
Mostly patches that are only indirectly related to the block layer, but I've
reviewed them and there is no maintainer.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJanRBmAAoJEJykq7OBq3PIzQUIAI7q2BFcNEKjNYQusYDA6gJv
7uYiEx1NkcXUu0DVWlCSENGfvQRbYa46oti3MIgu/imMaYwtahefvqA4R2JEy/GS
xRx+LnzVeQ2d3CYLHd1D3ce2BWAIa9bzRDd25Ux2HfDC46a7v5yDPdeRxQF9xgPQ
CwtPKtr5X/xZXUJR3PoC6QqrZUthN5DXOk4cCLJoks6iLpQlNO3jnoq+RXMl+8II
yj0L2sdvi2ebLsuwndXblEg0qR27X94nz0L1j0GDbHir6I13A6Y66a9qv8mTXz/L
poRWaNTHza6KrapseZ85ed6I/IfNODKR2PCfL/CClsHhi5HWRK54nKS3xaWiBUk=
=8zc8
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Pull request
Mostly patches that are only indirectly related to the block layer, but I've
reviewed them and there is no maintainer.
# gpg: Signature made Mon 05 Mar 2018 09:39:50 GMT
# gpg: using RSA key 9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request:
README: Document 'git-publish' workflow
Add a git-publish configuration file
tests/libqos: Check for valid dev pointer when looking for PCI devices
util/uri.c: wrap single statement blocks with braces {}
util/uri.c: remove brackets that wrap `return` statement's content.
util/uri.c: Coding style check, Only whitespace involved
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For this patch, using curly braces to wrap `if` `while` `else` statements,
which only hold single statement. For example:
'''
if (cond)
statement;
'''
to
'''
if (cond) {
statement;
}
'''
And using tricks that compare the disassemblies before and after
code changes, to make sure code logic isn't changed:
'''
git checkout master
make util/uri.o
strip util/uri.o
objdump -Drx util/uri.o > /tmp/uri-master.txt
git checkout cleanupbranch
make util/uri.o
strip util/uri.o
objdump -Drx util/uri.o > /tmp/uri-cleanup.txt
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
only remove brackets that wrap `return` statements' content.
use `perl -pi -e "s/return \((.*?)\);/return \1;/g" util/uri.c`
to remove pattern like this: "return (1);"
Signed-off-by: Su Hang <suhang16@mails.ucas.ac.cn>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 1519533358-13759-3-git-send-email-suhang16@mails.ucas.ac.cn
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Using `clang-format -i util/uri.c` first, then change back few code
manually, to make sure only whitespace involved.
Signed-off-by: Su Hang <suhang16@mails.ucas.ac.cn>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 1519533358-13759-2-git-send-email-suhang16@mails.ucas.ac.cn
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The previous commit improved compile time by including less of the
generated QAPI headers. This is impossible for stuff defined directly
in qapi-schema.json, because that ends up in headers that that pull in
everything.
Move everything but include directives from qapi-schema.json to new
sub-module qapi/misc.json, then include just the "misc" shard where
possible.
It's possible everywhere, except:
* monitor.c needs qmp-command.h to get qmp_init_marshal()
* monitor.c, ui/vnc.c and the generated qapi-event-FOO.c need
qapi-event.h to get enum QAPIEvent
Perhaps we'll get rid of those some other day.
Adding a type to qapi/migration.json now recompiles some 120 instead
of 2300 out of 5100 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-25-armbru@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.
The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h. Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.
To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects. The next commit will
improve it further.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
BlockDriverState has the BDRV_POLL_WHILE() macro to wait on event loop
activity while a condition evaluates to true. This is used to implement
synchronous operations where it acts as a condvar between the IOThread
running the operation and the main loop waiting for the operation. It
can also be called from the thread that owns the AioContext and in that
case it's just a nested event loop.
BlockBackend needs this behavior but doesn't always have a
BlockDriverState it can use. This patch extracts BDRV_POLL_WHILE() into
the AioWait abstraction, which can be used with AioContext and isn't
tied to BlockDriverState anymore.
This feature could be built directly into AioContext but then all users
would kick the event loop even if they signal different conditions.
Imagine an AioContext with many BlockDriverStates, each time a request
completes any waiter would wake up and re-check their condition. It's
nicer to keep a separate AioWait object for each condition instead.
Please see "block/aio-wait.h" for details on the API.
The name AIO_WAIT_WHILE() avoids the confusion between AIO_POLL_WHILE()
and AioContext polling.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
1) string not null terminated in sysfs_find_group_file
2) NULL pointer dereference and dead local variable in nvme_init.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20180213015240.9352-1-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Currently only file backed memory backend can
be created with a "share" flag in order to allow
sharing guest RAM with other processes in the host.
Add the "share" flag also to RAM Memory Backend
in order to allow remapping parts of the guest RAM
to different host virtual addresses. This is needed
by the RDMA devices in order to remap non-contiguous
QEMU virtual addresses to a contiguous virtual address range.
Moved the "share" flag to the Host Memory base class,
modified phys_mem_alloc to include the new parameter
and a new interface memory_region_init_ram_shared_nomigrate.
There are no functional changes if the new flag is not used.
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Applied using the Coccinelle semantic patch scripts/coccinelle/use_osdep.cocci
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Check for the presence of posix_memalign() in the configure script,
not using "defined(_POSIX_C_SOURCE) && !defined(__sun__)". This
lets qemu use posix_memalign() on NetBSD versions that have it,
instead of falling back to valloc() which is wasteful when the
required alignment is smaller than a page.
Signed-off-by: Andreas Gustafsson <gson@gson.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
This cleanup makes the number of objects depending on qapi/qmp/qdict.h
drop from 4550 (out of 4743) to 368 in my "build everything" tree.
For qapi/qmp/qobject.h, the number drops from 4552 to 390.
While there, separate #include from file comment with a blank line.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-13-armbru@redhat.com>
This cleanup makes the number of objects depending on qapi/qmp/qlist.h
drop from 4551 (out of 4743) to 16 in my "build everything" tree.
While there, separate #include from file comment with a blank line.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-12-armbru@redhat.com>
The macro expansions of qdict_put_TYPE() and qlist_append_TYPE() need
qbool.h, qnull.h, qnum.h and qstring.h to compile. We include qnull.h
and qnum.h in the headers, but not qbool.h and qstring.h. Works,
because we include those wherever the macros get used.
Open-coding these helpers is of dubious value. Turn them into
functions and drop the includes from the headers.
This cleanup makes the number of objects depending on qapi/qmp/qnum.h
from 4551 (out of 4743) to 46 in my "build everything" tree. For
qapi/qmp/qnull.h, the number drops from 4552 to 21.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-10-armbru@redhat.com>
qapi/qmp/types.h is a convenience header to include a number of
qapi/qmp/ headers. Since we rarely need all of the headers
qapi/qmp/types.h includes, we bypass it most of the time. Most of the
places that use it don't need all the headers, either.
Include the necessary headers directly, and drop qapi/qmp/types.h.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-9-armbru@redhat.com>
This cleanup makes the number of objects depending on qapi/error.h
drop from 1910 (out of 4743) to 1612 in my "build everything" tree.
While there, separate #include from file comment with a blank line,
and drop a useless comment on why qemu/osdep.h is included first.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-5-armbru@redhat.com>
[Semantic conflict with commit 34e304e975 resolved, OSX breakage fixed]
This is a library to manage the host vfio interface, which could be used
to implement userspace device driver code in QEMU such as NVMe or net
controllers.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180116060901.17413-3-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
qemu_co_queue_next does not need to release and re-acquire the mutex,
because the queued coroutine does not run immediately. However, this
does not hold for qemu_co_enter_next. Now that qemu_co_queue_wait
can synchronize (via QemuLockable) with code that is not running in
coroutine context, it's important that code using qemu_co_enter_next
can easily use a standardized locking idiom.
First of all, qemu_co_enter_next must use aio_co_wake to restart the
coroutine. Second, the function gains a second argument, a QemuLockable*,
and the comments of qemu_co_queue_next and qemu_co_queue_restart_all
are adjusted to clarify the difference.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180203153935.8056-5-pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
There are cases in which a queued coroutine must be restarted from
non-coroutine context (with qemu_co_enter_next). In this cases,
qemu_co_enter_next also needs to be thread-safe, but it cannot use
a CoMutex and so cannot qemu_co_queue_wait. Use QemuLockable so
that the CoQueue can interchangeably use CoMutex or QemuMutex.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180203153935.8056-4-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
The .count of HBitmap is forgot to set in function
hbitmap_deserialize_finish, let's set it to the right value.
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Weiping Zhang <zhangweiping@didichuxing.com>
Signed-off-by: Liang Li <liliangleo@didichuxing.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180118131308.GA2181@liangdeMacBook-Pro.local
Signed-off-by: John Snow <jsnow@redhat.com>
Linux commit 749df87bd7bee5a79cef073f5d032ddb2b211de8 (v4.14-rc1)
added a new flag MFD_HUGETLB to memfd_create() that specify the file
to be created resides in the hugetlbfs filesystem. This is the
generic hugetlbfs filesystem not associated with any specific mount
point.
hugetlbfs does not support sealing operations in v4.14, therefore
specifying MFD_ALLOW_SEALING with MFD_HUGETLB will result in EINVAL.
However, I added sealing support in "[PATCH v3 0/9] memfd: add sealing
to hugetlb-backed memory" series, queued in -mm tree for v4.16.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180201132757.23063-3-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This will allow callers to silence error report when the call is
allowed to failed.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180201132757.23063-2-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It helps ASAN to detect more leaks on coroutine stacks, and to get rid
of some extra warnings.
Before:
tests/test-coroutine -p
/basic/lifecycle
/basic/lifecycle: ==20781==WARNING: ASan doesn't fully support
makecontext/swapcontext functions and may produce false positives in
some cases!
==20781==WARNING: ASan is ignoring requested __asan_handle_no_return:
stack top: 0x7ffcb184d000; bottom 0x7ff6c4cfd000; size: 0x0005ecb50000
(25446121472)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
OK
After:
tests/test-coroutine -p /basic/lifecycle
/basic/lifecycle: ==21110==WARNING: ASan doesn't fully support
makecontext/swapcontext functions and may produce false positives in
some cases!
OK
A similar work would need to be done for sigaltstack & windows fibers
to have similar coverage. Since ucontext is preferred, I didn't bother
checking the other coroutine implementations for now.
Update travis to fix the build with ASAN annotations.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180116151152.4040-4-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The inet_parse() function looks for 'ipv4' and 'ipv6' flags, but only
treats them as bare bool flags. The normal QemuOpts parsing would allow
on/off values to be set too.
This updates inet_parse() so that its handling of the 'ipv4' and 'ipv6'
flags matches that done by QemuOpts.
This impacts the NBD block driver parsing the legacy filename syntax and
the migration code parsing the socket scheme.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20180125171412.21627-1-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since commit e5dc1a6c6c, QEMU aborts on exit if completion was used
in the monitor:
*** Error in `obj/ppc64-softmmu/qemu-system-ppc64': double free or
corruption (fasttop): 0x00000100331069d0 ***
/home/greg/Work/qemu/qemu-spapr/util/readline.c:514
/home/greg/Work/qemu/qemu-spapr/monitor.c:586
/home/greg/Work/qemu/qemu-spapr/monitor.c:4125
argv=<optimized out>, envp=<optimized out>) at
/home/greg/Work/qemu/qemu-spapr/vl.c:4795
Completion strings are not persistent accross completions (why would
they?). They are allocated under readline_completion(), which already
takes care of freeing them before returning.
Maybe all completion related bits should be moved out of ReadLineState
to a dedicated structure ?
In the meantime, let's drop the offending lines from readline_free()
to fix the crash.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <151627206353.4505.4602428849861610759.stgit@bahia.lan>
Fixes: e5dc1a6c6c
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaZyzMAAoJEH8JsnLIjy/WSOYP/jHffS2fOHtdLQU42G76HN0K
jkhSUE8cuKFzgxuJDJhv10AGsGZvDIaRhuumPIFArQkcEwsFDfd0UqzC5GkCnhfn
6frVPsRSUp9BqXqha1+6vOgVRobdBXPpS25ERfanTsbu3aPEDRnGpxmpMvyyimft
BTUnWNCg2lM6bojXrC6oy7MqUdi9p9PviMcQAfnN07SmGa+s6tS2Jc9znvZwgL06
o+oPukWVTAiub5qcH18BLA3T8xcCXWANdY9pUnNj7mXHoxg3kYzzYBArYDh6Kyju
BkSEML1kNcUACFAZ+LSqQpnoc8/5cP+jY5cOBGtUUgjZSns/xnAZxALltds0I4m3
fqQM68oOTX7squAYAaKYVYMirime6aa2OAn2afxPJildPp8uH4lNust95yiUyyJQ
oqA3zfAnP5FfmTnzjLG7smYlRUlcHp8eMPyOKHxp3BuqTMbWY5KQETyDMk3QVnZr
7fSFIdT4sRTdroKXUKHHu3RLFyCo77EBovxY2oUtt6v43qxQhLx0IFwW6jHrcLK9
ifLOr1CqdgwH/OU7h6rzoLcGLX5/eOTxwcCbU0kP2cx4E60VBXmSaDq9TiwQhbeV
4HteS+EP6R0WpCiAvsFl2aUd6iwDRHeYt0aKpYyUuTVrW2mfmLPaQP+tJLjEoaHF
H5HlbNWy2gFAB2uQtmOd
=gjNZ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Tue 23 Jan 2018 12:38:36 GMT
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (29 commits)
iotests: Disable some tests for compat=0.10
iotests: Split 177 into two parts for compat=0.10
iotests: Make 059 pass on machines with little RAM
iotests: Filter compat-dependent info in 198
iotests: Make 191 work with qcow2 options
iotests: Make 184 image-less
iotests: Make 089 compatible with compat=0.10
iotests: Fix 067 for compat=0.10
iotests: Fix 059's reference output
iotests: Fix 051 for compat=0.10
iotests: Fix 020 for vmdk
iotests: Skip 103 for refcount_bits=1
iotests: Forbid 020 for non-file protocols
iotests: Drop format-specific in _filter_img_info
iotests: Fix _img_info for backslashes
block/vmdk: Add blkdebug events
block/qcow: Add blkdebug events
qcow2: No persistent dirty bitmaps for compat=0.10
block/vmdk: Fix , instead of ; at end of line
qemu-iotests: Fix locking issue in 102
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This reverts commit f87d72f5c5 as that is
part of a patchset reported to break cleanup and migration.
Cc: Gal Hammer <ghammer@redhat.com>
Cc: Sitong Liu <siliu@redhat.com>
Cc: Xiaoling Gao <xiagao@redhat.com>
Suggested-by: Greg Kurz <groug@kaod.org>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reported-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
We could hit lock failure if there is a signal that makes fcntl return
-1 and errno set to EINTR. In this case we should retry.
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a function to only create a memfd, without mmap. The function is
used in the following memory backend.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20171023141815.17709-2-marcandre.lureau@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Adding a cleanup callback function to the EventNotifier struct
which allows users to execute event_notifier_cleanup in a
different context.
Signed-off-by: Gal Hammer <ghammer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Direct leak of 160 byte(s) in 4 object(s) allocated from:
#0 0x55ed7678cda8 in calloc (/home/elmarco/src/qq/build/x86_64-softmmu/qemu-system-x86_64+0x797da8)
#1 0x7f3f5e725f75 in g_malloc0 /home/elmarco/src/gnome/glib/builddir/../glib/gmem.c:124
#2 0x55ed778aa3a7 in query_option_descs /home/elmarco/src/qq/util/qemu-config.c:60:16
#3 0x55ed778aa307 in get_drive_infolist /home/elmarco/src/qq/util/qemu-config.c:140:19
#4 0x55ed778a9f40 in qmp_query_command_line_options /home/elmarco/src/qq/util/qemu-config.c:254:36
#5 0x55ed76d4868c in qmp_marshal_query_command_line_options /home/elmarco/src/qq/build/qmp-marshal.c:3078:14
#6 0x55ed77855dd5 in do_qmp_dispatch /home/elmarco/src/qq/qapi/qmp-dispatch.c:104:5
#7 0x55ed778558cc in qmp_dispatch /home/elmarco/src/qq/qapi/qmp-dispatch.c:131:11
#8 0x55ed768b592f in handle_qmp_command /home/elmarco/src/qq/monitor.c:3840:11
#9 0x55ed7786ccfe in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:105:5
#10 0x55ed778fe37c in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:323:13
#11 0x55ed778fdde6 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:373:15
#12 0x55ed7786cd83 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:124:12
#13 0x55ed768b559e in monitor_qmp_read /home/elmarco/src/qq/monitor.c:3882:5
#14 0x55ed77714f29 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:167:9
#15 0x55ed77714fde in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:179:9
#16 0x55ed7772ffad in tcp_chr_read /home/elmarco/src/qq/chardev/char-socket.c:440:13
#17 0x55ed7777113b in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84:12
#18 0x7f3f5e71d90b in g_main_dispatch /home/elmarco/src/gnome/glib/builddir/../glib/gmain.c:3182
#19 0x7f3f5e71e7ac in g_main_context_dispatch /home/elmarco/src/gnome/glib/builddir/../glib/gmain.c:3847
#20 0x55ed77886ffc in glib_pollfds_poll /home/elmarco/src/qq/util/main-loop.c:214:9
#21 0x55ed778865fd in os_host_main_loop_wait /home/elmarco/src/qq/util/main-loop.c:261:5
#22 0x55ed77886222 in main_loop_wait /home/elmarco/src/qq/util/main-loop.c:515:11
#23 0x55ed76d2a4df in main_loop /home/elmarco/src/qq/vl.c:1995:9
#24 0x55ed76d1cb4a in main /home/elmarco/src/qq/vl.c:4914:5
#25 0x7f3f555f6039 in __libc_start_main (/lib64/libc.so.6+0x21039)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180104160523.22995-14-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes leaks such as:
Direct leak of 2 byte(s) in 1 object(s) allocated from:
#0 0x7eff58beb850 in malloc (/lib64/libasan.so.4+0xde850)
#1 0x7eff57942f0c in g_malloc ../glib/gmem.c:94
#2 0x7eff579431cf in g_malloc_n ../glib/gmem.c:331
#3 0x7eff5795f6eb in g_strdup ../glib/gstrfuncs.c:363
#4 0x55db720f1d46 in readline_hist_add /home/elmarco/src/qq/util/readline.c:258
#5 0x55db720f2d34 in readline_handle_byte /home/elmarco/src/qq/util/readline.c:387
#6 0x55db71539d00 in monitor_read /home/elmarco/src/qq/monitor.c:3896
#7 0x55db71f9be35 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:167
#8 0x55db71f9bed3 in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:179
#9 0x55db71fa013c in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66
#10 0x55db71fe18a8 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84
#11 0x7eff5793a90b in g_main_dispatch ../glib/gmain.c:3182
#12 0x7eff5793b7ac in g_main_context_dispatch ../glib/gmain.c:3847
#13 0x55db720af3bd in glib_pollfds_poll /home/elmarco/src/qq/util/main-loop.c:214
#14 0x55db720af505 in os_host_main_loop_wait /home/elmarco/src/qq/util/main-loop.c:261
#15 0x55db720af6d6 in main_loop_wait /home/elmarco/src/qq/util/main-loop.c:515
#16 0x55db7184e0de in main_loop /home/elmarco/src/qq/vl.c:1995
#17 0x55db7185e956 in main /home/elmarco/src/qq/vl.c:4914
#18 0x7eff4ea17039 in __libc_start_main (/lib64/libc.so.6+0x21039)
(while at it, use g_new0(ReadLineState), it's a bit easier to read)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180104160523.22995-11-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since there are some issues in memory alloc/free machenism
in glibc for little chunk memory, if Qemu frequently
alloc/free little chunk memory, the glibc doesn't alloc
little chunk memory from free list of glibc and still
allocate from OS, which make the heap size bigger and bigger.
This patch introduce malloc_trim(), which will free heap
memory when there is no rcu call during rcu thread loop.
malloc_trim() can be enabled/disabled by --enable-malloc-trim/
--disable-malloc-trim in the Qemu configure command. The
default malloc_trim() is enabled for libc.
Below are test results from smaps file.
(1)without patch
55f0783e1000-55f07992a000 rw-p 00000000 00:00 0 [heap]
Size: 21796 kB
Rss: 14260 kB
Pss: 14260 kB
(2)with patch
55cc5fadf000-55cc61008000 rw-p 00000000 00:00 0 [heap]
Size: 21668 kB
Rss: 6940 kB
Pss: 6940 kB
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <1513775806-19779-1-git-send-email-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When listening on unix/tcp sockets there was optional code that would update
the original SocketAddress struct with the info about the actual address that
was listened on. Since the conversion of everything to QIOChannelSocket, no
remaining caller made use of this feature. It has been replaced with the ability
to query the listen address after the fact using the function
qio_channel_socket_get_local_address. This is a better model when the input
address can result in listening on multiple distinct sockets.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20171212111219.32601-1-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If we create a thread with QEMU_THREAD_DETACHED mode, QEMU may get a segfault with low probability.
The backtrace is:
#0 0x00007f46c60291d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f46c602a8c8 in __GI_abort () at abort.c:90
#2 0x00000000008543c9 in PAT_abort ()
#3 0x000000000085140d in patchIllInsHandler ()
#4 <signal handler called>
#5 pthread_detach (th=139933037614848) at pthread_detach.c:50
#6 0x0000000000829759 in qemu_thread_create (thread=thread@entry=0x7ffdaa8205e0, name=name@entry=0x94d94a "io-task-worker", start_routine=start_routine@entry=0x7eb9a0 <qio_task_thread_worker>,
arg=arg@entry=0x3f5cf70, mode=mode@entry=1) at util/qemu_thread_posix.c:512
#7 0x00000000007ebc96 in qio_task_run_in_thread (task=0x31db2c0, worker=worker@entry=0x7e7e40 <qio_channel_socket_connect_worker>, opaque=0xcd23380, destroy=0x7f1180 <qapi_free_SocketAddress>)
at io/task.c:141
#8 0x00000000007e7f33 in qio_channel_socket_connect_async (ioc=ioc@entry=0x626c0b0, addr=<optimized out>, callback=callback@entry=0x55e080 <qemu_chr_socket_connected>, opaque=opaque@entry=0x42862c0,
destroy=destroy@entry=0x0) at io/channel_socket.c:194
#9 0x000000000055bdd1 in socket_reconnect_timeout (opaque=0x42862c0) at qemu_char.c:4744
#10 0x00007f46c72483b3 in g_timeout_dispatch () from /usr/lib64/libglib-2.0.so.0
#11 0x00007f46c724799a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#12 0x000000000076c646 in glib_pollfds_poll () at main_loop.c:228
#13 0x000000000076c6eb in os_host_main_loop_wait (timeout=348000000) at main_loop.c:273
#14 0x000000000076c815 in main_loop_wait (nonblocking=nonblocking@entry=0) at main_loop.c:521
#15 0x000000000056a511 in main_loop () at vl.c:2076
#16 0x0000000000420705 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4940
The cause of this problem is a glibc bug; for more information, see
https://sourceware.org/bugzilla/show_bug.cgi?id=19951.
The solution for this bug is to use pthread_attr_setdetachstate.
There is a similar issue with pthread_setname_np, which is moved
from creating thread to created thread.
Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Message-Id: <20171128044656.10592-1-linzhecheng@huawei.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
[Simplify the code by removing qemu_thread_set_name, and free the arguments
before invoking the start routine. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Recent glibc added memfd_create in sys/mman.h. This conflicts with
the definition in util/memfd.c:
/builddir/build/BUILD/qemu-2.11.0-rc1/util/memfd.c:40:12: error: static declaration of memfd_create follows non-static declaration
Fix the configure test, and remove the sys/memfd.h inclusion since the
file actually does not exist---it is a typo in the memfd_create(2) man
page.
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It's going to be useful, in particular, in VMBus code massively using
uuids aka GUIDs.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Message-Id: <20171127124355.26015-1-rkagan@virtuozzo.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Their last user went away in commit f51074cdc6, "pci-hotplug-old: Has
been dead for five major releases, bury", v2.3.0. Remove them, as new
code should use QemuOpts or maybe keyval_parse() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20171006131645.17729-1-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
opt was declared as a separate local inside the last loop,
shadowing the local at the top of the function.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20171005190725.18712-1-dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The AioContext pointer argument to co_aio_sleep_ns() is only used for
the sleep timer. It does not affect where the caller coroutine is
resumed.
Due to changes to coroutine and AIO APIs it is now possible to drop the
AioContext pointer argument. This is safe to do since no caller has
specific requirements for which AioContext the timer must run in.
This patch drops the AioContext pointer argument and renames the
function to simplify the API.
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171109102652.6360-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The function searches for next zero bit.
Also add interface for BdrvDirtyBitmap and unit test.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20171012135313.227864-2-vsementsov@virtuozzo.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
exec: housekeeping (funny since 02d0e09503)
applied using ./scripts/clean-includes
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
SPARC Linux has an oddity that it insists that mmap()
of MAP_FIXED memory must be at an alignment defined by
SHMLBA, which is more aligned than the page size
(typically, SHMLBA alignment is to 16K, and pages are 8K).
This is a relic of ancient hardware that had cache
aliasing constraints, but even on modern hardware the
kernel still insists on the alignment.
To ensure that we get mmap() alignment sufficient to
make the kernel happy, change QEMU_VMALLOC_ALIGN,
qemu_fd_getpagesize() and qemu_mempath_getpagesize()
to use the maximum of getpagesize() and SHMLBA.
In particular, this allows 'make check' to pass on Sparc:
we were previously failing the ivshmem tests.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1512752248-17857-1-git-send-email-peter.maydell@linaro.org
If socket_listen_cleanup is passed an invalid FD, then querying the socket
local address will fail. We must thus be prepared for the returned addr to
be NULL
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The previous patch fixed a race condition, in which there were
coroutines being executing doubly, or after coroutine deletion.
We can detect common scenarios when this happens, and print an error
message and abort before we corrupt memory / data, or segfault.
This patch will abort if an attempt to enter a coroutine is made while
it is currently pending execution, either in a specific AioContext bh,
or pending execution via a timer. It will also abort if a coroutine
is scheduled, before a prior scheduled run has occurred.
We cannot rely on the existing co->caller check for recursive re-entry
to catch this, as the coroutine may run and exit with
COROUTINE_TERMINATE before the scheduled coroutine executes.
(This is the scenario that was occurring and fixed in the previous
patch).
This patch also re-orders the Coroutine struct elements in an attempt to
optimize caching.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
stat64_min_slow() and stat64_max_slow() compare the wrong way. This
makes iotest 136 fail with clang and -m32.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20171114232223.25207-1-mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit b7a745d added a qemu_bh_cancel call to the completion function
as an optimization to prevent it from unnecessarily rescheduling itself.
This completion function is scheduled from worker_thread, after setting
the state of a ThreadPoolElement to THREAD_DONE.
This was considered to be safe, as the completion function restarts the
loop just after the call to qemu_bh_cancel. But, as this loop lacks a HW
memory barrier, the read of req->state may actually happen _before_ the
call, seeing it still as THREAD_QUEUED, and ending the completion
function without having processed a pending TPE linked at pool->head:
worker thread | I/O thread
------------------------------------------------------------------------
| speculatively read req->state
req->state = THREAD_DONE; |
qemu_bh_schedule(p->completion_bh) |
bh->scheduled = 1; |
| qemu_bh_cancel(p->completion_bh)
| bh->scheduled = 0;
| if (req->state == THREAD_DONE)
| // sees THREAD_QUEUED
The source of the misunderstanding was that qemu_bh_cancel is now being
used by the _consumer_ rather than the producer, and therefore now needs
to have acquire semantics just like e.g. aio_bh_poll.
In some situations, if there are no other independent requests in the
same aio context that could eventually trigger the scheduling of the
completion function, the omitted TPE and all operations pending on it
will get stuck forever.
[Added Sergio's updated wording about the HW memory barrier.
--Stefan]
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-id: 20171108063447.2842-1-slp@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If we iterate over the full port range without successfully binding+listening
on the socket, we'll try the next address, whereupon we overwrite the slisten
file descriptor variable without closing it.
Rather than having two places where we open + close socket FDs on different
iterations of nested for loops, re-arrange the code to always open+close
within the same loop iteration.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This hunk should not have been merged but I forgot to remove it. Let's
remove it before it slips into a QEMU release.
¯\_(ツ)_/¯
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20171103154041.12617-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
NetBSD 8.0(beta) ships with KERN_PROC_PATHNAME in sysctl(2).
Older NetBSD versions can use argv[0] parsing fallback.
This code section is partly shared with FreeBSD.
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Message-id: 20171028194833.23858-1-n54@gmx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
gcc warning:
/qemu/util/oslib-posix.c:304:11: error:
variable ‘addr’ might be clobbered by ‘longjmp’ or ‘vfork’
[-Werror=clobbered]
Fix also some related data types:
numpages, hpagesize are used as pointer offset.
Always use size_t for them and also for the derived
numpages_per_thread and size_per_thread.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 20171016202912.1117-1-sw@weilnetz.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If an offset of ports is specified to the inet_listen_saddr function(),
and two or more processes tries to bind from these ports at the same time,
occasionally more than one process may be able to bind to the same
port. The condition is detected by listen() but too late to avoid a failure.
This function is called by socket_listen() and used
by all socket listening code in QEMU, so all cases where any form of dynamic
port selection is used should be subject to this issue.
Add code to close and re-establish the socket when this
condition is observed, hiding the race condition from the user.
Also clean up some issues with error handling to allow more
accurate reporting of the cause of an error.
This has been developed and tested by means of the
test-listen unit test in the previous commit.
Enable the test for make check now that it passes.
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Another refactoring step to prepare for fixing the problem
exposed with the test-listen test in the previous commit
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A refactoring step to prepare for the problem
exposed by the test-listen test in the previous commit.
Simplify and reorganize the IPv6 specific extra
measures and move it out of the for loop to increase
code readability. No semantic changes.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
These only depend on the host and therefore belong in the common
osdep, not in a target-dependent object.
While at it, query the host during an init constructor, which guarantees
the page size will be well-defined throughout the execution of the program.
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Change qemu_config_parse() to return the number of config groups
in success and -EINVAL on error. This will allow callers of
qemu_config_parse() to check if something was really loaded from
the config file.
All existing callers of qemu_config_parse() and
qemu_read_config_file() only check if the return value was
negative, so the change shouldn't affect them.
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171004025043.3788-2-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The only client of hbitmap_serialization_granularity() is dirty-bitmap's
bdrv_dirty_bitmap_serialization_align(). Keeping the two names consistent
is worthwhile, and the shorter name is more representative of what the
function returns (the required alignment to be used for start/count of
other serialization functions, where violating the alignment causes
assertion failures).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After iothread is enabled internally inside QEMU with GMainContext, we
may encounter this warning when destroying the iothread:
(qemu-system-x86_64:19925): GLib-CRITICAL **: g_source_remove_poll:
assertion '!SOURCE_DESTROYED (source)' failed
The problem is that g_source_remove_poll() does not allow to remove one
source from array if the source is detached from its owner
context. (peterx: which IMHO does not make much sense)
Fix it on QEMU side by avoid calling g_source_remove_poll() if we know
the object is during destruction, and we won't leak anything after all
since the array will be gone soon cleanly even with that fd.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-6-peterx@redhat.com
[peterx: write the commit message]
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZylugAAoJEH8JsnLIjy/Wp1QP/jtwre8WwtlX3SZ82cyiwonc
fODcokF6iEsX7vs9vMr8lkbwF4mjxaU2Xf/knC98J/6rCxM8DRP/MW4BZttTxDI9
++U926cIHHzjtSDalcjfTKnJD7PikHbLXz3SJ+u9hPnIeGs56hybCU6iHrrIcOTs
YSMtD3eUQ8B8JaoegKgNyqkhvfhEdHlWFCq9/T2YEWwYMzGt1jvSTRqe3vr8vIu3
v7QKvh35gm965yaUTGpv7Ej7TOBTWOaYBo9D1TBKNEZOFTBjqyldciyBoDhCF2P9
+4EsNTkZ7u20Rko41dzsZPzhjG10gm/lNZNj3Cul9ta4kQ0hGeOjEujd9L9ANTGl
gwnPPHKwgax5O+ctCPmrU7yHG+XIQD3gckC69qQeRXnPYQ4Jeo/LqhwjU+FcZfHs
97Lz6CHQHgtBP9JJwBMtUp77HJ58KPnnWIxGkb9u2vm4CpvRFkMrx5ekmj//9klX
5niRiqkNdrkEUnu/FIXOXxSmwnlhedAGQNq7ALSoW95El1QCy8Mm0eKEvmHyvZzd
z2gSufLX6ynOaG4x5oY5eezKm6F4Hxwt+w8Svj9PHXIrmrEIop11LG5MVsDGDjyh
XKiLddEIVKTYGwffX0CGTLYA34m2uHPkVVrMOIEvni3G6byXkb10+4pBpuTu/O/h
wQPFraquH1I2B5YETsMa
=7Bt2
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Tue 26 Sep 2017 14:52:32 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (24 commits)
block/qcow2-bitmap: fix use of uninitialized pointer
qemu-iotests: add shrinking image test
qcow2: add shrink image support
qcow2: add qcow2_cache_discard
qemu-img: add --shrink flag for resize
iotests: fix 181: enable postcopy-ram capability on target
qemu-iotests: Test change-backing-file command
block: Fix permissions after bdrv_reopen()
block: reopen: Queue children after their parents
block: Base permissions on rw state after reopen
block: Add reopen queue to bdrv_check_perm()
block: Add reopen_queue to bdrv_child_perm()
qemu-io: Drop write permissions before read-only reopen
block: Clean up some bad code in the vvfat driver
block/throttle-groups.c: allocate RestartData on the heap
throttle: Assert that bkt->max is valid in throttle_compute_wait()
iotests: Print full path of bad output if mismatch
iotests: use virtio aliases for 067
iotests: use -ccw on s390x for 051
iotests: use -ccw on s390x for 040, 139, and 182
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
If bkt->max == 0 and bkt->burst_length > 1 then we could have a
division by 0 in throttle_do_compute_wait(). That configuration is
however not permitted and is already detected by throttle_is_valid(),
but let's assert it in throttle_compute_wait() to make it explicit.
Found by Coverity (CID: 1381016).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In qemu-thread-posix.c we have two implementations of the
various qemu_sem_* functions, one of which uses native POSIX
sem_* and the other of which emulates them with pthread conditions.
This is necessary because not all our host OSes support
sem_timedwait().
Instead of a hard-coded list of OSes which don't implement
sem_timedwait(), which gets out of date, make configure
test for the presence of the function and set a new
CONFIG_HAVE_SEM_TIMEDWAIT appropriately.
In particular, newer NetBSDs have sem_timedwait(), so this
commit will switch them over to using it. OSX still does
not have an implementation.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Provide helpers to convert bitmaps to little endian format. It can be
used when we want to send one bitmap via network to some other hosts.
One thing to mention is that, these helpers only solve the problem of
endianess, but it does not solve the problem of different word size on
machines (the bitmaps managing same count of bits may contains different
size when malloced). So we need to take care of the size alignment issue
on the callers for now.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Count how many bits set in the bitmap.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We have BIT_WORD(). It's the same.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Convert any remaining uses of fprintf(stderr, "warning:"...
to use warn_report() instead. This helps standardise on a single
method of printing warnings to the user.
All of the warnings were changed using this command:
find ./* -type f -exec sed -i 's|fprintf(.*".*warning[,:] |warn_report("|Ig' {} +
The #include lines and chagnes to the test Makefile were manually
updated to allow the code to compile.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-Id: <2c94ac3bb116cc6b8ebbcd66a254920a69665515.1503077821.git.alistair.francis@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Convert all the multi-line uses of fprintf(stderr, "warning:"..."\n"...
to use warn_report() instead. This helps standardise on a single
method of printing warnings to the user.
All of the warnings were changed using these commands:
find ./* -type f -exec sed -i \
'N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
{} +
Indentation fixed up manually afterwards.
Some of the lines were manually edited to reduce the line length to below
80 charecters. Some of the lines with newlines in the middle of the
string were also manually edit to avoid checkpatch errrors.
The #include lines were manually updated to allow the code to compile.
Several of the warning messages can be improved after this patch, to
keep this patch mechanical this has been moved into a later patch.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Yongbok Kim <yongbok.kim@imgtec.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Jason Wang <jasowang@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <5def63849ca8f551630c6f2b45bcb1c482f765a6.1505158760.git.alistair.francis@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
util/scsi.c includes some SCSI code that is shared by block/iscsi.c and
hw/scsi, but the introduction of the persistent reservation helper
will add many more instances of this. There is also include/block/scsi.h,
which actually is not part of the core block layer.
The persistent reservation manager will also need a home. A scsi/
directory provides one for both the aforementioned shared code and
the PR manager code.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This recognizes the "fixed" and "descriptor" format sense data, extracts
the sense key/asc/ascq fields then converts them to an errno.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170821141008.19383-4-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tweak the errno mapping to return more accurate/appropriate values.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170821141008.19383-3-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
So that it can be reused outside of iscsi.c.
Also update MAINTAINERS to include the new files in SCSI section.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170821141008.19383-2-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nobody has mentioned AIX host support on the mailing list for years,
and we have no test systems for it so it is most likely broken.
We've advertised in configure for two releases now that we plan
to drop support for this host OS, and have had no complaints.
Drop the AIX host support code.
We can also drop the now-unused AIX version of sys_cache_info().
Note that the _CALL_AIX define used in the PPC tcg backend is
also used for Linux PPC64, and so that code should not be removed.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1504545540-8002-1-git-send-email-peter.maydell@linaro.org
ThrottleGroup is converted to an object. This will allow the future
throttle block filter drive easy creation and configuration of throttle
groups in QMP and cli.
A new QAPI struct, ThrottleLimits, is introduced to provide a shared
struct for all throttle configuration needs in QMP.
ThrottleGroups can be created via CLI as
-object throttle-group,id=foo,x-iops-total=100,x-..
where x-* are individual limit properties. Since we can't add non-scalar
properties in -object this interface must be used instead. However,
setting these properties must be disabled after initialization because
certain combinations of limits are forbidden and thus configuration
changes should be done in one transaction. The individual properties
will go away when support for non-scalar values in CLI is implemented
and thus are marked as experimental.
ThrottleGroup also has a `limits` property that uses the ThrottleLimits
struct. It can be used to create ThrottleGroups or set the
configuration in existing groups as follows:
{ "execute": "object-add",
"arguments": {
"qom-type": "throttle-group",
"id": "foo",
"props" : {
"limits": {
"iops-total": 100
}
}
}
}
{ "execute" : "qom-set",
"arguments" : {
"path" : "foo",
"property" : "limits",
"value" : {
"iops-total" : 99
}
}
}
This also means a group's configuration can be fetched with qom-get.
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The non-blocking connect mechanism is obsolete, and it doesn't
work well in inet connection, because it will call getaddrinfo
first and getaddrinfo will blocks on DNS lookups. Since commit
e65c67e4 & d984464e, the non-blocking connect of migration goes
through QIOChannel in a different manner(using a thread), and
nobody use this old non-blocking connect anymore.
Any newly written code which needs a non-blocking connect should
use the QIOChannel code, so we can drop NonBlockingConnectHandler
as a concept entirely.
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The next commit will put it to use. May look pointless now, but we're
going to change the FOO_lookup's type, and then it'll help.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1503564371-26090-13-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
If QEMU is running on a system that's out of memory and mmap()
fails, QEMU aborts with no error message at all, making it hard
to debug the reason for the failure.
Add perror() calls that will print error information before
aborting.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170829212053.6003-1-ehabkost@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
LeakyBucket.burst_length is defined as an unsigned integer but the
code never checks for overflows and it only makes sure that the value
is not 0.
In practice this means that the user can set something like
throttling.iops-total-max-length=4294967300 despite being larger than
UINT_MAX and the final value after casting to unsigned int will be 4.
This patch changes the data type to uint64_t. This does not increase
the storage size of LeakyBucket, and allows us to assign the value
directly from qemu_opt_get_number() or BlockIOThrottle and then do the
checks directly in throttle_is_valid().
The value of burst_length does not have a specific upper limit,
but since the bucket size is defined by max * burst_length we have
to prevent overflows. Instead of going for UINT64_MAX or something
similar this patch reuses THROTTLE_VALUE_MAX, which allows I/O bursts
of 1 GiB/s for 10 days in a row.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 1b2e3049803f71cafb2e1fa1be4fb47147a0d398.1503580370.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Both the throttling limits set with the throttling.iops-* and
throttling.bps-* options and their QMP equivalents defined in the
BlockIOThrottle struct are integer values.
Those limits are also reported in the BlockDeviceInfo struct and they
are integers there as well.
Therefore there's no reason to store them internally as double and do
the conversion everytime we're setting or querying them, so this patch
uses uint64_t for those types. Let's also use an unsigned type because
we don't allow negative values anyway.
LeakyBucket.level and LeakyBucket.burst_level do however remain double
because their value changes depending on the fraction of time elapsed
since the previous I/O operation.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: f29b840422767b5be2c41c2dfdbbbf6c5f8fedf8.1503580370.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The throttling code can change internally the value of bkt->max if it
hasn't been set by the user. The problem with this is that if we want
to retrieve the original value we have to undo this change first. This
is ugly and unnecessary: this patch removes the throttle_fix_bucket()
and throttle_unfix_bucket() functions completely and moves the logic
to throttle_compute_wait().
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Message-id: 5b0b9e1ac6eb208d709eddc7b09e7669a523bff3.1503580370.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Use a pointer to the bucket instead of repeating cfg->buckets[i] all
the time. This makes the code more concise and will help us expand the
checks later and save a few line breaks.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 763ffc40a26b17d54cf93f5a999e4656049fcf0c.1503580370.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The way the throttling algorithm works is that requests start being
throttled once the bucket level exceeds the burst limit. When we get
there the bucket leaks at the level set by the user (bkt->avg), and
that leak rate is what prevents guest I/O from exceeding the desired
limit.
If we don't allow bursts (i.e. bkt->max == 0) then we can start
throttling requests immediately. The problem with keeping the
threshold at 0 is that it only allows one request at a time, and as
soon as there's a bit of I/O from the guest every other request will
be throttled and performance will suffer considerably. That can even
make the guest unable to reach the throttle limit if that limit is
high enough, and that happens regardless of the block scheduler used
by the guest.
Increasing that threshold gives flexibility to the guest, allowing it
to perform short bursts of I/O before being throttled. Increasing the
threshold too much does not make a difference in the long run (because
it's the leak rate what defines the actual throughput) but it does
allow the guest to perform longer initial bursts and exceed the
throttle limit for a short while.
A burst value of bkt->avg / 10 allows the guest to perform 100ms'
worth of I/O at the target rate without being throttled.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 31aae6645f0d1fbf3860fb2b528b757236f0c0a7.1503580370.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Build time check of OFD lock is not sufficient and can cause image open
errors when the runtime environment doesn't support it.
Add a helper function to probe it at runtime, additionally. Also provide
a qemu_has_ofd_lock() for callers to check the status.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This reverts commit a59629fcc6.
This is not needed anymore because the IOThread mutex is not
"magic" anymore (need not kick the CPU thread)and also because
fork callbacks are only enabled at the very beginning of
QEMU's execution.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because of -daemonize, system mode QEMU sometimes needs to fork() and
keep RCU enabled in the child. However, there is a possible deadlock
with synchronize_rcu:
- the CPU thread is inside a RCU critical section and wants to take
the BQL in order to do MMIO
- the monitor thread, which is owning the BQL, calls rcu_init_lock
which tries to take the rcu_sync_lock
- the call_rcu thread has taken rcu_sync_lock in synchronize_rcu, but
synchronize_rcu needs the CPU thread to end the critical section
before returning.
This cannot happen for user-mode emulation, because it does not have
a BQL.
To fix it, assume that system mode QEMU only forks in preparation for
exec (except when daemonizing) and disable pthread_atfork as soon as
the double fork has happened.
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With the move of some docs/ to docs/devel/ on ac06724a71,
no references were updated.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit 97f4030 changed warning messages from
timestamp-if-enabled progname ":" location "warning: " message
to
"warning: " timestamp-if-enabled progname ":" location message
This regressed qemu-iotests 051. Put "warning: " right back where it
was, along with "info: ".
Reported-by: Kevin Wolf <kwolf@redhat.com>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1500449614-16811-1-git-send-email-armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the
supplied <cpuid.h> does not contain the bit_AVX2 define that we use
when detecting whether the routine can be enabled.
Introduce a qemu-specific header that uses the compiler's definition
of __cpuid et al, but supplies any missing bit_* definitions needed.
This avoids introducing any extra ifdefs to util/bufferiszero.c, and
allows quite a few to be removed from tcg/i386/tcg-target.inc.c.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20170719044018.18063-1-rth@twiddle.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
On NetBSD the compiler warns:
util/oslib-posix.c: In function 'sigaction_invoke':
util/oslib-posix.c:589:5: warning: missing braces around initializer [-Wmissing-braces]
siginfo_t si = { 0 };
^
util/oslib-posix.c:589:5: warning: (near initialization for 'si.si_pad') [-Wmissing-braces]
because on this platform siginfo_t is defined as
typedef union siginfo {
char si_pad[128]; /* Total size; for future expansion */
struct _ksiginfo _info;
} siginfo_t;
Avoid this warning by initializing the struct with {} instead;
this is a GCC extension but we use it all over the codebase already.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1500568341-8389-1-git-send-email-peter.maydell@linaro.org
This include was forgotten when splitting cacheinfo.c out of
tcg/ppc/tcg-target.inc.c (see commit b255b2c8).
For a Centos7 host, the include path
<signal.h>
<bits/sigcontext.h>
<asm/sigcontext.h>
<asm/elf.h>
<asm/auxvec.h>
implicitly pulls in the desired AT_* defines.
Not so for Debian Jessie.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170711015524.22936-1-f4bug@amsat.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
throttle_config() cancels the timers of the calling BlockBackend. This
doesn't make sense because other BlockBackends in the group remain
untouched. There's no need to cancel the timers in the one specific
BlockBackend so let's not do that. Throttled requests will run as
scheduled and future requests will follow the new configuration. This
also allows a throttle group's configuration to be changed even when it
has no members.
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Clock type in throttling is currently inferred by the ThrottleTimer's
clock type even though it is a per-ThrottleGroup property; it doesn't
make sense to have different clock types in the same group. Moving this
to a field in ThrottleGroup can simplify some of the throttle functions.
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These functions are more efficient in the presence of contention.
qemu_co_rwlock_downgrade also guarantees not to block, which may
be useful in some algorithms too.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170629132749.997-3-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Currently if you disable listening on IPv4 addresses, via the
CLI flag ipv4=off, we still mistakenly accept IPv4 clients via
the IPv6 listener socket due to IPV6_V6ONLY flag being unset.
We must ensure IPV6_V6ONLY is always set if ipv4=off
This fixes the following scenarios
-incoming tcp::9000,ipv6=on
-incoming tcp:[::]:9000,ipv6=on
-chardev socket,id=cdev0,host=,port=9000,server,nowait,ipv4=off
-chardev socket,id=cdev0,host=,port=9000,server,nowait,ipv6=on
-chardev socket,id=cdev0,host=::,port=9000,server,nowait,ipv4=off
-chardev socket,id=cdev0,host=::,port=9000,server,nowait,ipv6=on
which all mistakenly accepted IPv4 clients
Acked-by: Gerd Hoffmann <kraxel@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When inet_parse() parses the hostname, it is forcing the
has_ipv6 && ipv6 flags if the address contains a ":". This
means that if the user had set the ipv4=on flag, to try to
restrict the listener to just ipv4, an error would not have
been raised. eg
-incoming tcp:[::]:9000,ipv4
should have raised an error because listening for IPv4
on "::" is a non-sensical combination. With this removed,
we now call getaddrinfo() on "::" passing PF_INET and
so getaddrinfo reports an error about the hostname being
incompatible with the requested protocol:
qemu-system-x86_64: -incoming tcp:[::]:9000,ipv4: address resolution
failed for :::9000: Address family for hostname not supported
Likewise it is explicitly setting the has_ipv4 & ipv4
flags when the address contains only digits + '.'. This
has no ill-effect, but also has no benefit, so is removed.
Acked-by: Gerd Hoffmann <kraxel@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When binding to an IPv6 socket we currently force the
IPV6_V6ONLY flag to off. This means that the IPv6 socket
will accept both IPv4 & IPv6 sockets when QEMU is launched
with something like
-vnc :::1
While this is good for that case, it is bad for other
cases. For example if an empty hostname is given,
getaddrinfo resolves it to 2 addresses 0.0.0.0 and ::,
in that order. We will thus bind to 0.0.0.0 first, and
then fail to bind to :: on the same port. The same
problem can happen if any other hostname lookup causes
the IPv4 address to be reported before the IPv6 address.
When we get an IPv6 bind failure, we should re-try the
same port, but with IPV6_V6ONLY turned on again, to
avoid clash with any IPv4 listener.
This ensures that
-vnc :1
will bind successfully to both 0.0.0.0 and ::, and also
avoid
-vnc :1,to=2
from mistakenly using a 2nd port for the :: listener.
This is a regression due to commit 396f935 "ui: add ability to
specify multiple VNC listen addresses".
Acked-by: Gerd Hoffmann <kraxel@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Implement warn_report_err() and warn_reportf_err() functions which
are the same as the error_report_err() and error_reportf_err()
functions except report a warning instead of an error.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <276ff93eadc0b01b8243cc61ffc331f77922c0d0.1499866456.git.alistair.francis@xilinx.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add warn_report(), warn_vreport() for reporting warnings, and
info_report(), info_vreport() for informational messages.
These are implemented them with a helper function factored out of
error_vreport(), suitably generalized. This patch makes no changes
to the output of the original error_report() function.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <c89e9980019f296ec9aa38d7689ac4d5c369296d.1499866456.git.alistair.francis@xilinx.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Rename the error_print_loc() function in preparation for using it to
print warnings as well.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <661b215695db878a0aef8401b506fb3da50e981a.1499866456.git.alistair.francis@xilinx.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add bdrv_dirty_bitmap_deserialize_ones() function, which is needed for
qcow2 bitmap loading, to handle unallocated bitmap parts, marked as
all-ones.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170628120530.31251-7-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Make dirty iter resistant to resetting bits in corresponding HBitmap.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170628120530.31251-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Now that qcow & qcow2 are wired up to get encryption keys
via the QCryptoSecret object, nothing is relying on the
interactive prompting for passwords. All the code related
to password prompting can thus be ripped out.
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170623162419.26068-17-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Clang generates the following warning on aarch64 host:
CC util/cacheinfo.o
/home/pranith/qemu/util/cacheinfo.c:121:48: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths]
asm volatile("mrs\t%0, ctr_el0" : "=r"(ctr));
^
/home/pranith/qemu/util/cacheinfo.c:121:28: note: use constraint modifier "w"
asm volatile("mrs\t%0, ctr_el0" : "=r"(ctr));
^~
%w0
Constraint modifier 'w' is not (yet?) accepted by gcc. Fix this by increasing the ctr size.
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20170630153946.11997-1-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Not all platforms check whether a lock is initialized before used. In
particular Linux seems to be more permissive than OSX.
Check initialization state explicitly in our code to catch such bugs
earlier.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170704122325.25634-1-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The original ready < nhandles - 1 can be re-written as ready + 1 <
nhandles. The check was actually incorrect because
WAIT_OBJECT_0 was not subtracted from ready; it worked because
WAIT_OBJECT_0 is zero. After subtracting WAIT_OBJECT_0,
the result is the same condition that we are checking on the first
itteration of the for loop. This means we can remove the if statement
and let the for loop check the code.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-Id: <a14083d681951f3999a0e9314605cb706381ae8d.1498756113.git.alistair.francis@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The 'sun_path' field in the sockaddr_un struct is not required
to be NUL termianted, so when reporting an error, we must use
the separate 'path' variable which is guaranteed terminated.
Fixes a bug spotted by coverity that was introduced in
commit ad9579aaa1
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Thu May 25 16:53:00 2017 +0100
sockets: improve error reporting if UNIX socket path is too long
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20170626103756.22974-1-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZSRWrAAoJEDhwtADrkYZTVOQP/RK8br2A1Cn7LeVG6jnKz5hJ
OqyII77x8I2RachWvnwxQeHPMEVDuz3y2WjL+80s6peXlpR6y/13w7oX0f6aiJuo
+T9khTqMv2I7HsM5UCsXAJpFPHT7r90b4x8nstY80YLGe7lA7L6yk6PGyCxHThwA
mOiTKDw6/Xb/yZGrS2Favrun7juNpAs0Ec1IAkaA8xsEgVkd6tDv281rmHqvibl/
//90VfJp3nHFZ12FCQ1HzA42Eigtmo/fIk9LnAzBoYG0zw0cnzjuv0BNzs/JwuUZ
/VskeD1cViQ4yzFnPpjOavjYjTN854/JTJzm7gZ7dTQ6/l3ykoY6NDE8p1BLuHlC
p2RKkg20EeZlpOEtMQ4g6iyG6EUxaKcEiXmQ31LqN/LJwxTYbo5B5nCHMjrt4gxe
MqFBJQSNsJ7QjZ7Qa7pADMCi/G0m7/0dN8vBqSr4vcbLVvdbw/yb/9s33wXGrUj1
PyXM2ymi+vvSqcXtNXKshsJLxJSJxO1tm2tRIANDTabQ00yxs8dOYnQnbQFR94fp
6nrE2PnjZqgqk69aNDJEbngj6Tgx44nyTr1+Q17juZf9nTCE5QmBE1J0IRoykCJn
E8+T63ZxtIxVV2yLi5xBjmZaZtPyJRGGeUXunA10SuWrHzupEcBuhFhFYd2MFM5L
fsojALN2K3Gdx2+CmAo2
=O9Vv
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-06-09-v2' into staging
QAPI patches for 2017-06-09
# gpg: Signature made Tue 20 Jun 2017 13:31:39 BST
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2017-06-09-v2: (41 commits)
tests/qdict: check more get_try_int() cases
console: use get_uint() for "head" property
i386/cpu: use get_uint() for "min-level"/"min-xlevel" properties
numa: use get_uint() for "size" property
pnv-core: use get_uint() for "core-pir" property
pvpanic: use get_uint() for "ioport" property
auxbus: use get_uint() for "addr" property
arm: use get_uint() for "mp-affinity" property
xen: use get_uint() for "max-ram-below-4g" property
pc: use get_uint() for "hpet-intcap" property
pc: use get_uint() for "apic-id" property
pc: use get_uint() for "iobase" property
acpi: use get_uint() for "pci-hole*" properties
acpi: use get_uint() for various acpi properties
acpi: use get_uint() for "acpi-pcihp-io*" properties
platform-bus: use get_uint() for "addr" property
bcm2835_fb: use {get, set}_uint() for "vcram-size" and "vcram-base"
aspeed: use {set, get}_uint() for "ram-size" property
pcihp: use get_uint() for "bsel" property
pc-dimm: make "size" property uint64
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We would like to use a same QObject type to represent numbers, whether
they are int, uint, or floats. Getters will allow some compatibility
between the various types if the number fits other representations.
Add a few more tests while at it.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20170607163635.17635-7-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[parse_stats_intervals() simplified a bit, comment in
test_visitor_in_int_overflow() tidied up, suppress bogus warnings]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add helpers to gather cache info from the host at init-time.
For now, only export the host's I/D cache line sizes, which we
will use to improve cache locality to avoid false sharing.
Suggested-by: Richard Henderson <rth@twiddle.net>
Suggested-by: Geert Martin Ijewski <gm.ijewski@web.de>
Tested-by: Geert Martin Ijewski <gm.ijewski@web.de>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1496794624-4083-1-git-send-email-cota@braap.org>
[rth: Move all implementations from tcg/ppc/]
Signed-off-by: Richard Henderson <rth@twiddle.net>
This module provides fast paths for 64-bit atomic operations on machines
that only have 32-bit atomic access.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170605123908.18777-11-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
The 'struct sockaddr_un' only allows 108 bytes for the socket
path.
If the user supplies a path, QEMU uses snprintf() to silently
truncate it when too long. This is undesirable because the user
will then be unable to connect to the path they asked for.
If the user doesn't supply a path, QEMU builds one based on
TMPDIR, but if that leads to an overlong path, it mistakenly
uses error_setg_errno() with a stale errno value, because
snprintf() does not set errno on truncation.
In solving this the code needed some refactoring to ensure we
don't pass 'un.sun_path' directly to any APIs which expect
NUL-terminated strings, because the path is not required to
be terminated.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20170525155300.22743-1-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Submission of requests on linux aio is a bit tricky and can lead to
requests completions on submission path:
44713c9e85 ("linux-aio: Handle io_submit() failure gracefully")
0ed93d84ed ("linux-aio: process completions from ioq_submit()")
That means that any coroutine which has been yielded in order to wait
for completion can be resumed from submission path and be eventually
terminated (freed).
The following use-after-free crash was observed when IO throttling
was enabled:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f5813dff700 (LWP 56417)]
virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
(gdb) bt
#0 virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
^^^^^^^^^^^^^^
remember the address
#1 virtqueue_fill (vq=0x5598b20d21b0, elem=0x7f5804009a30, len=1, idx=0) at virtio.c:282
#2 virtqueue_push (vq=0x5598b20d21b0, elem=elem@entry=0x7f5804009a30, len=<optimized out>) at virtio.c:308
#3 virtio_blk_req_complete (req=req@entry=0x7f5804009a30, status=status@entry=0 '\000') at virtio-blk.c:61
#4 virtio_blk_rw_complete (opaque=<optimized out>, ret=0) at virtio-blk.c:126
#5 blk_aio_complete (acb=0x7f58040068d0) at block-backend.c:923
#6 coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:78
(gdb) p * elem
$8 = {index = 77, out_num = 2, in_num = 1,
in_addr = 0x7f5804009ad8, out_addr = 0x7f5804009ae0,
in_sg = 0x0, out_sg = 0x7f5804009a50}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
'in_sg' and 'out_sg' are invalid.
e.g. it is impossible that 'in_sg' is zero,
instead its value must be equal to:
(gdb) p/x 0x7f5804009ad8 + sizeof(elem->in_addr[0]) + 2 * sizeof(elem->out_addr[0])
$26 = 0x7f5804009af0
Seems 'elem' was corrupted. Meanwhile another thread raised an abort:
Thread 12 (Thread 0x7f57f2ffd700 (LWP 56426)):
#0 raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 qemu_coroutine_enter (co=0x7f5804009af0) at qemu-coroutine.c:113
#3 qemu_co_queue_run_restart (co=0x7f5804009a30) at qemu-coroutine-lock.c:60
#4 qemu_coroutine_enter (co=0x7f5804009a30) at qemu-coroutine.c:119
^^^^^^^^^^^^^^^^^^
WTF?? this is equal to elem from crashed thread
#5 qemu_co_queue_run_restart (co=0x7f57e7f16ae0) at qemu-coroutine-lock.c:60
#6 qemu_coroutine_enter (co=0x7f57e7f16ae0) at qemu-coroutine.c:119
#7 qemu_co_queue_run_restart (co=0x7f5807e112a0) at qemu-coroutine-lock.c:60
#8 qemu_coroutine_enter (co=0x7f5807e112a0) at qemu-coroutine.c:119
#9 qemu_co_queue_run_restart (co=0x7f5807f17820) at qemu-coroutine-lock.c:60
#10 qemu_coroutine_enter (co=0x7f5807f17820) at qemu-coroutine.c:119
#11 qemu_co_queue_run_restart (co=0x7f57e7f18e10) at qemu-coroutine-lock.c:60
#12 qemu_coroutine_enter (co=0x7f57e7f18e10) at qemu-coroutine.c:119
#13 qemu_co_enter_next (queue=queue@entry=0x5598b1e742d0) at qemu-coroutine-lock.c:106
#14 timer_cb (blk=0x5598b1e74280, is_write=<optimized out>) at throttle-groups.c:419
Crash can be explained by access of 'co' object from the loop inside
qemu_co_queue_run_restart():
while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
^^^^^^^^^^^^^^^^^^^^
on each iteration 'co' is accessed,
but 'co' can be already freed
qemu_coroutine_enter(next);
}
When 'next' coroutine is resumed (entered) it can in its turn resume
'co', and eventually free it. That's why we see 'co' (which was freed)
has the same address as 'elem' from the first backtrace.
The fix is obvious: use temporary queue and do not touch coroutine after
first qemu_coroutine_enter() is invoked.
The issue is quite rare and happens every ~12 hours on very high IO
and CPU load (building linux kernel with -j512 inside guest) when IO
throttling is enabled. With the fix applied guest is running ~35 hours
and is still alive so far.
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170601160847.23720-1-roman.penyaev@profitbricks.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Assert that the return value is not an error. This issue was found by
Coverity.
CID: 1374831
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
CC: groug@kaod.org
CC: pbonzini@redhat.com
CC: Eric Blake <eblake@redhat.com>
Message-Id: <1494356693-13190-2-git-send-email-sstabellini@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Those are apparently unnecessary includes.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
mapped-file security mode (especially for the virtfs root).
-----BEGIN PGP SIGNATURE-----
iEYEABECAAYFAlktdVYACgkQAvw66wEB28LaTgCfS/bunOy4Wp+I+DO/Gx/5bfNp
7/IAn3qfJYBqRnpfz8KNKRZJ8QG/Bu5X
=KTTf
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
Various bugfixes and code cleanups. Most notably, it fixes metadata handling in
mapped-file security mode (especially for the virtfs root).
# gpg: Signature made Tue 30 May 2017 14:36:22 BST
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Greg Kurz <groug@free.fr>"
# gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg: aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
9pfs: local: metadata file for the VirtFS root
9pfs: local: simplify file opening
9pfs: local: resolve special directories in paths
9pfs: check return value of v9fs_co_name_to_path()
util: drop old utimensat() compat code
9pfs: assume utimensat() and futimens() are present
fsdev: fix virtfs-proxy-helper cwd
9pfs: local: fix unlink of alien files in mapped-file mode
9pfs: drop pdu_push_and_notify()
fsdev: don't allow unknown format in marshal/unmarshal
virtio-9p/xen-9p: move 9p specific bits to core 9p code
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Alternates are sum types like unions, but use the JSON type on the
wire / QType in QObject instead of an explicit tag. That's why we
require alternate members to have distinct QTypes.
The recently introduced keyval_parse() (commit d454dbe) can only
produce string scalars. The qobject_input_visitor_new_keyval() input
visitor mostly hides the difference, so code using a QObject input
visitor doesn't have to care whether its input was parsed from JSON or
KEY=VALUE,... The difference leaks for alternates, as noted in commit
0ee9ae7: a non-string, non-enum scalar alternate value can't currently
be expressed.
In part, this is just our insufficiently sophisticated implementation.
Consider alternate type 'GuestFileWhence'. It has an integer member
and a 'QGASeek' member. The latter is an enumeration with values
'set', 'cur', 'end'. The meaning of b=set, b=cur, b=end, b=0, b=1 and
so forth is perfectly obvious. However, our current implementation
falls apart at run time for b=0, b=1, and so forth. Fixable, but not
today; add a test case and a TODO comment.
Now consider an alternate type with a string and an integer member.
What's the meaning of a=42? Is it the string "42" or the integer 42?
Whichever meaning you pick makes the other inexpressible. This isn't
just an implementation problem, it's fundamental. Our current
implementation will pick string.
So far, we haven't needed such alternates. To make sure we stop and
think before we add one that cannot sanely work with keyval_parse(),
let's require alternate members to have sufficiently distinct
representation in KEY=VALUE,... syntax:
* A string member clashes with any other scalar member
* An enumeration member clashes with bool members when it has value
'on' or 'off'.
* An enumeration member clashes with numeric members when it has a
value that starts with '-', '+', or a decimal digit. This is a
rather lazy approximation of the actual number syntax accepted by
the visitor.
Note that enumeration values starting with '-' and '+' are rejected
elsewhere already, but better safe than sorry.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1495471335-23707-5-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Now that 9pfs and virtfs-proxy-helper have been converted to utimensat(),
we don't need to keep qemu_utimens() anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
socket_address_flatten() leaks a SocketAddress when its argument is
null. Happens when opening a ChardevBackend of type 'udp' that is
configured without a local address. Screwed up in commit bd269ebc due
to last minute semantic conflict resolution. Spotted by Coverity.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1494866344-11013-1-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Moving the algorithm from print_type_size() into size_to_str() so that
other component can also leverage it. With that, refactor
print_type_size().
The assert() in that logic is removed though, since even UINT64_MAX
would not overflow.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1494562661-9063-3-git-send-email-peterx@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The GThread implementation is not functional enough to actually
run QEMU reliably. While it was potentially useful for debugging,
we have a scripts/qemugdb/coroutine.py to enable tracing of
ucontext coroutines in GDB, so that removes the only reason for
GThread to exist.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
They are wrappers of POSIX fcntl "file private locking", with a
convenient "try lock" wrapper implemented with F_OFD_GETLK.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
SocketAddressLegacy is a simple union, and simple unions are awkward:
they have their variant members wrapped in a "data" object on the
wire, and require additional indirections in C. SocketAddress is the
equivalent flat union. Convert all users of SocketAddressLegacy to
SocketAddress, except for existing external interfaces.
See also commit fce5d53..9445673 and 85a82e8..c5f1ae3.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-7-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Minor editing accident fixed, commit message and a comment tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The next commit will rename SocketAddressFlat to SocketAddress, and
the commit after that will replace most uses of SocketAddressLegacy by
SocketAddress, replacing most of this commit's renames right back.
Note that checkpatch emits a few "line over 80 characters" warnings.
The long lines are all temporary; the SocketAddressLegacy replacement
will shorten them again.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-5-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
I'm going to flatten SocketAddress: rename SocketAddress to
SocketAddressLegacy, SocketAddressFlat to SocketAddress, eliminate
SocketAddressLegacy except in external interfaces.
inet_parse() returns a newly allocated InetSocketAddress. Lift the
allocation from inet_parse() into its caller socket_parse() to prepare
for flattening SocketAddress.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-3-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Straightforward rebase]
I'm going to flatten SocketAddress: rename SocketAddress to
SocketAddressLegacy, SocketAddressFlat to SocketAddress, eliminate
SocketAddressLegacy except in external interfaces.
vsock_parse() returns a newly allocated VsockSocketAddress. Lift the
allocation from vsock_parse() into its caller socket_parse() to
prepare for flattening SocketAddress.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-2-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Noticed while investigating Coccinelle cleanups. There is no need
for a temporary variable when we can use the new macro to do the
same thing with less typing.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-9-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
We now have macros in place to make it less verbose to add a scalar
to QDict and QList, so use them.
Patch created mechanically via:
spatch --sp-file scripts/coccinelle/qobject.cocci \
--macro-file scripts/cocci-macro-file.h --dir . --in-place
then touched up manually to fix a couple of '?:' back to original
spacing, as well as avoiding a long line in monitor.c.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-7-eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
but it'll come in the next pull request.
* use GDB XML register description for x86
* use _Static_assert in QEMU_BUILD_BUG_ON
* add "R:" to MAINTAINERS and get_maintainers
* checkpatch improvements
* dump threading fixes
* first part of vhost-user-scsi support
* QemuMutex tracing
* vmw_pvscsi and megasas fixes
* sgabios module update
* use Rev3 (ACPI 2.0) FADT
* deprecate -hdachs
* improve -accel documentation
* hax fix
* qemu-char GSource bugfix
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJZDE+gAAoJEL/70l94x66DIpYH/1IOz3u8ObD8D4Lor07LkCCZ
vWFnTBMgGi9gTL5JQDnukRR3cmNp9EVOtAP5Yf+v+/Xqyq/FNGnoVWxCxEby7LtN
zrIXbsKMCaEcGzRNJFcbKV+KZnzkJrz92J0NHy29ruCK1AsslOXAWf4Qb1MV+fQl
6w2Upsh35usvWCNpFm2o8arzMEmNuE2xJDPKUB11GMrZT6TExq4Zqa8Zj1Ihc0sX
XcDr+eeBmb65Vv3jQLntOhSWAy0Xxf/fDXYTQx+JLHFgvpSOIWMiS+fqIVXtT0bH
0E4hQrBr0qjes8n8+9WGGQW2k8Ak0QlDvrZnQ97hTeV1k6SxW+2ATO2mLeJp9TM=
=5hf2
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'bonzini/tags/for-upstream' into staging
A large set of small patches. I have not included yet vhost-user-scsi,
but it'll come in the next pull request.
* use GDB XML register description for x86
* use _Static_assert in QEMU_BUILD_BUG_ON
* add "R:" to MAINTAINERS and get_maintainers
* checkpatch improvements
* dump threading fixes
* first part of vhost-user-scsi support
* QemuMutex tracing
* vmw_pvscsi and megasas fixes
* sgabios module update
* use Rev3 (ACPI 2.0) FADT
* deprecate -hdachs
* improve -accel documentation
* hax fix
* qemu-char GSource bugfix
# gpg: Signature made Fri 05 May 2017 06:10:40 AM EDT
# gpg: using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* bonzini/tags/for-upstream: (21 commits)
vhost-scsi: create a vhost-scsi-common abstraction
libvhost-user: replace vasprintf() to fix build
get_maintainer: add subsystem to reviewer output
get_maintainer: --r (list reviewer) is on by default
get_maintainer: it's '--pattern-depth', not '-pattern-depth'
get_maintainer: Teach get_maintainer.pl about the new "R:" tag
MAINTAINERS: Add "R:" tag for self-appointed reviewers
Fix the -accel parameter and the documentation for 'hax'
dump: Acquire BQL around vm_start() in dump thread
hax: Fix memory mapping de-duplication logic
checkpatch: Disallow glib asserts in main code
trace: add qemu mutex lock and unlock trace events
vmw_pvscsi: check message ring page count at initialisation
sgabios: update for "fix wrong video attrs for int 10h,ah==13h"
scsi: avoid an off-by-one error in megasas_mmio_write
vl: deprecate the "-hdachs" option
use _Static_assert in QEMU_BUILD_BUG_ON
target/i386: Add GDB XML register description support
char: Fix removing wrong GSource that be found by fd_in_tag
hw/i386: Build-time assertion on pc/q35 reset register being identical.
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Change malloc/strdup/free to g_malloc/g_strdup/g_free in
util/envlist.c.
Remove NULL checks for pointers returned from g_malloc and g_strdup
as they exit in case of failure. Also, update calls to envlist_create
to reflect this.
Free array and array contents returned by envlist_to_environ using
g_free in bsd-user/main.c and linux-user/main.c.
Update comments to reflect change in semantics.
Signed-off-by: Saurav Sachidanand <sauravsachidanand@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
- LOADPARM machine property, exposed to the guest via SCLP and
diagnose 308
- Use LOADPARM in the s390-ccw bios to select a boot entry
- Fix a crash in the ipl device code when a virtio-scsi-pci device
has been specified
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZCIm2AAoJEN7Pa5PG8C+v/ckP/3doTOMyiYWQZ91MOKk2UP4W
paX4+hqEmuy68Jpk6/owk9t3Mzlx2c0xLuDQX7QWuifi3Vk1y5kD1DbRNx14t6oy
icqzWh9wSGMT8Suacf44WjFR+V5TDX9DxrB2c/YDSI1UWg6cEJDM6//hZrbkSH+s
Ig4O5rpmhpMNycnDYwSAQXCOZgyqDG8pVdVk0H+nWU/1+9uxVNYEnB8yMpCRSB9a
IaDU7LNqkMNBRjCYWLCfMn+UNuRV7bB33iaM3KmaDkAu+XuCZTQuT4H3r5bV5Ata
UUCmJtfiWUsGJAOaoxbBrNVNWEUW4yZxsjbVceBkYrUHw0xU6ZKCtyA/AhMCDraF
yDwxUbHRQ4rmvqfeyyUGZRMErn7WfdXoPjwWzN3/Cj4DSXSpgz/FTlRmap2B3yqK
1u6lyQFyfTvHgvLVvtfPSQSYGPSGA0/qotzE3C1DoUHZAJtpnYtCRW2mlAd/RxWy
5xUfW6W9p+mE2OHI/zEs47pWe8U4uiG7A+jfEzqKSlMe7g8EGDVvoVFmwL6nPwbD
xhzZAsiw3yK3YSeVcYN0bsrvvP6PUBLpnumU0u1QlZlSOoTv79KlJQSXksvrsqqr
t/fSsXrW+ZUGDuMnK8qr7Fg1gCPmzfA6bo8o5rmWCE+zM2oDln5UMlZy+QhA1J05
HqG+2eHpaCyLCBKZyF9u
=fFQl
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'cohuck/tags/s390x-20170502' into staging
More s390x patches, this time boot related:
- LOADPARM machine property, exposed to the guest via SCLP and
diagnose 308
- Use LOADPARM in the s390-ccw bios to select a boot entry
- Fix a crash in the ipl device code when a virtio-scsi-pci device
has been specified
# gpg: Signature made Tue 02 May 2017 02:29:26 PM BST
# gpg: using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg: aka "Cornelia Huck <cohuck@kernel.org>"
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* cohuck/tags/s390x-20170502:
hw/s390x/ipl: Fix crash with virtio-scsi-pci device
pc-bios/s390-ccw.img: update image
pc-bios/s390-ccw: add boot entry selection to El Torito routine
pc-bios/s390-ccw: add boot entry selection for ECKD DASD
pc-bios/s390-ccw: provide entry selection on LOADPARM for SCSI disk
pc-bios/s390-ccw: provide a function to interpret LOADPARM value
pc-bios/s390-ccw: get LOADPARM stored in SCP Read Info
pc-bios/s390-ccw: Make ebcdic/ascii conversion public
util/qemu-config: Add loadparm to qemu machine_opts
hw/s390x/sclp: update LOADPARM in SCP Info
hw/s390x/ipl: enable LOADPARM in IPIB for a boot device
hw/s390x: provide loadparm property for the machine
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
These trace events were very useful to help me to understand and find a
reordering issue in vfio, for example:
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc0, 0x2020c, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc4, 0xa0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
that also helped me to see the desired result after the fix:
qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write (0001:03:00.0:region1+0xc0, 0x2000c, 4)
vfio_region_write (0001:03:00.0:region1+0xc4, 0xb0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
So it could be a good idea to have these traces implemented. It's worth
mentioning that they should be surgically enabled during the debugging,
otherwise it can flood the trace logs with lock/unlock messages.
How to use it:
trace-event qemu_mutex_lock on|off
trace-event qemu_mutex_unlock on|off
or
trace-event qemu_mutex* on|off
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Message-Id: <1493054398-26013-1-git-send-email-joserz@linux.vnet.ibm.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
[Also handle trylock, cond_wait and win32; trace "unlocked" while still
in the critical section, so that "unlocked" always comes before the
next "locked" tracepoint. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add S390CcwMachineState machine parameter "loadparm" to qemu machine_opts so
libvirt can query for it.
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Currently we only print progress information on retrieval of SIGUSR1.
Some systems have a dedicated SIGINFO for this, however, so it should be
handled appropriately if it is available.
Buglink: https://bugs.launchpad.net/qemu/+bug/1662468
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170207235757.2026-1-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20170413160952.29918-1-mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Throttling has a weird property that throttle_get_config() does not
always return the same throttling settings that were given with
throttle_config(). In other words, the set and get functions aren't
symmetric.
If .max is 0 then the throttling code assigns a default value of .avg /
10 in throttle_config(). This is an implementation detail of the
throttling algorithm. When throttle_get_config() is called the .max
value returned should still be 0.
Users are exposed to this quirk via "info block" or "query-block"
monitor commands. This has caused confusion because it looks like a bug
when an unexpected value is reported.
This patch hides the .max value adjustment in throttle_get_config() and
updates test-throttle.c appropriately.
Reported-by: Nini Gu <ngu@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20170301115026.22621-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
They start the coroutine on the specified context.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
It's a variant of qemu_coroutine_enter with an explicit AioContext
parameter.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
When running virt-rescue the serial console hangs from time to time.
Virt-rescue runs an ordinary Linux kernel "appliance", but there is
only a single idle process running inside, so the qemu main loop is
largely idle. With virt-rescue >= 1.37 you may be able to observe the
hang by doing:
$ virt-rescue -e ^] --scratch
><rescue> while true; do ls -l /usr/bin; done
The hang in virt-rescue can be resolved by pressing a key on the
serial console.
Possibly with the same root cause, we also observed hangs during very
early boot of regular Linux VMs with a serial console. Those hangs
are extremely rare, but you may be able to observe them by running
this command on baremetal for a sufficiently long time:
$ while libguestfs-test-tool -t 60 >& /tmp/log ; do echo -n . ; done
(Check in /tmp/log that the failure was caused by a hang during early
boot, and not some other reason)
During investigation of this bug, Paolo Bonzini wrote:
> glib is expecting QEMU to use g_main_context_acquire around accesses to
> GMainContext. However QEMU is not doing that, instead it is taking its
> own mutex. So we should add g_main_context_acquire and
> g_main_context_release in the two implementations of
> os_host_main_loop_wait; these should undo the effect of Frediano's
> glib patch.
This patch exactly implements Paolo's suggestion in that paragraph.
This fixes the serial console hang in my testing, across 3 different
physical machines (AMD, Intel Core i7 and Intel Xeon), over many hours
of automated testing. I wasn't able to reproduce the early boot hangs
(but as noted above, these are extremely rare in any case).
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1435432
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20170331205133.23906-1-rjones@redhat.com>
[Paolo: this is actually a glib bug: recent glib versions are also
expecting g_main_context_acquire around g_poll---but that is not
documented and probably not even intended].
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SocketAddress is a simple union, and simple unions are awkward: they
have their variant members wrapped in a "data" object on the wire, and
require additional indirections in C. I intend to limit its use to
existing external interfaces. New ones should use SocketAddressFlat.
I further intend to convert all internal interfaces to
SocketAddressFlat. This helper should go away then.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1490895797-29094-8-git-send-email-armbru@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We have quite a few switches over SocketAddressKind. Some have case
labels for all enumeration values, others rely on a default label.
Some abort when the value isn't a valid SocketAddressKind, others
report an error then.
Unify as follows. Always provide case labels for all enumeration
values, to clarify intent. Abort when the value isn't a valid
SocketAddressKind, because the program state is messed up then.
Improve a few error messages while there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1490895797-29094-4-git-send-email-armbru@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Certain features make sense only with certain address families. For
instance, passing file descriptors requires AF_UNIX. Testing
SocketAddress's saddr->type == SOCKET_ADDRESS_KIND_UNIX is obvious,
but problematic: it can't recognize AF_UNIX when type ==
SOCKET_ADDRESS_KIND_FD.
Mark such tests of saddr->type TODO. We may want to check the address
family with getsockname() there.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1490895797-29094-2-git-send-email-armbru@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Let's set the handles to the underlying facilities to their extremal
value so no accidental misuse can happen, and to make it obvious that the
notifier is dysfunctional. E.g. if we just close an fd but do not touch
the int holding the fd eventually a read/write could succeed again when
the fd gets reused, and corrupt the file addressed by the fd.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We first snprintf() to a fixed buffer, then g_strdup() the result
*boggle*.
Worse, the size of the fixed buffer INET6_ADDRSTRLEN + 5 + 4 is bogus:
the 4 correctly accounts for '[', ']', ':' and '\0', but
INET6_ADDRSTRLEN is not a suitable limit for inet->host, and 5 is not
one for inet->port! They are for host and port in *numeric* form
(exploiting that INET6_ADDRSTRLEN > INET_ADDRSTRLEN), but inet->host
can also be a hostname, and inet->port can be a service name, to be
resolved with getaddrinfo().
Fortunately, the only user so far is the "socket" network backend's
net_socket_connected(), which uses it to initialize a NetSocketState's
info_str[]. info_str[] has considerable more space: 256 instead of
55. So the bug's impact appears to be limited to truncated "info
networks" with the "socket" network backend.
The fix is obvious: use g_strdup_printf().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1490268208-23368-1-git-send-email-armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
The multithreaded TCG implementation exposed deadlocks in the win32
condition variables: as implemented, qemu_cond_broadcast waited on
receivers, whereas the pthreads API it was intended to emulate does
not. This was causing a deadlock because broadcast was called while
holding the IO lock, as well as all possible waiters blocked on the
same lock.
This patch replaces all the custom synchronisation code for mutexes
and condition variables with native Windows primitives (SRWlocks and
condition variables) with the same semantics as their POSIX
equivalents. To enable that, it requires a Windows Vista or newer host
OS.
Signed-off-by: Andrey Shedel <ashedel@microsoft.com>
[AB: edited commit message]
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-Id: <20170324220141.10104-1-Andrew.Baumann@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This was spotted by Coverity, in case where sysconf(_SC_NPROCESSORS_ONLN)
fails and returns -1. This results in memset_num_threads getting set to -1.
Which we then pass to g_new0().
The patch replaces MAX_MEM_PREALLOC_THREAD_COUNT macro with a function call
get_memset_num_threads() to handle sysconf() failure gracefully. In case
sysconf() fails, we fall back to single threaded.
(Spotted by Coverity, CID 1372465.)
Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
Message-Id: <1490079006-32495-1-git-send-email-jitendra.kolhe@hpe.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1490014548-15083-5-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1490014548-15083-3-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
qemu-ga's socket activation support was not obeying the LISTEN_PID
environment variable, which avoids that a process uses a socket-activation
file descriptor meant for its parent.
Mess can for example ensue if a process forks a children before consuming
the socket-activation file descriptor and therefore setting O_CLOEXEC
on it.
Luckily, qemu-nbd also got socket activation code, and its copy does
support LISTEN_PID. Some extra fixups are needed to ensure that the
code can be used for both, but that's what this patch does. The
main change is to replace get_listen_fds's "consume" argument with
the FIRST_SOCKET_ACTIVATION_FD macro from the qemu-nbd code.
Cc: "Richard W.M. Jones" <rjones@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
si_band is not found in OpenBSD. It is marked as obsolescent in
POSIX, so we can delete it without any remorse.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170317152214.6148-1-pbonzini@redhat.com
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
commit 3c80ca15 fixed a deadlock scenarion with nested aio_poll invocations.
However, the rescheduling of the completion BH introcuded unnecessary spinning
in the main-loop. On very fast file backends this can even lead to the
"WARNING: I/O thread spun for 1000 iterations" message popping up.
Callgrind reports about 3-4% less instructions with this patch running
qemu-img bench on a ramdisk based VMDK file.
Fixes: 3c80ca158c
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When using a memory-backend object with prealloc turned on, QEMU
will memset() the first byte in every memory page to zero. While
this might have been acceptable for memory backends associated
with RAM, this corrupts application data for NVDIMMs.
Instead of setting every page to zero, read the current byte
value and then just write that same value back, so we are not
corrupting the original data. Directly write the value instead
of memset()ing it, since there's no benefit to memset for a
single byte write.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Message-id: 20170303113255.28262-1-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
icount has become much slower after tcg_cpu_exec has stopped
using the BQL. There is also a latent bug that is masked by
the slowness.
The slowness happens because every occurrence of a QEMU_CLOCK_VIRTUAL
timer now has to wake up the I/O thread and wait for it. The rendez-vous
is mediated by the BQL QemuMutex:
- handle_icount_deadline wakes up the I/O thread with BQL taken
- the I/O thread wakes up and waits on the BQL
- the VCPU thread releases the BQL a little later
- the I/O thread raises an interrupt, which calls qemu_cpu_kick
- the VCPU thread notices the interrupt, takes the BQL to
process it and waits on it
All this back and forth is extremely expensive, causing a 6 to 8-fold
slowdown when icount is turned on.
One may think that the issue is that the VCPU thread is too dependent
on the BQL, but then the latent bug comes in. I first tried removing
the BQL completely from the x86 cpu_exec, only to see everything break.
The only way to fix it (and make everything slow again) was to add a dummy
BQL lock/unlock pair.
This is because in -icount mode you really have to process the events
before the CPU restarts executing the next instruction. Therefore, this
series moves the processing of QEMU_CLOCK_VIRTUAL timers straight in
the vCPU thread when running in icount mode.
The required changes include:
- make the timer notification callback wake up TCG's single vCPU thread
when run from another thread. By using async_run_on_cpu, the callback
can override all_cpu_threads_idle() when the CPU is halted.
- move handle_icount_deadline after qemu_tcg_wait_io_event, so that
the timer notification callback is invoked after the dummy work item
wakes up the vCPU thread
- make handle_icount_deadline run the timers instead of just waking the
I/O thread.
- stop processing the timers in the main loop
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no change for now, because the callback just invokes
qemu_notify_event.
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This dependency is the wrong way, and we will need util/qemu-timer.h from
sysemu/cpus.h in the next patch.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the first timer is exactly at the current value of the clock, the
deadline is met and the timer should fire. This fixes itself on the next
iteration of the loop without icount; with icount, however, execution
of instructions will stop exactly at the deadline and won't proceed.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suramya Shah <shah.suramya@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170310163948.7567-1-shah.suramya@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using "-mem-prealloc" option for a large guest leads to higher guest
start-up and migration time. This is because with "-mem-prealloc" option
qemu tries to map every guest page (create address translations), and
make sure the pages are available during runtime. virsh/libvirt by
default, seems to use "-mem-prealloc" option in case the guest is
configured to use huge pages. The patch tries to map all guest pages
simultaneously by spawning multiple threads. Currently limiting the
change to QEMU library functions on POSIX compliant host only, as we are
not sure if the problem exists on win32. Below are some stats with
"-mem-prealloc" option for guest configured to use huge pages.
------------------------------------------------------------------------
Idle Guest | Start-up time | Migration time
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - single threaded (existing code)
------------------------------------------------------------------------
64 Core - 4TB | 54m11.796s | 75m43.843s
64 Core - 1TB | 8m56.576s | 14m29.049s
64 Core - 256GB | 2m11.245s | 3m26.598s
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 8 threads
------------------------------------------------------------------------
64 Core - 4TB | 5m1.027s | 34m10.565s
64 Core - 1TB | 1m10.366s | 8m28.188s
64 Core - 256GB | 0m19.040s | 2m10.148s
-----------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 16 threads
-----------------------------------------------------------------------
64 Core - 4TB | 1m58.970s | 31m43.400s
64 Core - 1TB | 0m39.885s | 7m55.289s
64 Core - 256GB | 0m11.960s | 2m0.135s
-----------------------------------------------------------------------
Changed in v2:
- modify number of memset threads spawned to min(smp_cpus, 16).
- removed 64GB memory restriction for spawning memset threads.
Changed in v3:
- limit number of threads spawned based on
min(sysconf(_SC_NPROCESSORS_ONLN), 16, smp_cpus)
- implement memset thread specific siglongjmp in SIGBUS signal_handler.
Changed in v4
- remove sigsetjmp/siglongjmp and SIGBUS unblock/block for main thread
as main thread no longer touches any pages.
- simplify code my returning memset_thread_failed status from
touch_all_pages.
Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
Message-Id: <1487907103-32350-1-git-send-email-jitendra.kolhe@hpe.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Additionally permit non-negative integers as key components. A
dictionary's keys must either be all integers or none. If all keys
are integers, convert the dictionary to a list. The set of keys must
be [0,N].
Examples:
* list.1=goner,list.0=null,list.1=eins,list.2=zwei
is equivalent to JSON [ "null", "eins", "zwei" ]
* a.b.c=1,a.b.0=2
is inconsistent: a.b.c clashes with a.b.0
* list.0=null,list.2=eins,list.2=zwei
has a hole: list.1 is missing
Similar design flaw as for objects: there is no way to denote an empty
list. While interpreting "key absent" as empty list seems natural
(removing a list member from the input string works when there are
multiple ones, so why not when there's just one), it doesn't work:
"key absent" already means "optional list absent", which isn't the
same as "empty list present".
Update the keyval object visitor to use this a.0 syntax in error
messages rather than the usual a[0].
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1488317230-26248-25-git-send-email-armbru@redhat.com>
[Off-by-one fix squashed in, as per Kevin's review]
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Until now, key components are separated by '.'. This leaves little
room for evolving the syntax, and is incompatible with the __RFQDN_
prefix convention for downstream extensions.
Since key components will be commonly used as QAPI member names by the
QObject input visitor, we can just as well borrow the QAPI naming
rules here: letters, digits, hyphen and period starting with a letter,
with an optional __RFQDN_ prefix for downstream extensions.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1488317230-26248-20-git-send-email-armbru@redhat.com>
keyval_parse() parses KEY=VALUE,... into a QDict. Works like
qemu_opts_parse(), except:
* Returns a QDict instead of a QemuOpts (d'oh).
* Supports nesting, unlike QemuOpts: a KEY is split into key
fragments at '.' (dotted key convention; the block layer does
something similar on top of QemuOpts). The key fragments are QDict
keys, and the last one's value is updated to VALUE.
* Each key fragment may be up to 127 bytes long. qemu_opts_parse()
limits the entire key to 127 bytes.
* Overlong key fragments are rejected. qemu_opts_parse() silently
truncates them.
* Empty key fragments are rejected. qemu_opts_parse() happily
accepts empty keys.
* It does not store the returned value. qemu_opts_parse() stores it
in the QemuOptsList.
* It does not treat parameter "id" specially. qemu_opts_parse()
ignores all but the first "id", and fails when its value isn't
id_wellformed(), or duplicate (a QemuOpts with the same ID is
already stored). It also screws up when a value contains ",id=".
* Implied value is not supported. qemu_opts_parse() desugars "foo" to
"foo=on", and "nofoo" to "foo=off".
* An implied key's value can't be empty, and can't contain ','.
I intend to grow this into a saner replacement for QemuOpts. It'll
take time, though.
Note: keyval_parse() provides no way to do lists, and its key syntax
is incompatible with the __RFQDN_ prefix convention for downstream
extensions, because it blindly splits at '.', even in __RFQDN_. Both
issues will be addressed later in the series.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1488317230-26248-4-git-send-email-armbru@redhat.com>
This will probably be my last pull request before the hard freeze. It
has some new work, but that has all been posted in draft before the
soft freeze, so I think it's reasonable to include in qemu-2.9.
This batch has:
* A substantial amount of POWER9 work
* Implements the legacy (hash) MMU for POWER9
* Some more preliminaries for implementing the POWER9 radix
MMU
* POWER9 has_work
* Basic POWER9 compatibility mode handling
* Removal of some premature tests
* Some cleanups and fixes to the existing MMU code to make the
POWER9 work simpler
* A bugfix for TCG multiply adds on power
* Allow pseries guests to access PCIe extended config space
This also includes a code-motion not strictly in ppc code - moving
getrampagesize() from ppc code to exec.c. This will make some future
VFIO improvements easier, Paolo said it was ok to merge via my tree.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYuOEEAAoJEGw4ysog2bOSHD0P/jBg/qr/4KnsB1KhnlVrB2sP
vy2d3bGGlUWr9Z+CK/PMCRB8ekFgQLjidLIXji6mviUocv6m3WsVrnbLF/oOL/IT
NPMVAffw7q804YVu1Ns9R82d6CIqHTy//bpg69tFMcJmhL9fqPan3wTZZ9JeiyAm
SikqkAHBSW4SxKqg8ApaSqx5L2QTqyfkClR0sLmgM0JtmfJrbobpQ6bMtdPjUZ9L
n2gnpO2vaWCa1SEQrRrdELqvcD8PHkSJapWOBXOkpGWxoeov/PYxOgkpdDUW4qYY
lVLtp1Vd3OB/h3Unqfw32DNiHA5p89hWPX5UybKMgRVL9Cv2/lyY47pcY8XTeNzn
bv84YRbFJeI+GgoEnghmtq+IM8XiW/cr9rWm9wATKfKGcmmFauumALrsffUpHVCM
4hSNgBv5t2V9ptZ+MDlM/Ku+zk9GoqwQ+hemdpVtiyhOtGUPGFBn5YLE4c2DHFxV
+L9JtBnFn8obnssNoz0wL+QvZchT1qUHMhH5CWAanjw9CTDp/YwQ2P01zK+00s9d
4cB7fUG3WNto5eXXEGMaXeDsUEz8z//hTe3j5sVbnHsXi0R3dhv7iryifmx4bUKU
H9EwAc+uNUHbvBy7u6IWg0I8P2n00CCO6JqXijQ92zELJ5j0XhzHUI2dOXn+zyEo
3FZu56LFnSSUBEXuTjq4
=PcNw
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170303' into staging
ppc patch queuye for 2017-03-03
This will probably be my last pull request before the hard freeze. It
has some new work, but that has all been posted in draft before the
soft freeze, so I think it's reasonable to include in qemu-2.9.
This batch has:
* A substantial amount of POWER9 work
* Implements the legacy (hash) MMU for POWER9
* Some more preliminaries for implementing the POWER9 radix
MMU
* POWER9 has_work
* Basic POWER9 compatibility mode handling
* Removal of some premature tests
* Some cleanups and fixes to the existing MMU code to make the
POWER9 work simpler
* A bugfix for TCG multiply adds on power
* Allow pseries guests to access PCIe extended config space
This also includes a code-motion not strictly in ppc code - moving
getrampagesize() from ppc code to exec.c. This will make some future
VFIO improvements easier, Paolo said it was ok to merge via my tree.
# gpg: Signature made Fri 03 Mar 2017 03:20:36 GMT
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.9-20170303:
target/ppc: rewrite f[n]m[add,sub] using float64_muladd
spapr: Small cleanup of PPC MMU enums
spapr_pci: Advertise access to PCIe extended config space
target/ppc: Rework hash mmu page fault code and add defines for clarity
target/ppc: Move no-execute and guarded page checking into new function
target/ppc: Add execute permission checking to access authority check
target/ppc: Add Instruction Authority Mask Register Check
hw/ppc/spapr: Add POWER9 to pseries cpu models
target/ppc/POWER9: Add cpu_has_work function for POWER9
target/ppc/POWER9: Add POWER9 pa-features definition
target/ppc/POWER9: Add POWER9 mmu fault handler
target/ppc: Don't gen an SDR1 on POWER9 and rework register creation
target/ppc: Add patb_entry to sPAPRMachineState
target/ppc/POWER9: Add POWERPC_MMU_V3 bit
powernv: Don't test POWER9 CPU yet
exec, kvm, target-ppc: Move getrampagesize() to common code
target/ppc: Add POWER9/ISAv3.00 to compat_table
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The cast is there because sigbus_handler is invoked via sigfd_handler.
But it feels just wrong to use struct qemu_signalfd_siginfo in the
prototype of a function that is passed to sigaction.
Instead, do a simple-minded conversion of qemu_signalfd_siginfo to
siginfo_t.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
getrampagesize() returns the largest supported page size and mainly
used to know if huge pages are enabled.
However is implemented in target-ppc/kvm.c and not available
in TCG or other architectures.
This renames and moves gethugepagesize() to mmap-alloc.c where
fd-based analog of it is already implemented. This renames and moves
getrampagesize() to exec.c as it seems to be the common place for
helpers like this.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Similarly to allocation, do it from an inline function. This allows
tests to only use the headers for allocation/free of timer.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 75cdcd1 neglected to update tests/qemu-iotests/049.out, and
made the error message for negative size worse. Fix that.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
parse_option_size()'s checking for overflow and trailing crap is
wrong. Has always been that way. qemu_strtosz() gets it right, so
use that.
This adds support for size suffixes 'P', 'E', and ignores case for all
suffixes, not just 'k'.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-25-git-send-email-armbru@redhat.com>
This will permit its use in parse_option_size().
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86)
Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:Block layer core)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1487708048-2131-24-git-send-email-armbru@redhat.com>
This makes qemu_strtosz(), qemu_strtosz_mebi() and
qemu_strtosz_metric() similar to qemu_strtoi64(), except negative
values are rejected.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86)
Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:Block layer core)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1487708048-2131-23-git-send-email-armbru@redhat.com>
Change the qemu_strtosz() & friends to return -EINVAL when @endptr is
null and the conversion doesn't consume the string completely.
Matches how qemu_strtol() & friends work.
Only test_qemu_strtosz_simple() passes a null @endptr. No functional
change there, because its conversion consumes the string.
Simplify callers that use @endptr only to fail when it doesn't point
to '\0' to pass a null @endptr instead.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86)
Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:Block layer core)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1487708048-2131-22-git-send-email-armbru@redhat.com>
Writing QEMU_STRTOSZ_DEFSUFFIX_* instead of '*' gains nothing. Get
rid of these eyesores.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1487708048-2131-18-git-send-email-armbru@redhat.com>
Most callers of qemu_strtosz_suffix() pass QEMU_STRTOSZ_DEFSUFFIX_B.
Capture the pattern in new qemu_strtosz().
Inline qemu_strtosz_suffix() into its only remaining caller.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-17-git-send-email-armbru@redhat.com>
With qemu_strtosz(), no suffix means mebibytes. It's used rarely.
I'm going to add a similar function where no suffix means bytes.
Rename qemu_strtosz() to qemu_strtosz_MiB() to make the name
qemu_strtosz() available for the new function.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-16-git-send-email-armbru@redhat.com>
To parse numbers with metric suffixes, we use
qemu_strtosz_suffix_unit(nptr, &eptr, QEMU_STRTOSZ_DEFSUFFIX_B, 1000)
Capture this in a new function for legibility:
qemu_strtosz_metric(nptr, &eptr)
Replace test_qemu_strtosz_suffix_unit() by test_qemu_strtosz_metric().
Rename qemu_strtosz_suffix_unit() to do_strtosz() and give it internal
linkage.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-15-git-send-email-armbru@redhat.com>
parse_option_number() fails to check for these errors after
strtoull(). Has always been broken. Fix that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-10-git-send-email-armbru@redhat.com>
Reorder check_strtox_error() to make it obvious that we always store
through a non-null @endptr.
Transform
if (some error) {
error case ...
err = value for error case;
} else {
normal case ...
err = value for normal case;
}
return err;
to
if (some error) {
error case ...
return value for error case;
}
normal case ...
return value for normal case;
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-9-git-send-email-armbru@redhat.com>
Name same things the same, different things differently.
* qemu_strtol()'s parameter @nptr is called @p in
check_strtox_error(). Rename the latter.
* qemu_strtol()'s parameter @endptr is called @next in
check_strtox_error(). Rename the latter.
* qemu_strtol()'s variable @p is called @endptr in
check_strtox_error(). Rename both to @ep.
* qemu_strtol()'s variable @err is *negative* errno,
check_strtox_error()'s parameter @err is *positive*. Rename the
latter to @libc_errno.
Same for qemu_strtoul(), qemu_strtoi64(), qemu_strtou64(), of course.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-8-git-send-email-armbru@redhat.com>
The name qemu_strtoll() suggests conversion to long long, but it
actually converts to int64_t. Rename to qemu_strtoi64().
The name qemu_strtoull() suggests conversion to unsigned long long,
but it actually converts to uint64_t. Rename to qemu_strtou64().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1487708048-2131-7-git-send-email-armbru@redhat.com>
Fixes the following documentation bugs:
* Fails to document that null @nptr is safe.
* Fails to document that we return -EINVAL when no conversion could be
performed (commit 47d4be1).
* Confuses long long with int64_t, and unsigned long long with
uint64_t.
* Claims the unsigned conversions can underflow. They can't.
While there, mark problematic assumptions that int64_t is long long,
and uint64_t is unsigned long long with FIXME comments.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1487708048-2131-6-git-send-email-armbru@redhat.com>
Plenty of code relies on QemuOpt member @str not being null, including
qemu_opts_print(), qemu_opts_to_qdict(), and callbacks passed to
qemu_opt_foreach().
Begs the question whether it can be null. Only opt_set() creates
QemuOpt. It sets member @str to its argument @value. Passing null
for @value would plant a time bomb. Callers:
* opts_do_parse() can't pass null.
* qemu_opt_set() passes its argument @value. Callers:
- qemu_opts_from_qdict_1() can't pass null
- qemu_opts_set() passes its argument @value, but none of its
callers pass null.
- Many more outside qemu-option.c, but they shouldn't pass null,
either.
Assert member @str isn't null, so that misuse is caught right away.
Simplify parse_option_bool(), parse_option_number() and
parse_option_size() accordingly. Best viewed with whitespace changes
ignored.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-3-git-send-email-armbru@redhat.com>
This adds a CoMutex around the existing CoQueue. Because the write-side
can just take CoMutex, the old "writer" field is not necessary anymore.
Instead of removing it altogether, count the number of pending writers
during a read-side critical section and forbid further readers from
entering.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-7-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
All that CoQueue needs in order to become thread-safe is help
from an external mutex. Add this to the API.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-6-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Running a very small critical section on pthread_mutex_t and CoMutex
shows that pthread_mutex_t is much faster because it doesn't actually
go to sleep. What happens is that the critical section is shorter
than the latency of entering the kernel and thus FUTEX_WAIT always
fails. With CoMutex there is no such latency but you still want to
avoid wait and wakeup. So introduce it artificially.
This only works with one waiters; because CoMutex is fair, it will
always have more waits and wakeups than a pthread_mutex_t.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This uses the lock-free mutex described in the paper '"Blocking without
Locking", or LFTHREADS: A lock-free thread library' by Gidenstam and
Papatriantafilou. The same technique is used in OSv, and in fact
the code is essentially a conversion to C of OSv's code.
[Added missing coroutine_fn in tests/test-aio-multithread.c.
--Stefan]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-2-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Pull the increment/decrement pair out of aio_bh_poll and into the
callers.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-18-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch prepares for the removal of unnecessary lockcnt inc/dec pairs.
Extract the dispatching loop for file descriptor handlers into a new
function aio_dispatch_handlers, and then inline aio_dispatch into
aio_poll.
aio_dispatch can now become void.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-17-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-16-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-15-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This covers both file descriptor callbacks and polling callbacks,
since they execute related code.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-14-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-13-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The AioContext data structures are now protected by list_lock and/or
they are walked with FOREACH_RCU primitives. There is no need anymore
to acquire the AioContext for the entire duration of aio_dispatch.
Instead, just acquire it before and after invoking the callbacks.
The next step is then to push it further down.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-12-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
As a small step towards the introduction of multiqueue, we want
coroutines to remain on the same AioContext that started them,
unless they are moved explicitly with e.g. aio_co_schedule. This patch
avoids that coroutines switch AioContext when they use a CoMutex.
For now it does not make much of a difference, because the CoMutex
is not thread-safe and the AioContext itself is used to protect the
CoMutex from concurrent access. However, this is going to change.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-9-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
aio_co_wake provides the infrastructure to start a coroutine on a "home"
AioContext. It will be used by CoMutex and CoQueue, so that coroutines
don't jump from one context to another when they go to sleep on a
mutex or waitqueue. However, it can also be used as a more efficient
alternative to one-shot bottom halves, and saves the effort of tracking
which AioContext a coroutine is running on.
aio_co_schedule is the part of aio_co_wake that starts a coroutine
on a remove AioContext, but it is also useful to implement e.g.
bdrv_set_aio_context callbacks.
The implementation of aio_co_schedule is based on a lock-free
multiple-producer, single-consumer queue. The multiple producers use
cmpxchg to add to a LIFO stack. The consumer (a per-AioContext bottom
half) grabs all items added so far, inverts the list to make it FIFO,
and goes through it one item at a time until it's empty. The data
structure was inspired by OSv, which uses it in the very code we'll
"port" to QEMU for the thread-safe CoMutex.
Most of the new code is really tests.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
AioContext is fairly self contained, the only dependency is QEMUTimer but
that in turn doesn't need anything else. So move them out of block-obj-y
to avoid introducing a dependency from io/ to block-obj-y.
main-loop and its dependency iohandler also need to be moved, because
later in this series io/ will call iohandler_get_aio_context.
[Changed copyright "the QEMU team" to "other QEMU contributors" as
suggested by Daniel Berrange and agreed by Paolo.
--Stefan]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-2-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The member VMStateField.start is used for two things, partial data
migration for VBUFFER data (basically provide migration for a
sub-buffer) and for locating next in QTAILQ.
The implementation of the VBUFFER feature is broken when VMSTATE_ALLOC
is used. This however goes unnoticed because actually partial migration
for VBUFFER is not used at all.
Let's consolidate the usage of VMStateField.start by removing support
for partial migration for VBUFFER.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20170203175217.45562-1-pasic@linux.vnet.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
To iterate over all QemuOpts currently requires using a callback
function which is inconvenient for control flow. Add support for
using iterator functions more directly
QemuOptsIter iter;
QemuOpt *opt;
qemu_opts_iter_init(&iter, opts, "repeated-key");
while ((opt = qemu_opts_iter_next(&iter)) != NULL) {
....do something...
}
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170203120649.15637-8-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Implements 128-bit left shift and right shift as well as their
testcases. By design, shift silently mods by 128, so the caller is
responsible to assert the shift range if necessary.
Left shift sets the overflow flag if any non-zero digit is shifted out.
Examples:
ulshift(&low, &high, 250, &overflow);
equivalent: n << 122
urshift(&low, &high, -2);
equivalent: n << 126
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[dwg: Added test-shift128 to .gitignore]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It is not possible to implement functions in host-utils.c for
architectures with quadwords because the guard is implemented in the
Makefile. This patch move the guard out of the Makefile to the
implementation file.
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Bitmaps with a granularity of 58 or above can be neither serialized nor
deserialized (see the comment in the function added in this series for
an explanation). This patch adds a function so that we can check whether
a bitmap actually can be (de-)serialized at all, thus avoiding failing
the necessary assertion in hbitmap_serialization_granularity().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20161115225746.3590-2-mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
A fix has been committed in upstream glib commit
210a9796f78eb90f76f1bd6a304e9fea05e97617.
(See also related bug https://bugzilla.gnome.org/show_bug.cgi?id=764415)
It is desirable to use the glib version instead of qemu copy, since it
provides more debugging facilities (G_MAIN_POLL_DEBUG etc), and
hopefully has a better maintainance. Hopefully, we can drop the qemu
copy in a few years.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
1st mmap returns *ptr* which aligns to host page size,
| size + align |
------------------------------------------
ptr
input param *align* could be 1M, or 2M, or host page size. After
QEMU_ALIGN_UP, offset will >= 0
2nd mmap use flag MAP_FIXED, then it return ptr+offset, or else fail.
If it success, then we will have something like:
| offset | size |
--------------------------------------
ptr ptr1
*ptr1* is what we really want to return, it equals ptr+offset.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Currently DNS resolution is done automatically as part
of the creation of a QIOChannelSocket object instance.
This works ok for network clients where you just end
up a single network socket, but for servers, the results
of DNS resolution may require creation of multiple
sockets.
Introducing a DNS resolver API allows DNS resolution
to be separated from the socket object creation. This
will make it practical to create multiple QIOChannelSocket
instances for servers.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a 'numeric' flag to the InetSocketAddress struct to allow the
caller to indicate that DNS should be skipped for the host/port
fields. This is useful if the caller knows the address is already
numeric and wants to guarantee no (potentially blocking) DNS
lookups are attempted.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
commit e1123a3b introduced a data corruption regression
in the iscsi driver because it passed -1 as nr to bitmap_set
and bitmap_clear. Add an assertion to catch such flaws earlier.
Suggested-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1484844230-24490-1-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the Intel HAX is kernel-based hardware acceleration module for
Windows (similar to KVM on Linux).
Based on the "target/i386: Add Intel HAX to android emulator" patch
from David Chou <david.j.chou@intel.com>
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Message-Id: <7b9cae28a0c379ab459c7a8545c9a39762bd394f.1484045952.git.vpalatin@chromium.org>
[Drop hax_populate_ram stub. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU will crash with the follow backtrace if the new created thread exited before
we call qemu_thread_set_name() for it.
(gdb) bt
#0 0x00007f9a68b095d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f9a68b0acc8 in __GI_abort () at abort.c:90
#2 0x00007f9a69cda389 in PAT_abort () from /usr/lib64/libuvpuserhotfix.so
#3 0x00007f9a69cdda0d in patchIllInsHandler () from /usr/lib64/libuvpuserhotfix.so
#4 <signal handler called>
#5 pthread_setname_np (th=140298470549248, name=name@entry=0x8cc74a "io-task-worker") at ../nptl/sysdeps/unix/sysv/linux/pthread_setname.c:49
#6 0x00000000007f5f20 in qemu_thread_set_name (thread=thread@entry=0x7ffd2ac09680, name=name@entry=0x8cc74a "io-task-worker") at util/qemu_thread_posix.c:459
#7 0x00000000007f679e in qemu_thread_create (thread=thread@entry=0x7ffd2ac09680, name=name@entry=0x8cc74a "io-task-worker",start_routine=start_routine@entry=0x7c1300 <qio_task_thread_worker>, arg=arg@entry=0x7f99b8001720, mode=mode@entry=1) at util/qemu_thread_posix.c:498
#8 0x00000000007c15b6 in qio_task_run_in_thread (task=task@entry=0x7f99b80033d0, worker=worker@entry=0x7bd920 <qio_channel_socket_connect_worker>, opaque=0x7f99b8003370, destroy=0x7c6220 <qapi_free_SocketAddress>) at io/task.c:133
#9 0x00000000007bda04 in qio_channel_socket_connect_async (ioc=0x7f99b80014c0, addr=0x37235d0, callback=callback@entry=0x54ad00 <qemu_chr_socket_connected>, opaque=opaque@entry=0x38118b0, destroy=destroy@entry=0x0) at io/channel_socket.c:191
#10 0x00000000005487f6 in socket_reconnect_timeout (opaque=0x38118b0) at qemu_char.c:4402
#11 0x00007f9a6a1533b3 in g_timeout_dispatch () from /usr/lib64/libglib-2.0.so.0
#12 0x00007f9a6a15299a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#13 0x0000000000747386 in glib_pollfds_poll () at main_loop.c:227
#14 0x0000000000747424 in os_host_main_loop_wait (timeout=404000000) at main_loop.c:272
#15 0x0000000000747575 in main_loop_wait (nonblocking=nonblocking@entry=0) at main_loop.c:520
#16 0x0000000000557d31 in main_loop () at vl.c:2170
#17 0x000000000041c8b7 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:5083
Let's detach the new thread after calling qemu_thread_set_name().
Signed-off-by: Caoxinhua <caoxinhua@huawei.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-Id: <1483493521-9604-1-git-send-email-zhang.zhanghailiang@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the useless is_external argument. Since the iohandler
AioContext is never used for block devices, aio_disable_external
is never called on it. This lets us remove stubs/iohandler.c.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is complex, but I think it is reasonably documented in the source.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170112180800.21085-5-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
A QemuLockCnt comprises a counter and a mutex, with primitives
to increment and decrement the counter, and to take and release the
mutex. It can be used to do lock-free visits to a data structure
whenever mutexes would be too heavy-weight and the critical section
is too long for RCU.
This could be implemented simply by protecting the counter with the
mutex, but QemuLockCnt is harder to misuse and more efficient.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170112180800.21085-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In the context of asynchronous work, if we have a worker coroutine that
didn't yield, the parent coroutine cannot be reentered because it hasn't
yielded yet. In this case we don't even have to reenter the parent
because it will see that the work is already done and won't even yield.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
The new AioPollFn io_poll() argument to aio_set_fd_handler() and
aio_set_event_handler() is used in the next patch.
Keep this code change separate due to the number of files it touches.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161201192652.9509-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
An hbitmap's granularity may be anything from 0 to 63, so when shifting
constants by its value, they should not be plain ints.
Even having changed the types, hbitmap_serialization_granularity() still
tries to shift 64 to the right by the granularity. This operation is
undefined if the granularity is greater than 57. Adding an assertion is
fine for now, because serializing is done only in tests so far, but this
means that only bitmaps with a granularity below 58 can be serialized
and we should thus add a hbitmap_is_serializable() function later.
One of the two places touched in this patch uses
QEMU_ALIGN_UP(x, 1 << y). We can use ROUND_UP() there, since the second
parameter is obviously a power of two.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20161115224732.1334-1-mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Include sys/user.h for declaration of 'struct kinfo_proc'.
Add -lutil to qemu-ga link for kinfo_getproc.
Signed-off-by: Ed Maste <emaste@freebsd.org>
Message-id: 1479778365-11315-1-git-send-email-emaste@freebsd.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Leave the implementation of error_vprintf and error_vprintf_unless_qmp
(the latter now trivially wrapped by error_printf_unless_qmp) to
libqemustub.a and monitor.c. This has two advantages: it lets us
remove the monitor_printf and monitor_vprintf stubs, and it lets
tests provide a different implementation of the functions that uses
g_test_message.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477326663-67817-2-git-send-email-pbonzini@redhat.com>
Add the AF_VSOCK address family so that qemu-ga will be able to use
virtio-vsock.
The AF_VSOCK address family uses <cid, port> address tuples. The cid is
the unique identifier comparable to an IP address. AF_VSOCK does not
use name resolution so it's easy to convert between struct sockaddr_vm
and strings.
This patch defines a VsockSocketAddress instead of trying to piggy-back
on InetSocketAddress. This is cleaner in the long run since it avoids
lots of IPv4 vs IPv6 vs vsock special casing.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* treat trailing commas as garbage when parsing (Eric Blake)
* add configure check instead of checking AF_VSOCK directly
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Make inet_connect_saddr() in util/qemu-sockets.c public in order to be
able to use it with InetSocketAddress sockets outside of
util/qemu-sockets.c independently.
Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some files contain multiple #includes of the same header file.
Removed most of those unnecessary duplicate entries using
scripts/clean-includes.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Anand J <anand.indukala@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
It is simpler and a bit faster, and QEMU does not need the contention
callbacks (and thus the fairness) anymore.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477565348-5458-21-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
GRecMutex is new in glib 2.32, so we cannot use it. Introduce
a recursive mutex in qemu-thread instead, which will be used
instead of RFifoLock.
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477565348-5458-20-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>