Commit Graph

928 Commits

Author SHA1 Message Date
Roman Bolshakov
21a43af0f1 qemu-thread: Don't block SEGV, ILL and FPE
If any of these signals happen on macOS, they are not delivered to other
threads and signalfd_compat receives nothing. Indeed, POSIX reference
and sigprocmask(2) note that an attempt to block the signals results in
undefined behaviour. SEGV and FPE can't also be received by signalfd(2)
on Linux.

An ability to retrieve SIGBUS via signalfd(2) is used by QEMU for
memory preallocation therefore we can't unblock it without consequences.
But it's important to leave a remark that the signal is lost on macOS.

Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-01-08 12:34:46 +00:00
Roman Bolshakov
479a57475e util: Implement debug-threads for macOS
macOS provides pthread_setname_np that doesn't have thread id argument.

Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-01-08 12:34:46 +00:00
Markus Armbruster
b7d89466dd Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes, with the changes
to the following files manually reverted:

    contrib/libvhost-user/libvhost-user-glib.h
    contrib/libvhost-user/libvhost-user.c
    contrib/libvhost-user/libvhost-user.h
    linux-user/mips64/cpu_loop.c
    linux-user/mips64/signal.c
    linux-user/sparc64/cpu_loop.c
    linux-user/sparc64/signal.c
    linux-user/x86_64/cpu_loop.c
    linux-user/x86_64/signal.c
    target/s390x/gen-features.c
    tests/migration/s390x/a-b-bios.c
    tests/test-rcu-simpleq.c
    tests/test-rcu-tailq.c

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20181204172535.2799-1-armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
2018-12-20 10:29:08 +01:00
Emilio G. Cota
fe656e3185 include: move exec/tb-hash-xx.h to qemu/xxhash.h
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Emilio G. Cota
c971d8fa73 exec: introduce qemu_xxhash{2,4,5,6,7}
Before moving them all to include/qemu/xxhash.h.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
David Hildenbrand
af02f4c517 cutils: Fix qemu_strtosz() & friends to reject non-finite sizes
qemu_strtosz() & friends reject NaNs, but happily accept infinities.
They shouldn't. Fix that.

The fix makes use of qemu_strtod_finite(). To avoid ugly casts,
change the @end parameter of qemu_strtosz() & friends from char **
to const char **.

Also, add two test cases, testing that "inf" and "NaN" are properly
rejected. While at it, also fixup the function documentation.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20181121164421.20780-3-david@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-12-13 19:10:06 +01:00
David Hildenbrand
ca28f54816 cutils: Add qemu_strtod() and qemu_strtod_finite()
Let's provide a wrapper for strtod().

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20181121164421.20780-2-david@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-12-13 19:10:06 +01:00
Eric Blake
53a90b9724 cutils: Assert in-range base for string-to-integer conversions
POSIX states that the value of endptr is unspecified if strtol()
fails with EINVAL due to an invalid base argument.  Since none of
the callers to check_strtox_error() initialized endptr, we could
end up propagating uninitialized data back to a caller on error.
However, passing an out-of-range base is already a sign of poor
programming, so let's just assert that base is in range, at which
point check_strtox_error() can be tightened to assert that it is
receiving an initialized ep that points somewhere within the
caller's original string, regardless of whether strto*() succeeded
or failed with ERANGE.

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20181206151856.77503-1-eblake@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2018-12-11 18:28:47 +01:00
Li Qiang
9e722ebc06 util: vfio-helpers: use ARRAY_SIZE in qemu_vfio_init_pci()
Cc: qemu-trivial@nongnu.org

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <1543571638-2892-1-git-send-email-liq3ea@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2018-12-11 18:28:47 +01:00
Markus Armbruster
549b50a31d vfio-helpers: Fix qemu_vfio_open_pci() crash
qemu_vfio_open_common() initializes s->lock only after passing s to
qemu_vfio_dma_map() via qemu_vfio_init_ramblock().
qemu_vfio_dma_map() tries to lock the uninitialized lock and crashes.

Fix by initializing s->lock first.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1645840
Fixes: 418026ca43
Cc: qemu-stable@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20181127084143.1113-1-armbru@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-11-27 18:39:32 +00:00
Peter Maydell
a458774ad7 util/qemu-thread-posix: Fix qemu_thread_atexit* for OSX
Our current implementation of qemu_thread_atexit* is broken on OSX.
This is because it works by cerating a piece of thread-specific
data with pthread_key_create() and using the destructor function
for that data to run the notifier function passed to it by
the caller of qemu_thread_atexit_add(). The expected use case
is that the caller uses a __thread variable as the notifier,
and uses the callback to clean up information that it is
keeping per-thread in __thread variables.

Unfortunately, on OSX this does not work, because on OSX
a __thread variable may be destroyed (freed) before the
pthread_key_create() destructor runs. (POSIX imposes no
ordering constraint here; the OSX implementation happens
to implement __thread variables in terms of pthread_key_create((),
whereas Linux uses different mechanisms that mean the __thread
variables will still be present when the pthread_key_create()
destructor is run.)

Fix this by switching to a scheme similar to the one qemu-thread-win32
uses for qemu_thread_atexit: keep the thread's notifiers on a
__thread variable, and run the notifiers on calls to
qemu_thread_exit() and on return from the start routine passed
to qemu_thread_start(). We do this with the pthread_cleanup_push()
API.

We take advantage of the qemu_thread_atexit_add() API
permission not to run thread notifiers on process exit to
avoid having to special case the main thread.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181105135538.28025-3-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-06 21:35:06 +01:00
Brad Smith
fc3d1bad1e oslib-posix: Use MAP_STACK in qemu_alloc_stack() on OpenBSD
Use MAP_STACK in qemu_alloc_stack() on OpenBSD.

Added to our 6.4 release.

MAP_STACK      Indicate that the mapping is used as a stack.  This
               flag must be used in combination with MAP_ANON and
               MAP_PRIVATE.

Implement MAP_STACK option for mmap().  Synchronous faults (pagefault and
syscall) confirm the stack register points at MAP_STACK memory, otherwise
SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified
to create a MAP_STACK sub-region which satisfies alignment requirements.
Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the
contents of the region -- there is no mprotect() equivalent operation, so
there is no MAP_STACK-adding gadget.

Signed-off-by: Brad Smith <brad@comstyle.com>
Reviewed-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20181019125239.GA13884@humpty.home.comstyle.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-11-06 10:52:23 +00:00
Max Reitz
638987127d option: Make option help nicer to read
This adds some whitespace into the option help (including indentation)
and puts angle brackets around the type names.  Furthermore, the list
name is no longer printed as part of every line, but only once in
advance, and only if the caller did not print a caption already.

This patch also restores the description alignment we had before commit
9cbef9d68e, just at 24 instead of 16 characters like we used to.
This increase is because now we have the type and two spaces of
indentation before the description, and with a usual type name length of
three chracters, this sums up to eight additional characters -- which
means that we now need 24 characters to get the same amount of padding
for most options.  Also, 24 is a third of 80, which makes it kind of a
round number in terminal terms.

Finally, this patch amends the reference output of iotest 082 to match
the changes (and thus makes it pass again).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:17:48 +01:00
Eric Blake
d1dde7149e bitmap: Update count after a merge
We need an accurate count of the number of bits set in a bitmap
after a merge. In particular, since the merge operation short-circuits
a merge from an empty source, if you have bitmaps A, B, and C where
B started empty, then merge C into B, and B into A, an inaccurate
count meant that A did not get the contents of C.

In the worst case, we may falsely regard the bitmap as empty when
it has had new writes merged into it.

Fixes: be58721db
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20181002233314.30159-1-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:17 -04:00
Vladimir Sementsov-Ogievskiy
fa000f2f9f dirty-bitmap: make it possible to restore bitmap after merge
Add backup parameter to bdrv_merge_dirty_bitmap() to be used then with
bdrv_restore_dirty_bitmap() if it needed to restore the bitmap after
merge operation.

This is needed to implement bitmap merge transaction action in further
commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:15 -04:00
Li Qiang
82dfee5a68 util: aio-posix: fix a typo
Cc: qemu-trivial@nongnu.org
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1538964972-3223-1-git-send-email-liq3ea@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-10-29 13:35:22 +00:00
Peter Maydell
13399aad4f Error reporting patches for 2018-10-22
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbzcCHAAoJEDhwtADrkYZT3YsP/2qE4HNY/htj3IP6vNJuSaqw
 CLPRTz7zWmUBTE6FqSkvLsq3X2BMFFLeaIPA9EFcbyn2km6qPqBYgg9ElXXvPZBm
 6hDeRIoC8FdRD0Apozd5MGC94/lE47PheDRV8V+4KrGLaaMXEPxMZ0wP4AfdS5pS
 6Pt2xuF7nPu1+OWVxMk0fXadGjGLEuOQQmTh3B21J5RaynQ3gtd6h7XFC/LJyOGG
 LC/6GyPc0h7KU83VnvrRjH/EOpu1wENgrsvWsS0sem8op35Z+i9jU5BfCp4qFkDy
 gCHHUEyEeyexS+W+Tj87eBtK2gfrqQx9ovo8CIsWcUwpKbdD6AMK4FKGsDNMNHab
 Kg5u/M+O8nHCB7DuursF+3mqEbZHb05cfKe6JEtiq49EuORMV5hp4Ap966noSwTw
 UEU0NJNA1p8EdmXVudyyyYR7wpoSSmZpoenA+bJ3nthK8K0KcU4RUGk6ZEbxfJy+
 7ENl+3R2IxmxzgXv/x0tz0uFisaVW1rltTXtMte+ElQsO0qy74iHdfR7JHsmLxj9
 CO/ABMVoYsWq2OJv8pWLrdKpT4v3HQLJdHhknyu0ZcJGDyICqX29ULLEhPrNEZvW
 rxVxAkiemlaqxlUjbrM46CDQQm+w03OCnk7aCYcV4oK+u5+o3mCag705gMPErapZ
 6uOE3fAjiWw43sA31mek
 =kPZX
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2018-10-22' into staging

Error reporting patches for 2018-10-22

# gpg: Signature made Mon 22 Oct 2018 13:20:23 BST
# gpg:                using RSA key 3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-error-2018-10-22: (40 commits)
  error: Drop bogus "use error_setg() instead" admonitions
  vpc: Fail open on bad header checksum
  block: Clean up bdrv_img_create()'s error reporting
  vl: Simplify call of parse_name()
  vl: Fix exit status for -drive format=help
  blockdev: Convert drive_new() to Error
  vl: Assert drive_new() does not fail in default_drive()
  fsdev: Clean up error reporting in qemu_fsdev_add()
  spice: Clean up error reporting in add_channel()
  tpm: Clean up error reporting in tpm_init_tpmdev()
  numa: Clean up error reporting in parse_numa()
  vnc: Clean up error reporting in vnc_init_func()
  ui: Convert vnc_display_init(), init_keyboard_layout() to Error
  ui/keymaps: Fix handling of erroneous include files
  vl: Clean up error reporting in device_init_func()
  vl: Clean up error reporting in parse_fw_cfg()
  vl: Clean up error reporting in mon_init_func()
  vl: Clean up error reporting in machine_set_property()
  vl: Clean up error reporting in chardev_init_func()
  qom: Clean up error reporting in user_creatable_add_opts_foreach()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-10-23 17:20:23 +01:00
Markus Armbruster
80313fb53d error: Drop bogus "use error_setg() instead" admonitions
Commit 97f40301f1 "error: Functions to report warnings and
informational messages" copied the "use error_setg() instead"
admonition from the error reporting functions to new functions even
though it doesn't actually apply there.  Drop it.  Also drop it from
vreport(), where it doesn't apply anymore.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20181019123923.26649-1-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-10-19 15:47:09 +02:00
Markus Armbruster
4b5766488f error: Fix use of error_prepend() with &error_fatal, &error_abort
From include/qapi/error.h:

  * Pass an existing error to the caller with the message modified:
  *     error_propagate(errp, err);
  *     error_prepend(errp, "Could not frobnicate '%s': ", name);

Fei Li pointed out that doing error_propagate() first doesn't work
well when @errp is &error_fatal or &error_abort: the error_prepend()
is never reached.

Since I doubt fixing the documentation will stop people from getting
it wrong, introduce error_propagate_prepend(), in the hope that it
lures people away from using its constituents in the wrong order.
Update the instructions in error.h accordingly.

Convert existing error_prepend() next to error_propagate to
error_propagate_prepend().  If any of these get reached with
&error_fatal or &error_abort, the error messages improve.  I didn't
check whether that's the case anywhere.

Cc: Fei Li <fli@suse.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-10-19 14:51:34 +02:00
Artem Pisarenko
e81f86790f qemu-timer: avoid checkpoints for virtual clock timers in external subsystems
Adds EXTERNAL attribute definition to qemu timers subsystem and assigns
it to virtual clock timers, used in slirp (ICMP IPv6) and ui (key queue).
Virtual clock processing in rr mode can use this attribute instead of a
separate clock type.

Fixes: 87f4fe7653
Fixes: 775a412bf8
Fixes: 9888091404
Signed-off-by: Artem Pisarenko <artem.k.pisarenko@gmail.com>
Message-Id: <e771f96ab94e86b54b9a783c974f2af3009fe5d1.1539764043.git.artem.k.pisarenko@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-19 13:44:03 +02:00
Artem Pisarenko
89a603a0c8 qemu-timer: introduce timer attributes
Attributes are simple flags, associated with individual timers for their
whole lifetime.  They intended to be used to mark individual timers for
special handling when they fire.

New/init functions family in timer interface updated and refactored (new
'attribute' argument added, timer_list replaced with timer_list_group+type
combinations, comments improved to avoid info duplication).  Also existing
aio interface extended with attribute-enabled variants of functions,
which create/initialize timers.

Signed-off-by: Artem Pisarenko <artem.k.pisarenko@gmail.com>
Message-Id: <f47b81dbce734e9806f9516eba8ca588e6321c2f.1539764043.git.artem.k.pisarenko@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-19 13:44:03 +02:00
Artem Pisarenko
05ff8dc32f Revert some patches from recent [PATCH v6] "Fixing record/replay and adding reverse debugging"
That patch series introduced new virtual clock type for use in external
subsystems. It breaks desired behavior in non-record/replay usage
scenarios due to a small change to existing behavior.  Processing of
virtual timers belonging to new clock type is kicked off to the main
loop, which makes these timers asynchronous with vCPU thread and,
in icount mode, with whole guest execution. This breaks expected
determinism in non-record/replay icount mode of emulation where these
"external subsystems" are isolated from the host (i.e. they are
external only to guest core, not to the entire emulation environment).

Example for slirp ("user" backend for network device):
User runs qemu in icount mode with rtc clock=vm without any external
communication interfaces but with "-netdev user,restrict=on". It expects
deterministic execution, because network services are emulated inside
qemu and isolated from host. There are no reasons to get reply from DHCP
server with different delay or something like that.

The next patches revert reimplements the same changes in a better way.
This reverts commit 87f4fe7653.
This reverts commit 775a412bf8.
This reverts commit 9888091404.

Signed-off-by: Artem Pisarenko <artem.k.pisarenko@gmail.com>
Message-Id: <18b1e7c8f155fe26976f91be06bde98eef6f8751.1539764043.git.artem.k.pisarenko@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-19 13:44:03 +02:00
Marc-André Lureau
9cbef9d68e qemu-option: improve qemu_opts_print_help() output
Modify qemu_opts_print_help():
- to print expected argument type
- skip description if not available
- sort lines
- prefix with the list name (like qdev, to avoid confusion)
- drop 16-chars alignment, use a '-' as seperator for option name and
  description

For ex, "-spice help" output is changed from:

port             No description available
tls-port         No description available
addr             No description available
[...]
gl               No description available
rendernode       No description available

to:

spice.addr=str
spice.agent-mouse=bool (on/off)
spice.disable-agent-file-xfer=bool (on/off)
[...]
spice.x509-key-password=str
spice.zlib-glz-wan-compression=str

"qemu-img create -f qcow2 -o help", changed from:

size             Virtual disk size
compat           Compatibility level (0.10 or 1.1)
backing_file     File name of a base image
[...]
lazy_refcounts   Postpone refcount updates
refcount_bits    Width of a reference count entry in bits

to:

backing_file=str - File name of a base image
backing_fmt=str - Image format of the base image
cluster_size=size - qcow2 cluster size
[...]
refcount_bits=num - Width of a reference count entry in bits
size=size - Virtual disk size

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-10-05 16:14:22 +04:00
Marc-André Lureau
e05979d0d7 qemu-option: add help fallback to print the list of options
QDev options accept 'help' (or '?', but that's problematic with shell
globbing) in the list of parameters, which is handy to list the
available options.

Unfortunately, this isn't built in QemuOpts. qemu_opts_parse_noisily()
seems to be the common path for command line options, so place a
fallback to print help, listing the available options.

This is quite handy, for example with qemu "-spice help".

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-10-05 16:14:22 +04:00
Marc-André Lureau
85e33a2818 cutils: add qemu_pstrcmp0()
A char** variant of g_strcmp0().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2018-10-05 16:14:22 +04:00
Pavel Dovgalyuk
87f4fe7653 timer: introduce new virtual clock
Slirp and VNC modules use virtual clock for processing some events that
are related to the guest execution speed.
But virtual clock-related events are consideres to be deterministic and
are recorded/replayed by icount mechanism. But slirp and VNC lie outside
the recorded guest core (which includes CPU and peripherals).
Therefore slirp and VNC are external for the guest, but should work at
guest speed.
This patch introduces new virtual clock which can be used for external
subsystems for running timers that are synchronized with the guest.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180912082002.3228.82417.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02 19:08:58 +02:00
Marc-André Lureau
35f7f3fb5c util: use fcntl() for qemu_write_pidfile() locking
Daniel Berrangé suggested to use fcntl() locks rather than lockf().

'man lockf':

   On Linux, lockf() is just an interface on top of fcntl(2) locking.
   Many other systems implement lockf() in this way, but note that
   POSIX.1 leaves the relationship between lockf() and fcntl(2) locks
   unspecified.  A portable application should probably avoid mixing
   calls to these interfaces.

IOW, if its just a shim around fcntl() on many systems, it is clearer
if we just use fcntl() directly, as we then know how fcntl() locks will
behave if they're on a network filesystem like NFS.

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180831145314.14736-3-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02 18:47:55 +02:00
Marc-André Lureau
9e6bdef224 util: add qemu_write_pidfile()
There are variants of qemu_create_pidfile() in qemu-pr-helper and
qemu-ga. Let's have a common implementation in libqemuutil.

The code is initially based from pr-helper write_pidfile(), with
various improvements and suggestions from Daniel Berrangé:

  QEMU will leave the pidfile existing on disk when it exits which
  initially made me think it avoids the deletion race. The app
  managing QEMU, however, may well delete the pidfile after it has
  seen QEMU exit, and even if the app locks the pidfile before
  deleting it, there is still a race.

  eg consider the following sequence

        QEMU 1        libvirtd        QEMU 2

  1.    lock(pidfile)

  2.    exit()

  3.                 open(pidfile)

  4.                 lock(pidfile)

  5.                                  open(pidfile)

  6.                 unlink(pidfile)

  7.                 close(pidfile)

  8.                                  lock(pidfile)

  IOW, at step 8 the new QEMU has successfully acquired the lock, but
  the pidfile no longer exists on disk because it was deleted after
  the original QEMU exited.

  While we could just say no external app should ever delete the
  pidfile, I don't think that is satisfactory as people don't read
  docs, and admins don't like stale pidfiles being left around on
  disk.

  To make this robust, I think we might want to copy libvirt's
  approach to pidfile acquisition which runs in a loop and checks that
  the file on disk /after/ acquiring the lock matches the file that
  was locked. Then we could in fact safely let QEMU delete its own
  pidfiles on clean exit..

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180831145314.14736-2-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02 18:47:55 +02:00
Marc-André Lureau
3829640049 hostmem-memfd: add checks before adding hostmem-memfd & properties
Run some memfd-related checks before registering hostmem-memfd &
various properties. This will help libvirt to figure out what the host
is supposed to be capable of.

qemu_memfd_check() is changed to a less optimized version, since it is
used with various flags, it no longer caches the result.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180906161415.8543-1-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02 18:47:55 +02:00
Emilio G. Cota
ac8c77486c qsp: use atomic64 accessors
With the seqlock, we either have to use atomics to remain
within defined behaviour (and note that 64-bit atomics aren't
always guaranteed to compile, irrespective of __nocheck), or
drop the atomics and be in undefined behaviour territory.

Fix it by dropping the seqlock and using atomic64 accessors.
This will limit scalability when !CONFIG_ATOMIC64, but those
machines (1) don't have many users and (2) are unlikely to
have many cores.

- With CONFIG_ATOMIC64:
$ tests/atomic_add-bench -n 1 -m -p
 Throughput:         13.00 Mops/s

- Forcing !CONFIG_ATOMIC64:
$ tests/atomic_add-bench -n 1 -m -p
 Throughput:         10.89 Mops/s

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-5-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02 18:47:55 +02:00
Emilio G. Cota
782da5b292 util: add atomic64
This introduces read/set accessors for int64_t and uint64_t.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-3-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02 18:47:55 +02:00
Emilio G. Cota
5fe2103429 cacheinfo: add i/d cache_linesize_log
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-2-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02 18:47:55 +02:00
Peter Maydell
07f426c35e Queued tcg patches
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbq8+KAAoJEGTfOOivfiFfbiQH/if+cTUAU+Fr2Qez96avYt7t
 jimDOUc7bG+FFrNZYveVNPiP/feKWUIYPJVs9ZoT4jxvT4NOBm/drRkW+BiZO7Tt
 zqceA+/1Hoc7RlSeo/6AbcIXQLjnTnpFlUW24zNGF0QkG6iS92BPcqezgcR3sRS0
 Outf68NxQh7hW/TnHGlL/nxTuHzMfKXZLGiphu6ykzWWXUckrzYmXT4R3tfVVxHV
 S48nASWsZb8Cga/F1KdCHDv8qYAK8qoEA+01tt//zc/l/ivxfy71HenueN6Dj1Xy
 8+HStsh/fRrfq4NSqkXLtBPmeq4bhPsiEx1aCOcnXVIG0hTOe3/QO9Hc+qhkf8Y=
 =cOeh
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180926' into staging

Queued tcg patches

# gpg: Signature made Wed 26 Sep 2018 19:27:22 BST
# gpg:                using RSA key 64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20180926:
  tcg/i386: fix vector operations on 32-bit hosts
  qht-bench: add -p flag to precompute hash values
  qht: constify arguments to some internal functions
  qht: constify qht_statistics_init
  qht: constify qht_lookup
  qht: fix comment in qht_bucket_remove_entry
  qht: drop ht argument from qht iterators
  test-qht: speed up + test qht_resize
  test-qht: test deletion of the last entry in a bucket
  test-qht: test removal of non-existent entries
  test-qht: test qht_iter_remove
  qht: add qht_iter_remove
  qht: remove unused map param from qht_remove__locked

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-28 18:56:09 +01:00
Peter Maydell
099bea113f Block and testing patches
- Paolo's AIO fixes.
 - VMDK streamOptimized corner case fix
 - VM testing improvment on -cpu
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCAAuFiEEUAN8t5cGD3bwIa1WyjViTGqRccYFAluq9NAQHGZhbXpAcmVk
 aGF0LmNvbQAKCRDKNWJMapFxxm6mB/0XBuWySxCchAZDmkdcIqosrO7XZ6dKFpvW
 3uegPH3gUCpH6tw1YtigQSS+Se7tdnrUkOA5/5yOt8v2h9+6ORbIEYkkjGFpsPAL
 jITjIGSxXb2yjR0Ss+zKoR4tFQGYEVRmQSGZ8UHYiDVFHU0FbcwNkagHIyluoRuF
 +IspfB7bMXqrZ1qCCWROQDy7Cd2TJDeZ+jUtoiOACh7kyq5Q+PcVoeyeKKk1berH
 Kv8ZhdOk2t/te/CSaCpUTe5zYv3QI1tjcU1D6huYAkT9M8/3SasOeXHBNNG9zLE+
 zAtNKufd+VQoG7l24AfcPaUVqJlQ4GRDXFdumcgW3zA2Qf5CGyN7
 =pEHW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/famz/tags/staging-pull-request' into staging

Block and testing patches

- Paolo's AIO fixes.
- VMDK streamOptimized corner case fix
- VM testing improvment on -cpu

# gpg: Signature made Wed 26 Sep 2018 03:54:08 BST
# gpg:                using RSA key CA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/staging-pull-request:
  vmdk: align end of file to a sector boundary
  tests/vm: Use -cpu max rather than -cpu host
  aio-posix: do skip system call if ctx->notifier polling succeeds
  aio-posix: compute timeout before polling
  aio-posix: fix concurrent access to poll_disable_cnt

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-28 13:35:26 +01:00
Emilio G. Cota
1911c8a3bd qht: constify arguments to some internal functions
These functions do not modify their @ht or @bucket arguments.
Constify those arguments.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26 08:55:54 -07:00
Emilio G. Cota
6579f10779 qht: constify qht_statistics_init
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26 08:55:54 -07:00
Emilio G. Cota
e6c5829950 qht: constify qht_lookup
seqlock_read_begin takes a const param since c04649eeea
("seqlock: constify seqlock_read_begin", 2018-08-23), so
we can constify the entire lookup.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26 08:55:54 -07:00
Emilio G. Cota
9650ad3e99 qht: fix comment in qht_bucket_remove_entry
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26 08:55:54 -07:00
Emilio G. Cota
78255ba2cc qht: drop ht argument from qht iterators
Accessing the HT from an iterator results almost always
in a deadlock. Given that only one qht-internal function
uses this argument, drop it from the interface.

Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26 08:55:54 -07:00
Emilio G. Cota
69d55e9cc2 qht: add qht_iter_remove
This currently has no users, but the use case is so common that I
think we must support it.

Note that without the appended we cannot safely remove a set of
elements; a 2-step approach (i.e. qht_iter first, keep track of
the to-be-deleted elements, and then a bunch of qht_remove calls)
would be racy, since between the iteration and the removals other
threads might insert additional elements.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26 08:55:54 -07:00
Emilio G. Cota
e2f07efadd qht: remove unused map param from qht_remove__locked
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26 08:55:53 -07:00
Paolo Bonzini
cfeb35d677 aio-posix: do skip system call if ctx->notifier polling succeeds
Commit 70232b5253 ("aio-posix: Don't count ctx->notifier as progress when
2018-08-15), by not reporting progress, causes aio_poll to execute the
system call when polling succeeds because of ctx->notifier.  This introduces
latency before the call to aio_bh_poll() and negates the advantages of
polling, unfortunately.

The fix builds on the previous patch, separating the effect of polling on
the timeout from the progress reported to aio_poll().  ctx->notifier
does zero the timeout, causing the caller to skip the system call,
but it does not report progress, so that the bug fix of commit 70232b5253
still stands.

Fixes: 70232b5253
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-4-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-26 10:46:21 +08:00
Paolo Bonzini
e30cffa04d aio-posix: compute timeout before polling
This is a preparation for the next patch, and also a very small
optimization.  Compute the timeout only once, before invoking
try_poll_mode, and adjust it in run_poll_handlers.  The adjustment
is the polling time when polling fails, or zero (non-blocking) if
polling succeeds.

Fixes: 70232b5253
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-3-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-26 10:46:21 +08:00
Paolo Bonzini
d7be5dd19c aio-posix: fix concurrent access to poll_disable_cnt
It is valid for an aio_set_fd_handler to happen concurrently with
aio_poll.  In that case, poll_disable_cnt can change under the heels
of aio_poll, and the assertion on poll_disable_cnt can fail in
run_poll_handlers.

Therefore, this patch simply checks the counter on every polling
iteration.  There are no particular needs for ordering, since the
polling loop is terminated anyway by aio_notify at the end of
aio_set_fd_handler.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-2-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-26 10:46:21 +08:00
Peter Maydell
866ba83854 - Deprecate the usage of a network backend via "name" instead of "id"
- Deprecate the "enforce-config-section" machine parameter
 - Re-enable the wdt_ib700, endianness and vmxnet3 qtests
 - Some trivial fixes and doc update patches that crossed my way
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJbqlsyAAoJEC7Z13T+cC21RbAP/3IvGfBxuRm6rBWoghjQgbl8
 KU8nPnlZUtqjxmfUTILO/h+pJ3na5MQ8hh7v8JHi+xlQ2DPkECW21DtnfdxntVjw
 +b+N5Ap6J22GHyEq4HJXPWAk2rDInqkU966DvL40RiMvOTfXdg9EO0TDX0VsVgZv
 BR1r7/t3T0P7hiQ0XWb9U2JchRIC+Zgk34gXZPSTpoIv89fUhzNoK5LvAA6yV1FQ
 TvE8VTKJm4wkqThH1ShtbJCBKjHjW/W8LYZr3YMothcs8vGjEdEcDL4BoJZDn3bF
 h4VTkU+k8lp7W9LmlnPnu1WH/5ezhzdwJTeFaPJt4U10WKJptAS4vbK03DXlds9O
 9d2BOXKrima2kSr1ejSe1f0kcE8fis1XFmSuhF61Nbw6ngT5+pP2JSc1XwFazd2K
 zQwV4GXBLzAGnd4F2Ec+5TKzbGFVfczxeBDiBkkVmG+XdX/UXJpkpPYGAaw7DDiK
 JwKVVYIPk1ll6MAbR6qEGsvE/adHNEm8lUdjXqwgbQlIeUZ2H0hCu9lJ0X81mtoQ
 WZP+nMa/87COnlPX6VPVgxM2TXQOH/UbGz/WmYzZ6/gPKTX+gfwrHQGdp7Tjl33U
 KxFKWioFnoqGuyWasvTtKEK67/IlrY+w1nXuuqKJg8J2/qx1SVtx45FHkRkxkIDx
 4boRpx0XUqpDVdf8VhRB
 =dXgp
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2018-09-25' into staging

- Deprecate the usage of a network backend via "name" instead of "id"
- Deprecate the "enforce-config-section" machine parameter
- Re-enable the wdt_ib700, endianness and vmxnet3 qtests
- Some trivial fixes and doc update patches that crossed my way

# gpg: Signature made Tue 25 Sep 2018 16:58:42 BST
# gpg:                using RSA key 2ED9D774FE702DB5
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>"
# gpg:                 aka "Thomas Huth <thuth@redhat.com>"
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>"
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>"
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* remotes/huth-gitlab/tags/pull-request-2018-09-25:
  Revert "check: Move VMXNET3 test to common"
  Revert "check: Move endianess test to common"
  Revert "check: Move wdt_ib700 test to common"
  tests/migration: Speed up the test on ppc64
  hw/qdev-core: Fix description of instance_init
  qdev: fix a typo in comment
  docs: Fix some typos (most found by codespell)
  trivial: Make bios files and source files non-executable
  memfd: fix possible usage of the uninitialized file descriptor
  hw/core/machine: Officially deprecate the enforce-config-section parameter
  net/slirp: Deprecate the [hub_id name] parameter tuple
  net: Deprecate the "name" parameter of -net
  Makefile: Add missing dependency for qemu-deprecated.texi

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-25 18:09:52 +01:00
Dima Stepanov
1e7ec6cf06 memfd: fix possible usage of the uninitialized file descriptor
The qemu_memfd_alloc_check() routine allocates the fd variable on stack.
This variable is initialized inside the qemu_memfd_alloc() function.
There are several cases when *fd will be left unintialized which can
lead to the unexpected close() in the qemu_memfd_free() call.

Set file descriptor to -1 before calling the qemu_memfd_alloc routine.

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25 17:26:18 +02:00
Kevin Wolf
cfe29d8294 block: Use a single global AioWait
When draining a block node, we recurse to its parent and for subtree
drains also to its children. A single AIO_WAIT_WHILE() is then used to
wait for bdrv_drain_poll() to become true, which depends on all of the
nodes we recursed to. However, if the respective child or parent becomes
quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent
is checked, while AIO_WAIT_WHILE() depends on the AioWait of the
original node.

Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE().

This may mean that the draining thread gets a few more unnecessary
wakeups because an unrelated operation got completed, but we already
wake it up when something _could_ have changed rather than only if it
has certainly changed.

Apart from that, drain is a slow path anyway. In theory it would be
possible to use wakeups more selectively and still correctly, but the
gains are likely not worth the additional complexity. In fact, this
patch is a nice simplification for some places in the code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:50:15 +02:00
Kevin Wolf
aa1361d54a block: Add missing locking in bdrv_co_drain_bh_cb()
bdrv_do_drained_begin/end() assume that they are called with the
AioContext lock of bs held. If we call drain functions from a coroutine
with the AioContext lock held, we yield and schedule a BH to move out of
coroutine context. This means that the lock for the home context of the
coroutine is released and must be re-acquired in the bottom half.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:50:15 +02:00
Sergio Lopez
6808ae0417 util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb
AIO Coroutines shouldn't by managed by an AioContext different than the
one assigned when they are created. aio_co_enter avoids entering a
coroutine from a different AioContext, calling aio_co_schedule instead.

Scheduled coroutines are then entered by co_schedule_bh_cb using
qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
current AioContext obtained with qemu_get_current_aio_context.
Eventually, co->ctx will be set to the AioContext passed as an argument
to qemu_aio_coroutine_enter.

This means that, if an IO Thread's AioConext is being processed by the
Main Thread (due to aio_poll being called with a BDS AioContext, as it
happens in AIO_WAIT_WHILE among other places), the AioContext from some
coroutines may be wrongly replaced with the one from the Main Thread.

This is the root cause behind some crashes, mainly triggered by the
drain code at block/io.c. The most common are these abort and failed
assertion:

util/async.c:aio_co_schedule
456     if (scheduled) {
457         fprintf(stderr,
458                 "%s: Co-routine was already scheduled in '%s'\n",
459                 __func__, scheduled);
460         abort();
461     }

util/qemu-coroutine-lock.c:
286     assert(mutex->holder == self);

But it's also known to cause random errors at different locations, and
even SIGSEGV with broken coroutine backtraces.

By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
pass the correct AioContext as an argument, making sure co->ctx is not
wrongly altered.

Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-09-25 15:50:15 +02:00
Cornelia Huck
c55510b722 qemu-error: add {error, warn}_report_once_cond
Add two functions to print an error/warning report once depending
on a passed-in condition variable and flip it if printed. This is
useful if you want to print a message not once-globally, but e.g.
once-per-device.

Inspired by warn_once() in hw/vfio/ccw.c, which has been replaced
with warn_report_once_cond().

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180830145902.27376-2-cohuck@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Function comments reworded]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-09-24 17:13:07 +02:00