Commit Graph

1143 Commits

Author SHA1 Message Date
Peter Maydell
3591ddd399 * Various fixes
* libdaxctl support to correctly align devdax character devices (Jingqi)
 * initial-all-set support for live migration (Jay)
 * forbid '-numa node, mem' for 5.1 and newer machine types (Igor)
 * x87 fixes (Joseph)
 * Tighten memory_region_access_valid (Michael) and fix fallout (myself)
 * Replay fixes (Pavel)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl71+zkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO4MAgAo/aPLzCXJTzFOP88TclEETfSUeyG
 GFs6mAEJpoNnkAzY+y6ZIjtbp346UZB2KMxHTQcd7p2tO+jXSDPpr0UBLqU95j0/
 ucOnP1X9E5ee8P5Z7bXeGCtkfEippI5/TU+gHlx/SKeyVHdMKBsWCg/9LN5JXMJR
 ncQ6MxkU8huOksOLL32dxh1OqtdDiBoq9rswmHFXwDcRuIkteTlQo3Ze9BSb8t04
 7ZImKXNr+wIaq/xXAqltYNGhHoi31Rz+W8W7T84tYNr7wI1LWaLi2jzQ2qJthAdq
 25zXVz5QJjcfIemlrV03PN8IZfKqTfnOvf+DNW1ns/EdflQem/Mb0Q9KOg==
 =NfSA
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Various fixes
* libdaxctl support to correctly align devdax character devices (Jingqi)
* initial-all-set support for live migration (Jay)
* forbid '-numa node, mem' for 5.1 and newer machine types (Igor)
* x87 fixes (Joseph)
* Tighten memory_region_access_valid (Michael) and fix fallout (myself)
* Replay fixes (Pavel)

# gpg: Signature made Fri 26 Jun 2020 14:42:17 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (31 commits)
  i386: Mask SVM features if nested SVM is disabled
  ibex_uart: fix XOR-as-pow
  vmport: move compat properties to hw_compat_5_0
  hyperv: vmbus: Remove the 2nd IRQ
  kvm: i386: allow TSC to differ by NTP correction bounds without TSC scaling
  numa: forbid '-numa node, mem' for 5.1 and newer machine types
  osdep: Make MIN/MAX evaluate arguments only once
  target/i386: Add notes for versioned CPU models
  target/i386: reimplement fpatan using floatx80 operations
  target/i386: reimplement fyl2x using floatx80 operations
  target/i386: reimplement fyl2xp1 using floatx80 operations
  target/i386: reimplement fprem, fprem1 using floatx80 operations
  softfloat: return low bits of quotient from floatx80_modrem
  softfloat: do not set denominator high bit for floatx80 remainder
  softfloat: do not return pseudo-denormal from floatx80 remainder
  softfloat: fix floatx80 remainder pseudo-denormal check for zero
  softfloat: merge floatx80_mod and floatx80_rem
  target/i386: reimplement f2xm1 using floatx80 operations
  xen: Actually fix build without passthrough
  Makefile: Install qemu-[qmp/ga]-ref.* into the directory "interop"
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-06-26 16:55:20 +01:00
Peter Maydell
87fb952da8 Pull request
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAl7zJJUACgkQnKSrs4Gr
 c8ix3Qf/ZpEKTCWJcZZuJPEI4CSgHZTsmDilkhnI/SoSBIK+6do+oBtCWrNdfP/m
 BpAZspaGsKUu5kJe6HGl4Rvmjd/sTg+9+F6UnQVrWccttwmJgr+y0r9uTMEgxgdm
 2xeTzkzfwfxRLn4wb8k1kX/weQUcsbJUe2F9Nvm3HzeKGkaxWlYsRwqXAluC7gjx
 ZK0yHBz9JXKAreAfBRmNduLDElyzc6yYikY2gsJEOYTA7/h/ksmuNWYqNPRzWYGQ
 wRjAPyRMg+q+pZhoir5+6qgKLt6vNk5uQOjPaiLYhSMi7fiTIXrrVrO0dSx1Pkun
 2vlb2WOF7nbj5T1veJQE29/onKPhzA==
 =IYfR
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

# gpg: Signature made Wed 24 Jun 2020 11:01:57 BST
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  block/nvme: support nested aio_poll()
  block/nvme: keep BDRVNVMeState pointer in NVMeQueuePair
  block/nvme: clarify that free_req_queue is protected by q->lock
  block/nvme: switch to a NVMeRequest freelist
  block/nvme: don't access CQE after moving cq.head
  block/nvme: drop tautologous assertion
  block/nvme: poll queues without q->lock
  check-block: enable iotests with SafeStack
  configure: add flags to support SafeStack
  coroutine: add check for SafeStack in sigaltstack
  coroutine: support SafeStack in ucontext backend
  minikconf: explicitly set encoding to UTF-8

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-06-26 13:48:54 +01:00
Pavel Dovgalyuk
677a3baba4 replay: synchronize on every virtual timer callback
Sometimes virtual timer callbacks depend on order
of virtual timer processing and warping of virtual clock.
Therefore every callback should be logged to make replay deterministic.
This patch creates a checkpoint before every virtual timer callback.
With these checkpoints virtual timers processing and clock warping
events order is completely deterministic.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Acked-by: Alex Bennée <alex.bennee@linaro.org>

--

v2:
  - remove mutex lock/unlock for virtual clock checkpoint since it is
    not process any asynchronous events (commit ca9759c2a9)
  - bump record/replay log file version
Message-Id: <159012932716.27256.8854065545365559921.stgit@pasha-ThinkPad-X280>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-26 06:45:30 -04:00
David CARLIER
ae2b72072b util/getauxval: Porting to FreeBSD getauxval feature
From d7f9d40777d1ed7c9450b0be4f957da2993dfc72 Mon Sep 17 00:00:00 2001
From: David Carlier <devnexen@gmail.com>
Date: Fri, 12 Jun 2020 09:39:17 +0100
Subject: [PATCH] util/getauxval: Porting to FreeBSD getauxval feature

FreeBSD has a similar API for auxiliary vector.

Signed-off-by: David Carlier <devnexen@gmail.com>
Message-Id: <CA+XhMqxTU6PUSQBpbA9VrS1QZfqgrCAKUCtUF-x2aF=fCMTDOw@mail.gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-26 06:45:29 -04:00
Daniele Buono
ff76097ad8 coroutine: add check for SafeStack in sigaltstack
Current implementation of LLVM's SafeStack is not compatible with
code that uses an alternate stack created with sigaltstack().
Since coroutine-sigaltstack relies on sigaltstack(), it is not
compatible with SafeStack. The resulting binary is incorrect, with
different coroutines sharing the same unsafe stack and producing
undefined behavior at runtime.

In the future LLVM may provide a SafeStack implementation compatible with
sigaltstack(). In the meantime, if SafeStack is desired, the coroutine
implementation from coroutine-ucontext should be used.
As a safety check, add a control in coroutine-sigaltstack to throw a
preprocessor #error if SafeStack is enabled and we are trying to
use coroutine-sigaltstack to implement coroutines.

Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
Message-id: 20200529205122.714-3-dbuono@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23 15:46:05 +01:00
Daniele Buono
58ebc2c313 coroutine: support SafeStack in ucontext backend
LLVM's SafeStack instrumentation does not yet support programs that make
use of the APIs in ucontext.h
With the current implementation of coroutine-ucontext, the resulting
binary is incorrect, with different coroutines sharing the same unsafe
stack and producing undefined behavior at runtime.
This fix allocates an additional unsafe stack area for each coroutine,
and sets the new unsafe stack pointer before calling swapcontext() in
qemu_coroutine_new.
This is the only place where the pointer needs to be manually updated,
since sigsetjmp/siglongjmp are already instrumented by LLVM to properly
support SafeStack.
The additional stack is then freed in qemu_coroutine_delete.

Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
Message-id: 20200529205122.714-2-dbuono@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23 15:46:05 +01:00
David CARLIER
2032e243a5 util/oslib-posix : qemu_init_exec_dir implementation for Mac
From 3025a0ce3fdf7d3559fc35a52c659f635f5c750c Mon Sep 17 00:00:00 2001
From: David Carlier <devnexen@gmail.com>
Date: Tue, 26 May 2020 21:35:27 +0100
Subject: [PATCH] util/oslib-posix : qemu_init_exec_dir implementation for Mac

Using dyld API to get the full path of the current process.

Signed-off-by: David Carlier <devnexen@gmail.com>
Message-id: CA+XhMqxwC10XHVs4Z-JfE0-WLAU3ztDuU9QKVi31mjr59HWCxg@mail.gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-06-23 11:39:46 +01:00
Robert Foley
ce9f0e5b26 util: Added tsan annotate for thread name.
This allows us to see the name of the thread in tsan
warning reports such as this:

  Thread T7 'CPU 1/TCG' (tid=24317, running) created by main thread at:

Signed-off-by: Robert Foley <robert.foley@linaro.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200609200738.445-12-robert.foley@linaro.org>
Message-Id: <20200612190237.30436-15-alex.bennee@linaro.org>
2020-06-16 14:49:05 +01:00
Emilio G. Cota
5107a47bb2 qht: call qemu_spin_destroy for head buckets
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Robert Foley <robert.foley@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[AJB: add implied cota s-o-b c.f. github.com/cota/qemu/tree/tsan @ 1bd1209]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200609200738.445-6-robert.foley@linaro.org>
Message-Id: <20200612190237.30436-9-alex.bennee@linaro.org>
2020-06-16 14:49:05 +01:00
Lingfeng Yang
0aebab04b9 configure: add --enable-tsan flag + fiber annotations for coroutine-ucontext
We tried running QEMU under tsan in 2016, but tsan's lack of support for
longjmp-based fibers was a blocker:
  https://groups.google.com/forum/#!topic/thread-sanitizer/se0YuzfWazw

Fortunately, thread sanitizer gained fiber support in early 2019:
  https://reviews.llvm.org/D54889

This patch brings tsan support upstream by importing the patch that annotated
QEMU's coroutines as tsan fibers in Android's QEMU fork:
  https://android-review.googlesource.com/c/platform/external/qemu/+/844675

Tested with '--enable-tsan --cc=clang-9 --cxx=clang++-9 --disable-werror'
configure flags.

Signed-off-by: Lingfeng Yang <lfy@google.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
[cota: minor modifications + configure changes]
Signed-off-by: Robert Foley <robert.foley@linaro.org>
[RF: configure changes, coroutine fix + minor modifications]
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200609200738.445-2-robert.foley@linaro.org>
Message-Id: <20200612190237.30436-5-alex.bennee@linaro.org>
2020-06-16 14:49:05 +01:00
David Carlier
9548a89173 util/oslib: Returns the real thread identifier on FreeBSD and NetBSD
getpid is good enough in a mono thread context, however thr_self/_lwp_self
reflects the real current thread identifier from a given process.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: David Carlier <devnexen@gmail.com>
2020-06-10 12:10:48 -04:00
Philippe Mathieu-Daudé
e4d6d41ce2 util/Makefile: Reduce the user-mode object list
These objects are not required when configured with --disable-system.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Tested-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200522172510.25784-6-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-06-05 21:23:22 +02:00
xiaoqiang zhao
776b97d360 qemu-sockets: add abstract UNIX domain socket support
unix_listen/connect_saddr now support abstract address types

two aditional BOOL switches are introduced:
tight: whether to set @addrlen to the minimal string length,
       or the maximum sun_path length. default is TRUE
abstract: whether we use abstract address. default is FALSE

cli example:
-monitor unix:/tmp/unix.socket,abstract,tight=off
OR
-chardev socket,path=/tmp/unix.socket,id=unix1,abstract,tight=on

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-05-20 10:34:40 +01:00
Stefan Hajnoczi
ba607ca8bf aio-posix: disable fdmon-io_uring when GSource is used
The glib event loop does not call fdmon_io_uring_wait() so fd handlers
waiting to be submitted build up in the list. There is no benefit is
using io_uring when the glib GSource is being used, so disable it
instead of implementing a more complex fix.

This fixes a memory leak where AioHandlers would build up and increasing
amounts of CPU time were spent iterating them in aio_pending(). The
symptom is that guests become slow when QEMU is built with io_uring
support.

Buglink: https://bugs.launchpad.net/qemu/+bug/1877716
Fixes: 73fd282e7b ("aio-posix: add io_uring fd monitoring implementation")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
Message-id: 20200511183630.279750-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-05-18 18:16:00 +01:00
Stefan Hajnoczi
de137e44f7 aio-posix: don't duplicate fd handler deletion in fdmon_io_uring_destroy()
The io_uring file descriptor monitoring implementation has an internal
list of fd handlers that are pending submission to io_uring.
fdmon_io_uring_destroy() deletes all fd handlers on the list.

Don't delete fd handlers directly in fdmon_io_uring_destroy() for two
reasons:
1. This duplicates the aio-posix.c AioHandler deletion code and could
   become outdated if the struct changes.
2. Only handlers with the FDMON_IO_URING_REMOVE flag set are safe to
   remove. If the flag is not set then something still has a pointer to
   the fd handler. Let aio-posix.c and its user worry about that. In
   practice this isn't an issue because fdmon_io_uring_destroy() is only
   called when shutting down so all users have removed their fd
   handlers, but the next patch will need this!

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
Message-id: 20200511183630.279750-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-05-18 18:16:00 +01:00
Peter Maydell
f19d118bed nbd patches for 2020-05-04
- reduce client-side fragmentation of NBD trim and status requests
 - fix iotest 41 when run in deep tree
 - fix socket activation in qemu-nbd
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl6whTUACgkQp6FrSiUn
 Q2r4nAf7BtGSFMkUu6nWYeq+Ggg+Xwmz2FLAzWTK/rccGDC44c9ETzOIbWEddo6X
 FHpU07VXdLW1h2M7ox8lQVo0DZEFxTRBYTPtUtjB7izfkAs4CkYeElJsZAPAZKgU
 GsKqa3RM6uXubsQaXXXjMFCGlYgqi1dVkmkgtPebt7evSe0ATlTfYfd0y9gb5f9C
 cbHD3CVcGKQe4ZtNcSBpTzOvXJSrBZznyCyhBO2qmVXTynt/5Ygog+Ulq3DHZsPX
 UkRkTPohKA0BhXuS7wD49danlzCLiTlvswr62fAncM1+AJTbmIa+apy3SwiOkwMh
 Aawq5vDtaFV+HEBKbMC0QRhgtoEe1w==
 =ExlI
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-05-04' into staging

nbd patches for 2020-05-04

- reduce client-side fragmentation of NBD trim and status requests
- fix iotest 41 when run in deep tree
- fix socket activation in qemu-nbd

# gpg: Signature made Mon 04 May 2020 22:12:21 BST
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2020-05-04:
  block/nbd-client: drop max_block restriction from discard
  block/nbd-client: drop max_block restriction from block_status
  iotests/041: Fix NBD socket path
  tools: Fix use of fcntl(F_SETFD) during socket activation

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-05-05 15:47:44 +01:00
Peter Maydell
a2261b2754 trivial patches (20200504)
Silent static analyzer warning
 Remove dead assignments
 Support -chardev serial on macOS
 Update MAINTAINERS
 Some cosmetic changes
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl6wOI4SHGxhdXJlbnRA
 dml2aWVyLmV1AAoJEPMMOL0/L748p7UQAIFSNN0FrDV+K7i8qqq0X+JrS+dNOHNm
 DSpOf8IaGm/BezzL6XirXBVpFxg9iB5DQVLsjP1kUggO7rbBO0blx5H5eOPhnXZj
 xg60kLN16ty7NZ/WPS1G9jF4nDsjz0ZUtCXb0OXsuGJIOrsmN2r/lxdJwcjHZaqJ
 RzbcCSFXlvL0g7mOakJinMJH5r/nWCiUoEYsikhP10DcvuSBoCnjr+LYV6Ef02G0
 Y5lgKN2G0EAMgWTJaL3gIF27zS8QLDNll+eO+PIU5K4yo75/wRCKr4e3PpErZlf6
 B+hCAAPnXCpDKw+8sK2z+9OZXUGe1hQ8LHNgNNM921C66f+vLLXpIDTAECihM4K4
 0wThYlFDwT4j+PMHFNlzIobGMtb33ui8m40lepMt/YOVFqY4tr8u3MLhHkVDo2+8
 sNuOOWLXAoFOYyRqgTeVJvZvMUFQqtDiftghw1BR55TyIpDWjvLYRqae5CI+MGXs
 6YylZVHGzVjMVptxvivvIQ735Nq8LaKq7N8Cb7uvcbRaCki39BsxXVPZx4p6NdwN
 dMndUOz/y75dNlRMDjK8l/oRFPJa/p1Yz8mZhl0uVOO6JeJhBwYmk+WkQ7g/GHZb
 Rx15HnVWRu6C/Icbw4kqZYyqrgl5lykS8aAWURePdpjzKY77rY1H71FesMhjifRN
 ZGgfUdWI88M4
 =ibgH
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.1-pull-request' into staging

trivial patches (20200504)

Silent static analyzer warning
Remove dead assignments
Support -chardev serial on macOS
Update MAINTAINERS
Some cosmetic changes

# gpg: Signature made Mon 04 May 2020 16:45:18 BST
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/trivial-branch-for-5.1-pull-request:
  hw/timer/pxa2xx_timer: Add assertion to silent static analyzer warning
  hw/timer/stm32f2xx_timer: Remove dead assignment
  hw/gpio/aspeed_gpio: Remove dead assignment
  hw/isa/i82378: Remove dead assignment
  hw/ide/sii3112: Remove dead assignment
  hw/input/adb-kbd: Remove dead assignment
  hw/i2c/pm_smbus: Remove dead assignment
  blockdev: Remove dead assignment
  block: Avoid dead assignment
  Compress lines for immediate return
  chardev: Add macOS to list of OSes that support -chardev serial
  MAINTAINERS: Update Keith Busch's email address
  elf_ops: Don't try to g_mapped_file_unref(NULL)
  hw/mem/pc-dimm: Fix line over 80 characters warning
  hw/mem/pc-dimm: Print slot number on error at pc_dimm_pre_plug()
  MAINTAINERS: Mark the LatticeMico32 target as orphan
  timer/exynos4210_mct: Remove redundant statement in exynos4210_mct_write()
  display/blizzard: use extract16() for fix clang analyzer warning in blizzard_draw_line16_32()
  scsi/esp-pci: add g_assert() for fix clang analyzer warning in esp_pci_io_write()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-05-05 14:03:28 +01:00
Eric Blake
474a6e64f2 tools: Fix use of fcntl(F_SETFD) during socket activation
Blindly setting FD_CLOEXEC without a read-modify-write will
inadvertently clear any other intentionally-set bits, such as a
proposed new bit for designating a fd that must behave in 32-bit mode.
However, we cannot use our wrapper qemu_set_cloexec(), because that
wrapper intentionally abort()s on failure, whereas the probe here
intentionally tolerates failure to deal with incorrect socket
activation gracefully.  Instead, fix the code to do the proper
read-modify-write.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200420175309.75894-3-eblake@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2020-05-04 14:54:35 -05:00
Daniel Brodsky
6e8a355de6 lockable: replaced locks with lock guard macros where appropriate
- ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
- replaced result with QEMU_LOCK_GUARD if all unlocks at function end
- replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end

Signed-off-by: Daniel Brodsky <dnbrdsky@gmail.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-id: 20200404042108.389635-3-dnbrdsky@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-05-04 16:07:43 +01:00
Simran Singhal
b3ac2b94cd Compress lines for immediate return
Compress two lines into a single line if immediate return statement is found.

It also remove variables progress, val, data, ret and sock
as they are no longer needed.

Remove space between function "mixer_load" and '(' to fix the
checkpatch.pl error:-
ERROR: space prohibited between function name and open parenthesis '('

Done using following coccinelle script:
@@
local idexpression ret;
expression e;
@@

-ret =
+return
     e;
-return ret;

Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200401165314.GA3213@simran-Inspiron-5558>
[lv: in handle_aiocb_write_zeroes_unmap() move "int ret" inside the #ifdef]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-05-04 14:43:22 +02:00
Markus Armbruster
2500f6f30b qemu-option: Clean up after the previous commit
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200415083048.14339-6-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-04-30 06:51:15 +02:00
Markus Armbruster
7b1cd1c65a qobject: Eliminate qdict_iter(), use qdict_first(), qdict_next()
qdict_iter() has just three uses and no test coverage.  Replace by
qdict_first(), qdict_next() for more concise code and less type
punning.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200415083048.14339-5-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-04-30 06:51:15 +02:00
Markus Armbruster
80c710cb06 qemu-img: Move is_valid_option_list() to qemu-img.c and rewrite
is_valid_option_list()'s purpose is ensuring qemu-img.c's can safely
join multiple parameter strings separated by ',' like this:

        g_strdup_printf("%s,%s", params1, params2);

How it does that is anything but obvious.  A close reading of the code
reveals that it fails exactly when its argument starts with ',' or
ends with an odd number of ','.  Makes sense, actually, because when
the argument starts with ',', a separating ',' preceding it would get
escaped, and when it ends with an odd number of ',', a separating ','
following it would get escaped.

Move it to qemu-img.c and rewrite it the obvious way.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200415074927.19897-9-armbru@redhat.com>
2020-04-29 08:01:52 +02:00
Markus Armbruster
56a9efa199 qemu-option: Avoid has_help_option() in qemu_opts_parse_noisily()
When opts_parse() sets @invalidp to true, qemu_opts_parse_noisily()
uses has_help_option() to decide whether to print help.  This parses
the input string a second time.

Easy to avoid: replace @invalidp by @help_wanted.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200415074927.19897-7-armbru@redhat.com>
2020-04-29 08:01:51 +02:00
Markus Armbruster
80a9485573 qemu-option: Fix has_help_option()'s sloppy parsing
has_help_option() uses its own parser.  It's inconsistent with
qemu_opts_parse(), as demonstrated by test-qemu-opts case
/qemu-opts/has_help_option.  Fix by reusing the common parser.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200415074927.19897-5-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-04-29 08:01:51 +02:00
Markus Armbruster
933d152778 qemu-option: Fix sloppy recognition of "id=..." after ",,"
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200415074927.19897-4-armbru@redhat.com>
2020-04-29 08:01:51 +02:00
Markus Armbruster
6129803b55 qemu-options: Factor out get_opt_name_value() helper
The next commits will put it to use.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200415074927.19897-3-armbru@redhat.com>
2020-04-29 08:01:51 +02:00
Peter Maydell
e33d61cc9a Bugfixes, and reworking of the atomics documentation.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6UDRYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOSGggAgy/pnlhh5NjGc0PZLhz09O1MlOiT
 iS8/RCudLR/yDJ0K7pweWKc1lGrS11G1n1P+58G6sK7al4NOdlMMgtk1VtZAlMJ4
 dSQ+DGV7JaoPztu5ec2V7LiJmhyxrVaKx7xg9JGx0bZ/1wCC1GqZUlZ2hYdgQ8L4
 EchdwqzRd2sznlUVAP19ZcPb6sYG2VlkIzFytd5p3xZqrr0g3RJa7nmWRWAnEx1L
 5/13U6g2PEU3jFKTtOcELFq8F/tB8id+fwIE2GB3glKzBHXnJSAfpzBV3/8L72xV
 JqSUa62O12qGX5k5F9BJPgcfxs40wyEkWTBJuW+WQvsmI73EJ3B30gjkhw==
 =8Dsv
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

Bugfixes, and reworking of the atomics documentation.

# gpg: Signature made Mon 13 Apr 2020 07:56:22 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  module: increase dirs array size by one
  memory: Do not allow direct write access to rom_device regions
  vl.c: error out if -mem-path is used together with -M memory-backend
  rcu: do not mention atomic_mb_read/set in documentation
  atomics: update documentation
  atomics: convert to reStructuredText
  oslib-posix: take lock before qemu_cond_broadcast
  piix: fix xenfv regression, add compat machine xenfv-4.2

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-04-13 13:11:38 +01:00
Bruce Rogers
267514b33f module: increase dirs array size by one
With the module upgrades code change, the statically sized dirs array
can now overflow. Increase it's size by one, according to the new
maximum possible usage.

Fixes: bd83c861c0 ("modules: load modules from versioned /var/run dir")
Signed-off-by: Bruce Rogers <brogers@suse.com>
Message-Id: <20200411010746.472295-1-brogers@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-13 02:56:18 -04:00
Bauerchen
278fb16273 oslib-posix: take lock before qemu_cond_broadcast
In touch_all_pages, if the mutex is not taken around qemu_cond_broadcast,
qemu_cond_broadcast may be called before all touch page threads enter
qemu_cond_wait. In this case, the touch page threads wait forever for the
main thread to wake them up, causing a deadlock.

Signed-off-by: Bauerchen <bauerchen@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-11 08:49:20 -04:00
Paolo Bonzini
5710a3e09f async: use explicit memory barriers
When using C11 atomics, non-seqcst reads and writes do not participate
in the total order of seqcst operations.  In util/async.c and util/aio-posix.c,
in particular, the pattern that we use

          write ctx->notify_me                 write bh->scheduled
          read bh->scheduled                   read ctx->notify_me
          if !bh->scheduled, sleep             if ctx->notify_me, notify

needs to use seqcst operations for both the write and the read.  In
general this is something that we do not want, because there can be
many sources that are polled in addition to bottom halves.  The
alternative is to place a seqcst memory barrier between the write
and the read.  This also comes with a disadvantage, in that the
memory barrier is implicit on strongly-ordered architectures and
it wastes a few dozen clock cycles.

Fortunately, ctx->notify_me is never written concurrently by two
threads, so we can assert that and relax the writes to ctx->notify_me.
The resulting solution works and performs well on both aarch64 and x86.

Note that the atomic_set/atomic_read combination is not an atomic
read-modify-write, and therefore it is even weaker than C11 ATOMIC_RELAXED;
on x86, ATOMIC_RELAXED compiles to a locked operation.

Analyzed-by: Ying Fang <fangying1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Ying Fang <fangying1@huawei.com>
Message-Id: <20200407140746.8041-6-pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-04-09 16:17:14 +01:00
Stefan Hajnoczi
636b836d5f aio-posix: signal-proof fdmon-io_uring
The io_uring_enter(2) syscall returns with errno=EINTR when interrupted
by a signal.  Retry the syscall in this case.

It's essential to do this in the io_uring_submit_and_wait() case.  My
interpretation of the Linux v5.5 io_uring_enter(2) code is that it
shouldn't affect the io_uring_submit() case, but there is no guarantee
this will always be the case.  Let's check for -EINTR around both APIs.

Note that the liburing APIs have -errno return values.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200408091139.273851-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-04-09 15:55:06 +01:00
Alex Bennée
01ef6b9e4e linux-user: factor out reading of /proc/self/maps
Unfortunately reading /proc/self/maps is still considered the gold
standard for a process finding out about it's own memory layout. As we
will want this data in other contexts soon factor out the code to read
and parse the data. Rather than just blindly copying the existing
sscanf based code we use a more modern glib version of the parsing
code to make a more general purpose map structure.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200403191150.863-9-alex.bennee@linaro.org>
2020-04-07 16:19:49 +01:00
Stefan Hajnoczi
ae60ab7eb2 aio-posix: fix test-aio /aio/event/wait with fdmon-io_uring
When a file descriptor becomes ready we must re-arm POLL_ADD.  This is
done by adding an sqe to the io_uring sq ring.  The ->need_wait()
function wasn't taking pending sqes into account and therefore
io_uring_submit_and_wait() was not being called.  Polling for cqes
failed to detect fd readiness since we hadn't submitted the sqe to
io_uring.

This patch fixes the following tests/test-aio -p /aio/event/wait
failure:

  ok 11 /aio/event/wait
  **
  ERROR:tests/test-aio.c:374:test_flush_event_notifier: assertion failed: (aio_poll(ctx, false))

Reported-by: Cole Robinson <crobinso@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200402145434.99349-1-stefanha@redhat.com
Fixes: 73fd282e7b
       ("aio-posix: add io_uring fd monitoring implementation")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-04-03 12:42:40 +01:00
Robert Hoo
8f13a39dc0 util/bufferiszero: improve avx2 accelerator
By increasing avx2 length_to_accel to 128, we can simplify its logic and reduce a
branch.

The authorship of this patch actually belongs to Richard Henderson
<richard.henderson@linaro.org>, I just fixed a boundary case on his
original patch.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <1585119021-46593-2-git-send-email-robert.hu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-01 14:24:03 -04:00
Robert Hoo
b87c99d073 util/bufferiszero: assign length_to_accel value for each accelerator case
Because in unit test, init_accel() will be called several times, each with
different accelerator type.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <1585119021-46593-1-git-send-email-robert.hu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-01 14:24:03 -04:00
Stefan Hajnoczi
ff807d5592 aio-posix: fix io_uring with external events
When external event sources are disabled fdmon-io_uring falls back to
fdmon-poll.  The ->need_wait() callback needs to watch for this so it
can return true when external event sources are disabled.

It is also necessary to call ->wait() when AioHandlers have changed
because io_uring is asynchronous and we must submit new sqes.

Both of these changes to ->need_wait() together fix tests/test-aio -p
/aio/external-client, which failed with:

  test-aio: tests/test-aio.c:404: test_aio_external_client: Assertion `aio_poll(ctx, false)' failed.

Reported-by: Julia Suvorova <jusual@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20200319163559.117903-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-03-23 11:05:44 +00:00
Vladimir Sementsov-Ogievskiy
299ea9ff01 block/dirty-bitmap: improve _next_dirty_area API
Firstly, _next_dirty_area is for scenarios when we may contiguously
search for next dirty area inside some limited region, so it is more
comfortable to specify "end" which should not be recalculated on each
iteration.

Secondly, let's add a possibility to limit resulting area size, not
limiting searching area. This will be used in NBD code in further
commit. (Note that now bdrv_dirty_bitmap_next_dirty_area is unused)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-8-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy
9399c54b75 block/dirty-bitmap: add _next_dirty API
We have bdrv_dirty_bitmap_next_zero, let's add corresponding
bdrv_dirty_bitmap_next_dirty, which is more comfortable to use than
bitmap iterators in some cases.

For test modify test_hbitmap_next_zero_check_range to check both
next_zero and next_dirty and add some new checks.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-7-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy
642700fda0 block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t
We are going to introduce bdrv_dirty_bitmap_next_dirty so that same
variable may be used to store its return value and to be its parameter,
so it would int64_t.

Similarly, we are going to refactor hbitmap_next_dirty_area to use
hbitmap_next_dirty together with hbitmap_next_zero, therefore we want
hbitmap_next_zero parameter type to be int64_t too.

So, for convenience update all parameters of *_next_zero and
*_next_dirty_area to be int64_t.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-6-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy
0c88f1970c hbitmap: drop meta bitmaps as they are unused
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-5-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy
30b8346cc3 hbitmap: unpublish hbitmap_iter_skip_words
Function is internal and even commented as internal. Drop its
definition from .h file.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-4-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy
be24c7140c hbitmap: move hbitmap_iter_next_word to hbitmap.c
The function is definitely internal (it's not used by third party and
it has complicated interface). Move it to .c file.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-3-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy
6a150995d4 hbitmap: assert that we don't create bitmap larger than INT64_MAX
We have APIs which returns signed int64_t, to be able to return error.
Therefore we can't handle bitmaps with absolute size larger than
(INT64_MAX+1). Still, keep maximum to be INT64_MAX which is a bit
safer.

Note, that bitmaps are used to represent disk images, which can't
exceed INT64_MAX anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2020-03-18 14:03:46 -04:00
Stefan Hajnoczi
3284c3ddc4 lockable: add lock guards
This patch introduces two lock guard macros that automatically unlock a
lock object (QemuMutex and others):

  void f(void) {
      QEMU_LOCK_GUARD(&mutex);
      if (!may_fail()) {
          return; /* automatically unlocks mutex */
      }
      ...
  }

and:

  WITH_QEMU_LOCK_GUARD(&mutex) {
      if (!may_fail()) {
          return; /* automatically unlocks mutex */
      }
  }
  /* automatically unlocks mutex here */
  ...

Convert qemu-timer.c functions that benefit from these macros as an
example.  Manual qemu_mutex_lock/unlock() callers are left unmodified in
cases where clarity would not improve by switching to the macros.

Many other QemuMutex users remain in the codebase that might benefit
from lock guards.  Over time they can be converted, if that is
desirable.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
[Use QEMU_MAKE_LOCKABLE_NONNULL. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-17 15:18:45 +01:00
Christian Ehrhardt
bd83c861c0 modules: load modules from versioned /var/run dir
On upgrades the old .so files usually are replaced. But on the other
hand since a qemu process represents a guest instance it is usually kept
around.

That makes late addition of dynamic features e.g. 'hot-attach of a ceph
disk' fail by trying to load a new version of e.f. block-rbd.so into an
old still running qemu binary.

This adds a fallback to also load modules from a versioned directory in the
temporary /var/run path. That way qemu is providing a way for packaging
to store modules of an upgraded qemu package as needed until the next reboot.

An example how that can then be used in packaging can be seen in:
https://git.launchpad.net/~paelzer/ubuntu/+source/qemu/log/?h=bug-1847361-miss-old-so-on-upgrade-UBUNTU

Fixes: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1847361
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200310145806.18335-2-christian.ehrhardt@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 23:02:22 +01:00
Paolo Bonzini
78b3f67acd oslib-posix: initialize mutex and condition variable
The mutex and condition variable were never initialized, causing
-mem-prealloc to abort with an assertion failure.

Fixes: 037fb5eb39
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Cc: bauerchen <bauerchen@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 23:02:22 +01:00
Robert Hoo
27f08ea1c7 util: add util function buffer_zero_avx512()
And intialize buffer_is_zero() with it, when Intel AVX512F is
available on host.

This function utilizes Intel AVX512 fundamental instructions which
is faster than its implementation with AVX2 (in my unit test, with
4K buffer, on CascadeLake SP, ~36% faster, buffer_zero_avx512() V.S.
buffer_zero_avx2()).

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 23:02:21 +01:00
Peter Maydell
6e8a73e911 Pull request
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAl5o3EQACgkQnKSrs4Gr
 c8jbYwgAupuS62MCITszybRE5Ote5CL80QDbMXNsZnZh6YBIGhPCqW2zdI2+m0zu
 +qoN6x6dxxNUWCvNlhbOcA45fiZVmYzS69TiEo21kCDijoK+h8+W+YbXuxGR2xJi
 cZMm8Q1DiK6Lj3vyfiwkFf4ns3VNz9DhI9hXu6CcpSkNcp79elQu87JJbzEWWWWy
 uEc7uEyBr0uCAKLEJvaLzzzWE2D2i6qKlmj3G17UbDNgCJ/Q/5HX13RUfMrgNgiP
 wmpcJ5MsB3Prz3K4XMMytUKXX/M8zpRLahp3p31t9qHelTWC3Lk1U4xzLPTJZPlm
 if/lrGRCRml+DKb9keBjWeTF4U31vg==
 =LftU
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

# gpg: Signature made Wed 11 Mar 2020 12:40:36 GMT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  aio-posix: remove idle poll handlers to improve scalability
  aio-posix: support userspace polling of fd monitoring
  aio-posix: add io_uring fd monitoring implementation
  aio-posix: simplify FDMonOps->update() prototype
  aio-posix: extract ppoll(2) and epoll(7) fd monitoring
  aio-posix: move RCU_READ_LOCK() into run_poll_handlers()
  aio-posix: completely stop polling when disabled
  aio-posix: remove confusing QLIST_SAFE_REMOVE()
  qemu/queue.h: clear linked list pointers on remove

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-03-11 14:41:27 +00:00
Stefan Hajnoczi
d37d0e365a aio-posix: remove idle poll handlers to improve scalability
When there are many poll handlers it's likely that some of them are idle
most of the time.  Remove handlers that haven't had activity recently so
that the polling loop scales better for guests with a large number of
devices.

This feature only takes effect for the Linux io_uring fd monitoring
implementation because it is capable of combining fd monitoring with
userspace polling.  The other implementations can't do that and risk
starving fds in favor of poll handlers, so don't try this optimization
when they are in use.

IOPS improves from 10k to 105k when the guest has 100
virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1
device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe.

[Clarified aio_poll_handlers locking discipline explanation in comment
after discussion with Paolo Bonzini <pbonzini@redhat.com>.
--Stefan]

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com
Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>
2020-03-09 16:45:16 +00:00