Not all platforms check whether a lock is initialized before used. In
particular Linux seems to be more permissive than OSX.
Check initialization state explicitly in our code to catch such bugs
earlier.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170704122325.25634-1-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After converting to use DMA api for virtio devices, we should use
dma_as instead of address_space_memory. Otherwise it won't work if
IOMMU is enabled.
Fixes: commit 8607f5c307 ("virtio: convert to use DMA api")
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <1499170866-9068-1-git-send-email-jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using signal to establish a signal handler is not portable; on
SysV systems, the signal handler would be reset to SIG_DFL after
delivery, while BSD preserves the signal handler. Daniel Berrange
reported that (to complicate matters further) the signal system call
has SysV behavior, but glibc signal() actually calls the sigaction
system call to provide BSD behavior.
However, using signal() to set a signal's disposition to SIG_DFL
or SIG_IGN is portable and is a relatively common occurrence in
QEMU source code, so allow that.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In QEMU's main_loop() we used to check whether we should do
a nonblocking call to main_loop(); this was deleted in commit e330c118f2,
because now that vCPUs always drop the I/O thread lock it is an unnecessary
optimization.
The loop in test-char.c copied the old QEMU main_loop() code, but
here the nonblocking check has never been necessary because this
standalone test case doesn't hold the I/O lock anyway. Remove it,
so we can drop the main_loop_wait() return value.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1498584769-12439-2-git-send-email-peter.maydell@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The original ready < nhandles - 1 can be re-written as ready + 1 <
nhandles. The check was actually incorrect because
WAIT_OBJECT_0 was not subtracted from ready; it worked because
WAIT_OBJECT_0 is zero. After subtracting WAIT_OBJECT_0,
the result is the same condition that we are checking on the first
itteration of the for loop. This means we can remove the if statement
and let the for loop check the code.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-Id: <a14083d681951f3999a0e9314605cb706381ae8d.1498756113.git.alistair.francis@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch pulls out of kvm.c and into the new files the implementation
for the xsave and xrstor instructions. This so they can be shared by
kvm and hvf.
Signed-off-by: Sergio Andres Gomez Del Real <Sergio.G.DelReal@gmail.com>
Message-Id: <20170626200832.11058-1-Sergio.G.DelReal@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sergio Andres Gomez Del Real <sergio.g.delreal@gmail.com>
The 'sun_path' field in the sockaddr_un struct is not required
to be NUL termianted, so when reporting an error, we must use
the separate 'path' variable which is guaranteed terminated.
Fixes a bug spotted by coverity that was introduced in
commit ad9579aaa1
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Thu May 25 16:53:00 2017 +0100
sockets: improve error reporting if UNIX socket path is too long
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20170626103756.22974-1-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
edu.c does not contain any target-specific code, so we can put
it into common-obj-y to compile it only once for all targets.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-8-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There does not seem to be any target specific code in this file, so
we can put it into "common-obj" instead of "obj" to compile it only
once for all targets.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-7-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CONFIG_SOFTMMU should never be used in common code, so mark
it as poisoned, too.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-6-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 1f5c00cfdb ("qom/cpu: move tlb_flush to cpu_common_reset")
moved the call to tlb_flush() from the target-specific reset handlers
into the common code qom/cpu.c file, and protected the call with
"#ifdef CONFIG_SOFTMMU" to avoid that it is called for linux-user
only targets. But since qom/cpu.c is common code, CONFIG_SOFTMMU is
*never* defined here, so the tlb_flush() was simply never executed
anymore. Fix it by introducing a wrapper for tlb_flush() in a file
that is re-compiled for each target, i.e. in translate-all.c.
Fixes: 1f5c00cfdb
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-5-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CONFIG_KVM is only defined for target-specific code, so nobody should
use it by accident in common code. To avoid such subtle bugs,
CONFIG_KVM is now marked as poisoned in common code. The header
include/sysemu/kvm.h is somewhat special since it is included
all over the place from common code, too, so we need some extra
logic via "#ifdef NEED_CPU_H" here to make sure that we can
compile all files without problems.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-4-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
pc.h and sysemu/kvm.h are also included from common code (where
CONFIG_KVM is not available), so the #defines that depend on CONFIG_KVM
should not be declared here to avoid that anybody is using them in a
wrong way. Since we're also going to poison CONFIG_KVM for common code,
let's move them to kvm_i386.h instead. Most of the dummy definitions
from sysemu/kvm.h are also unused since the code that uses them is
only compiled for CONFIG_KVM (e.g. target/i386/kvm.c), so the unused
defines are also simply dropped here instead of being moved.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-3-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The defines of some *-linux-user targets were still missing.
Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-2-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the handling of conforming code segments before the handling
of stack switch.
Because dpl == cpl after the new "if", it's now unnecessary to check
the C bit when testing dpl < cpl. Furthermore, dpl > cpl is checked
slightly above the modified code, so the final "else" is unreachable
and we can remove it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In do_interrupt64(), when interrupt stack table(ist) is enabled
and the the target code segment is conforming(e2 & DESC_C_MASK), the
old implementation always set new CPL to 0, and SS.RPL to 0.
This is incorrect for when CPL3 code access a CPL0 conforming code
segment, the CPL should remain unchanged. Otherwise higher privileged
code can be compromised.
The patch fix this for always set dpl = cpl when the target code segment
is conforming, and modify the last parameter `flags`, which contains
correct new CPL, in cpu_x86_load_seg_cache().
Signed-off-by: Wu Xiang <willx8@gmail.com>
Message-Id: <20170621142152.GA18094@wxdeubuntu.ipads-lab.se.sjtu.edu.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When attaching the NBD QIOChannel to an AioContext, the TLS channel should
be used, not the underlying socket channel. This is because, trivially,
the TLS channel will be the one that we read/write to and thus the one
that will get the qio_channel_yield() call.
Fixes: ff82911cd3
Cc: qemu-stable@nongnu.org
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Tested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since commit 3f2ce724f1 ("Move the qemu-ga description into a
separate chapter"), the qemu.1 man page looks pretty much screwed
up, e.g. the title was "qemu-ga - QEMU Guest Agent" instead of
"qemu-doc - QEMU Emulator User Documentation". However, that movement
of the gemu-ga chapter is not the real problem, it just triggered
another bug in the qemu-doc.texi: There are some parts in the file
which introduce a "@c man begin OPTIONS" section, but never close
it again with "@c man end". After adding the proper end tags here,
the title of the man page is right again and the previously wrongly
tagged sections now also show up correctly in the man page, too.
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1497863771-24929-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch simply replaces the separate boolean field in CPUState that
kvm, hax (and upcoming hvf) have for keeping track of vcpu dirtiness
with a single shared field.
Signed-off-by: Sergio Andres Gomez Del Real <Sergio.G.DelReal@gmail.com>
Message-Id: <20170618191101.3457-1-Sergio.G.DelReal@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some fixes and cleanups. New tests.
Configurable tx queue size for virtio-net.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZWp5VAAoJECgfDbjSjVRpI6sIAMFpi+UBsE8NR5s6kNZOJIEc
rajYhfnuCoGmAXiDalVVgEjyEjlfeDqkdQyWb9r4XNRGfPAv76V4d9l0KNnuGEHF
5GFNduAuECYm8Hl5e6J/gSNCau/hmdBOtFUZvYWs+yhVpRw7+8lJTvhviNzBIGa0
mZBHCaAUzdyW7fvPh0inWxhXscPaUi8pHfohthsuTxRuBjKrQq4L/4zF9u4Mmtu+
zNHpNMQ/mn3uC9IjiD7csfqF9IHxMlUl0IkoKXm1waIpZjr9LQ5hPhbP1KepVP6c
fgPuJYHT+HeVfzyBIXbTZGhza2KduDLM7YCGWkprPBmjwM7OLrBl3JuWmw4Kg+4=
=g6bo
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc, acpi, pci, virtio: fixes, cleanups, features, tests
Some fixes and cleanups. New tests.
Configurable tx queue size for virtio-net.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 03 Jul 2017 20:43:17 BST
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (21 commits)
i386/acpi: update expected acpi files
virtio-net: fix tx queue size for !vhost-user
tests: Add unit tests for the VM Generation ID feature
vhost-user: unregister slave req handler at cleanup time
vhost: ensure vhost_ops are set before calling iotlb callback
intel_iommu: fix migration breakage on mr switch
hw/acpi: remove dead acpi code
fw_cfg: move setting of FW_CFG_VERSION_DMA bit to fw_cfg_init1()
fw_cfg: don't map the fw_cfg IO ports in fw_cfg_io_realize()
i386/kvm/pci-assign: Use errp directly rather than local_err
i386/kvm/pci-assign: Fix return type of verify_irqchip_kernel()
pci: Convert shpc_init() to Error
pci: Convert to realize
pci: Replace pci_add_capability2() with pci_add_capability()
pci: Make errp the last parameter of pci_add_capability()
pci: Fix the wrong assertion.
pci: Add comment for pci_add_capability2()
pci: Clean up error checking in pci_add_capability()
intel_iommu: relax iq tail check on VTD_GCMD_QIE enable
hw/pci-bridge/dec: Classify the DEC PCI bridge as bridge device
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We dropped some dead code, update extected table binaries.
Fixes: 4d7e7f2702 ("hw/acpi: remove dead acpi code")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Current code segfaults when no nic peer is specified.
Fix it up - fall back to default queue size.
Fixes: 9b02e1618c ("virtio-net: enable configurable tx queue size")
Cc: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The following tests are implemented:
* test that a GUID passed in by command line is propagated to the guest.
Read the GUID from guest memory
* test that the "auto" argument to the GUID generates a valid GUID, as
seen by the guest.
* test that a GUID passed in can be queried from the monitor
This patch is loosely based on a previous patch from:
Gal Hammer <ghammer@redhat.com> and Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If the backend sends a request just before closing the socket,
the aio dispatcher might schedule its reading after the vhost
device has been cleaned, leading to a NULL pointer dereference
in slave_read();
vhost_user_cleanup() already closes the socket but it is not
enough, the handler has to be unregistered.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch fixes a crash that happens when vhost-user iommu
support is enabled and vhost-user socket is closed.
When it happens, if an IOTLB invalidation notification is sent
by the IOMMU, vhost_ops's NULL pointer is dereferenced.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Migration is broken after the vfio integration work:
qemu-kvm: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address
qemu-kvm: Failed to load ich9_ahci:ahci
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:1f.2/ich9_ahci'
qemu-kvm: load of migration failed: Operation not permitted
The problem is that vfio work introduced dynamic memory region
switching (actually it is also used for future PT mode), and this memory
region layout is not properly delivered to destination when migration
happens. Solution is to rebuild the layout in post_load.
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1459906
Fixes: 558e0024 ("intel_iommu: allow dynamic switch of IOMMU region")
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The setting of the FW_CFG_VERSION_DMA bit is the same across both the
TYPE_FW_CFG_MEM and TYPE_FW_CFG_IO devices, so unify the logic in
fw_cfg_init1().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Gabriel Somlo <somlo@cmu.edu>
As indicated by Laszlo it is a QOM bug for the realize() method to actually
map the device. Set up the IO regions within fw_cfg_io_realize() and defer
the mapping with sysbus_add_io() to the caller, as already done in
fw_cfg_init_mem_wide().
This makes the iobase and dma_iobase properties now obsolete so they can be
removed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Gabriel Somlo <somlo@cmu.edu>
In assigned_device_pci_cap_init(), first, error messages are filled
to a local_err variable, then through error_propagate() pass to
the parameter of errp. It leads to cumbersome code. In order to
avoid the extra local_err and error_propagate(), drop it and use
errp instead.
Cc: pbonzini@redhat.com
Cc: rth@twiddle.net
Cc: ehabkost@redhat.com
Cc: mst@redhat.com
Cc: armbru@redhat.com
Cc: marcel@redhat.com
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When the function no success value to transmit, it usually make the
function return void. It has turned out not to be a success, because
it means that the extra local_err variable and error_propagate() will
be needed. It leads to cumbersome code, therefore, transmit success/
failure in the return value is worth. So fix the return type to avoid
it.
Cc: pbonzini@redhat.com
Cc: rth@twiddle.net
Cc: ehabkost@redhat.com
Cc: mst@redhat.com
Cc: armbru@redhat.com
Cc: marcel@redhat.com
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In order to propagate error message better, convert shpc_init() to
Error also convert the pci_bridge_dev_initfn() to realize.
Cc: mst@redhat.com
Cc: marcel@redhat.com
Cc: armbru@redhat.com
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Convert i82801b11, io3130_upstream, io3130_downstream and
pcie_root_port devices to realize.
Cc: mst@redhat.com
Cc: marcel@redhat.com
Cc: armbru@redhat.com
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
After the patch 'Make errp the last parameter of pci_add_capability()',
pci_add_capability() and pci_add_capability2() now do exactly the same.
So drop the wrapper pci_add_capability() of pci_add_capability2(), then
replace the pci_add_capability2() with pci_add_capability() everywhere.
Cc: pbonzini@redhat.com
Cc: rth@twiddle.net
Cc: ehabkost@redhat.com
Cc: mst@redhat.com
Cc: dmitry@daynix.com
Cc: jasowang@redhat.com
Cc: marcel@redhat.com
Cc: alex.williamson@redhat.com
Cc: armbru@redhat.com
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Comments for pci_add_capability2() to explain the return
value. This may help to make a correct return value check
for its callers.
Cc: mst@redhat.com
Cc: marcel@redhat.com
Cc: armbru@redhat.com
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
On success, pci_add_capability2() returns a positive value. On
failure, it sets an error and return a negative value.
pci_add_capability() laboriously checks this behavior. No other
caller does. Drop the checks from pci_add_capability().
Cc: mst@redhat.com
Cc: marcel@redhat.com
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The VT-d spec (section 6.5.2) prescribes software to zero the
Invalidation Queue Tail Register before enabling the VTD_GCMD_QIE
Global Command Register bit. Windows Server 2012 R2 and possibly
other older Windows versions violate the protocol and set a
non-zero queue tail first, which in effect makes them crash early
on boot with -device intel-iommu,intremap=on.
This commit relaxes the check and instead of failing to enable
VTD_GCMD_QIE with vtd_err_qi_enable, it behaves as if the tail
register was set just after enabling VTD_GCMD_QIE
(see vtd_handle_iqt_write).
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This way the bridge shows up in the correct section of the
"-device help" text.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
This patch enables the virtio-net tx queue size to be configurable
between 256 (the default queue size) and 1024 by the user when the
vhost-user backend is used.
Currently, the maximum tx queue size for other backends is 512 due
to the following limitations:
- QEMU backend: the QEMU backend implementation in some cases may
send 1024+1 iovs to writev.
- Vhost_net backend: there are possibilities that the guest sends
a vring_desc of memory which crosses a MemoryRegion thereby
generating more than 1024 iovs after translation from guest-physical
address in the backend.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Some code paths can lead to atomic accesses racing with memset()
on cpu->tb_jmp_cache, which can result in torn reads/writes
and is undefined behaviour in C11.
These torn accesses are unlikely to show up as bugs, but from code
inspection they seem possible. For example, tb_phys_invalidate does:
/* remove the TB from the hash list */
h = tb_jmp_cache_hash_func(tb->pc);
CPU_FOREACH(cpu) {
if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
atomic_set(&cpu->tb_jmp_cache[h], NULL);
}
}
Here atomic_set might race with a concurrent memset (such as the
ones scheduled via "unsafe" async work, e.g. tlb_flush_page) and
therefore we might end up with a torn pointer (or who knows what,
because we are under undefined behaviour).
This patch converts parallel accesses to cpu->tb_jmp_cache to use
atomic primitives, thereby bringing these accesses back to defined
behaviour. The price to pay is to potentially execute more instructions
when clearing cpu->tb_jmp_cache, but given how infrequently they happen
and the small size of the cache, the performance impact I have measured
is within noise range when booting debian-arm.
Note that under "safe async" work (e.g. do_tb_flush) we could use memset
because no other vcpus are running. However I'm keeping these accesses
atomic as well to keep things simple and to avoid confusing analysis
tools such as ThreadSanitizer.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1497486973-25845-1-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We are relying on cpu_env being defined as a global, yet most
targets (i.e. all but arm/a64) have it defined as a local variable.
Luckily all of them use the same "cpu_env" name, but really
compilation shouldn't break if the name of that local variable
changed.
Fix it by using tcg_ctx.tcg_env, which all targets set in their
translate_init function. This change also helps paving the way
for the upcoming "translation loop common to all targets" work.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1497639397-19453-3-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1497639397-19453-2-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Version: GnuPG v2
iQEtBAABCAAXBQJZVlttEBxmYW16QHJlZGhhdC5jb20ACgkQyjViTGqRccaSJQf/
aKBxpeES6l4zYoa09+x7eJwjQXj6RdIpUNL5N4a/dhUsVJ2keWo6lPcjC/kcbwPR
TJ4zYplm+suVzbNZG4XJGXLryo6ODIaHhpa/Ctsf3i6vQkRipxManpTbqqnyjt/e
fAnwdFu0dFKbnqJECujDQgaZo1qWLuyZP++ZFt2kiZgOX/OdHpnQPH2U4l+22Cp6
LB2FAFv0TwDjxmzM6EuOjsuLr9Rq3ckx7CVoyCnZuWkcaqn3/2cXhdNDErUB1nl7
Pa/N3khz6PJs1Q8/H3GPH+BJHfORLRFR6dZ/eD8JU6qqBJfguvkSmi9cQUlbVsko
KEBqUmL+bKJaK257lkUkfg==
=wO2z
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/famz/tags/block-pull-request' into staging
# gpg: Signature made Fri 30 Jun 2017 15:08:45 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/block-pull-request:
block: Exploit BDRV_BLOCK_EOF for larger zero blocks
block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When we have a BDS with unallocated clusters, but asking the status
of its underlying bs->file or backing layer encounters an end-of-file
condition, we know that the rest of the unallocated area will read as
zeroes. However, pre-patch, this required two separate calls to
bdrv_get_block_status(), as the first call stops at the point where
the underlying file ends. Thanks to BDRV_BLOCK_EOF, we can now widen
the results of the primary status if the secondary status already
includes BDRV_BLOCK_ZERO.
In turn, this fixes a TODO mentioned in iotest 154, where we can now
see that all sectors in a partial cluster at the end of a file read
as zero when coupling the shorter backing file's status along with our
knowledge that the remaining sectors came from an unallocated cluster.
Also, note that the loop in bdrv_co_get_block_status_above() had an
inefficent exit: in cases where the active layer sets BDRV_BLOCK_ZERO
but does NOT set BDRV_BLOCK_ALLOCATED (namely, where we know we read
zeroes merely because our unallocated clusters lie beyond the backing
file's shorter length), we still ended up probing the backing layer
even though we already had a good answer.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170505021500.19315-3-eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Just as the block layer already sets BDRV_BLOCK_ALLOCATED as a
shortcut for subsequent operations, there are also some optimizations
that are made easier if we can quickly tell that *pnum will advance
us to the end of a file, via a new BDRV_BLOCK_EOF which gets set
by the block layer.
This just plumbs up the new bit; subsequent patches will make use
of it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170505021500.19315-2-eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>