Some tests require compat=1.1 and thus set IMGOPTS='compat=1.1'
globally. That is not how it should be done; instead, they should
simply set _unsupported_imgopts to compat=0.10 (compat=1.1 is the
default anyway).
This makes the tests heed user-specified $IMGOPTS. Some do not work
with all image options, though, so we need to disable them accordingly.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsky@redhat.com>
Message-id: 20191107163708.833192-7-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This test can run just fine with other values for refcount_bits, so we
should filter the value from qcow2.py's dump-header. In fact, we can
filter everything but the feature bits and header extensions, because
that is what the test is about.
(036 currently ignores user-specified image options, but that will be
fixed in the next patch.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-id: 20191107163708.833192-6-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Print the feature fields as a set of bits so that filtering is easier.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-id: 20191107163708.833192-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This is useful for tests that want to whitelist fields from dump-header
(with grep) but still print all header extensions.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-id: 20191107163708.833192-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Probably due to blind copy-pasting, we have several instances of "qocw2"
in our iotests. Fix them.
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191107163708.833192-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
qcow2_can_store_new_dirty_bitmap works wrong, as it considers only
bitmaps already stored in the qcow2 image and ignores persistent
BdrvDirtyBitmap objects.
So, let's instead count persistent BdrvDirtyBitmaps. We load all qcow2
bitmaps on open, so there should not be any bitmap in the image for
which we don't have BdrvDirtyBitmaps version. If it is - it's a kind of
corruption, and no reason to check for corruptions here (open() and
close() are better places for it).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191014115126.15360-2-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
This avoid a memory leak when qom-set is called to set throttle_group
limits, here is an easy way to reproduce:
1. run qemu-iotests as follow and check the result with asan:
./check -qcow2 184
Following is the asan output backtrack:
Direct leak of 912 byte(s) in 3 object(s) allocated from:
#0 0xffff8d7ab3c3 in __interceptor_calloc (/lib64/libasan.so.4+0xd33c3)
#1 0xffff8d4c31cb in g_malloc0 (/lib64/libglib-2.0.so.0+0x571cb)
#2 0x190c857 in qobject_input_start_struct /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295
#3 0x19070df in visit_start_struct /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49
#4 0x1948b87 in visit_type_ThrottleLimits qapi/qapi-visit-block-core.c:3759
#5 0x17e4aa3 in throttle_group_set_limits /mnt/sdc/qemu-master/qemu-4.2.0-rc0/block/throttle-groups.c:900
#6 0x1650eff in object_property_set /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/object.c:1272
#7 0x1658517 in object_property_set_qobject /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qobject.c:26
#8 0x15880bb in qmp_qom_set /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qmp-cmds.c:74
#9 0x157e3e3 in qmp_marshal_qom_set qapi/qapi-commands-qom.c:154
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: PanNengyuan <pannengyuan@huawei.com>
Message-id: 1574835614-42028-1-git-send-email-pannengyuan@huawei.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20191108123455.39445-6-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Callers can use this new parameter to expect failure during the
completion process.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191108123455.39445-5-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Sometimes it is useful to be able to add a node to the block graph that
takes or unshare a certain set of permissions for debugging purposes.
This patch adds this capability to blkdebug.
(Note that you cannot make blkdebug release or share permissions that it
needs to take or cannot share, because this might result in assertion
failures in the block layer. But if the blkdebug node has no parents,
it will not take any permissions and share everything by default, so you
can then freely choose what permissions to take and share.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191108123455.39445-4-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We can save some LoC in xdbg_graph_add_edge() by using
bdrv_qapi_perm_to_blk_perm().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191108123455.39445-3-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We need some way to correlate QAPI BlockPermission values with
BLK_PERM_* flags. We could:
(1) have the same order in the QAPI definition as the the BLK_PERM_*
flags are in LSb-first order. However, then there is no guarantee
that they actually match (e.g. when someone modifies the QAPI schema
without thinking of the BLK_PERM_* definitions).
We could add static assertions, but these would break what’s good
about this solution, namely its simplicity.
(2) define the BLK_PERM_* flags based on the BlockPermission values.
But this way whenever someone were to modify the QAPI order
(perfectly sensible in theory), the BLK_PERM_* values would change.
Because these values are used for file locking, this might break
file locking between different qemu versions.
Therefore, go the slightly more cumbersome way: Add a function to
translate from the QAPI constants to the BLK_PERM_* flags.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191108123455.39445-2-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Tell the compiler to do a 32bit * 32bit -> 64bit multiplication
because period_ticks is a 64bit variable. The overflow occurs
for audio timer periods larger than 4294967us.
Fixes: be1092afa0 "audio: fix audio timer rate conversion bug"
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Message-id: 8893a235-66a8-8fbe-7d95-862e29da90b1@t-online.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Don't call pa_stream_peek before the recording stream is ready.
Information to reproduce the problem.
Start and stop Audacity in the guest several times because the
problem is racy.
libvirt log file:
-audiodev pa,id=audio0,server=localhost,out.latency=30000,
out.mixing-engine=off,in.mixing-engine=off \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,
resourcecontrol=deny \
-msg timestamp=on
: Domain id=4 is tainted: custom-argv
char device redirected to /dev/pts/1 (label charserial0)
audio: Device pcspk: audiodev default parameter is deprecated,
please specify audiodev=audio0
audio: Device hda: audiodev default parameter is deprecated,
please specify audiodev=audio0
pulseaudio: pa_stream_peek failed
pulseaudio: Reason: Bad state
pulseaudio: pa_stream_peek failed
pulseaudio: Reason: Bad state
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Message-id: 20200104091122.13971-5-vr_qemu@t-online.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
There is no guarantee a single call to pa_stream_peek every
timer_period microseconds can read a recording stream faster
than the data gets produced at the source. Let qpa_read try to
drain the recording stream.
To reproduce the problem:
Start qemu with -audiodev pa,id=audio0,in.mixing-engine=off
On the host connect the qemu recording stream to the monitor of
a hardware output device. While the problem can also be seen
with a hardware input device, it's obvious with the monitor of
a hardware output device.
In the guest start audio recording with audacity and notice the
slow recording data rate.
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Message-id: 20200104091122.13971-4-vr_qemu@t-online.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Every call to pa_stream_peek which returns a data length > 0
should have a corresponding pa_stream_drop. A call to qpa_read
does not necessarily call pa_stream_drop immediately after a
call to pa_stream_peek. Test in qpa_fini_in if a last
pa_stream_drop is needed.
This prevents following messages in the libvirt log file after
a recording stream gets closed and a new one opened.
pulseaudio: pa_stream_drop failed
pulseaudio: Reason: Bad state
pulseaudio: pa_stream_drop failed
pulseaudio: Reason: Bad state
To reproduce start qemu with
-audiodev pa,id=audio0,in.mixing-engine=off
and in the guest start and stop Audacity several times.
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Message-id: 20200104091122.13971-3-vr_qemu@t-online.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Apply previous commit to hda_audio_input_cb for the same
reasons.
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Message-id: 20200104091122.13971-2-vr_qemu@t-online.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Since commit 1930616b98 "audio: make mixeng optional" the
function hda_audio_output_cb can no longer assume the function
parameter avail contains the free buffer size. With the playback
mixing-engine turned off this leads to a broken playback rate
control and playback buffer drops in regular intervals.
This patch moves down the rate calculation, so the correct
buffer fill level is used for the calculation.
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Message-id: 20200104091122.13971-1-vr_qemu@t-online.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
5.0 machine type uses 4.2 compats. This seems to be incorrect, since
the latests machine type by now is 5.0 and it should use its own
compat or shouldn't use any relying on the defaults.
Seems, like this appeared because of some problems on merge/rebase.
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Message-Id: <20191223072856.5369-1-dplotnikov@virtuozzo.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If the vhost-user-scsi backend supports the VHOST_USER_F_RESET_DEVICE
protocol feature, then the device can be reset when requested.
If this feature is not supported, do not try a reset as this will send
a VHOST_USER_RESET_OWNER that the backend is not expecting,
potentially putting into an inoperable state.
Signed-off-by: David Vrabel <david.vrabel@nutanix.com>
Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <1572385083-5254-3-git-send-email-raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add a VHOST_USER_RESET_DEVICE message which will reset the vhost user
backend. Disabling all rings, and resetting all internal state, ready
for the backend to be reinitialized.
A backend has to report it supports this features with the
VHOST_USER_PROTOCOL_F_RESET_DEVICE protocol feature bit. If it does
so, the new message is used instead of sending a RESET_OWNER which has
had inconsistent implementations.
Signed-off-by: David Vrabel <david.vrabel@nutanix.com>
Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <1572385083-5254-2-git-send-email-raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Both functions are called by MemoryRegionOps.[read/write] handlers
with unsigned 'size' argument. Both functions call
pci_host_config_[read/write]_common() which expect a uint32_t 'len'
parameter (also unsigned).
Since it is pointless (and confuse) to use a signed value, use a
unsigned type.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20191216002134.18279-3-philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In commit 3bf4dfdd11 we introduced the pci_cfg_[read/write]
trace events in pci_host_config_[read/write]_common().
We have the following call trace:
pci_host_data_[read/write]()
- PCI_DPRINTF()
- pci_data_[read/write]()
- PCI_DPRINTF()
- pci_host_config_[read/write]_common()
trace_pci_cfg_[read/write]()
Since the PCI_DPRINTF() calls are redundant with the trace
events, remove them.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20191216002134.18279-2-philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
At the moment when the guest writes a status of 0, we only reset the
virtio core state but not the virtio-mmio state. The virtio-mmio
specification says (v1.1 cs01, 4.2.2.1 Device Requirements:
MMIO Device Register Layout):
Upon reset, the device MUST clear all bits in InterruptStatus and
ready bits in the QueueReady register for all queues in the device.
The core already takes care of InterruptStatus by clearing isr, but we
still need to clear QueueReady.
It would be tempting to clean all registers, but since the specification
doesn't say anything more, guests could rely on the registers keeping
their state across reset. Linux for example, relies on this for
GuestPageSize in the legacy MMIO tranport.
Fixes: 44e687a4d9 ("virtio-mmio: implement modern (v2) personality (virtio-1)")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20191213095410.1516119-1-jean-philippe@linaro.org>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
ACPI table HMAT has been introduced, QEMU now builds HMAT tables for
Heterogeneous Memory with boot option '-numa node'.
Add test cases on PC and Q35 machines with 2 numa nodes.
Because HMAT is generated when system enable numa, the
following tables need to be added for this test:
tests/data/acpi/pc/APIC.acpihmat
tests/data/acpi/pc/SRAT.acpihmat
tests/data/acpi/pc/HMAT.acpihmat
tests/data/acpi/pc/DSDT.acpihmat
tests/data/acpi/q35/APIC.acpihmat
tests/data/acpi/q35/SRAT.acpihmat
tests/data/acpi/q35/HMAT.acpihmat
tests/data/acpi/q35/DSDT.acpihmat
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20191213011929.2520-9-tao3.xu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Check configuring HMAT usecase
Acked-by: Markus Armbruster <armbru@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20191213011929.2520-8-tao3.xu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
This structure describes memory side cache information for memory
proximity domains if the memory side cache is present and the
physical device forms the memory side cache.
The software could use this information to effectively place
the data in memory to maximize the performance of the system
memory that use the memory side cache.
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20191213011929.2520-7-tao3.xu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This structure describes the memory access latency and bandwidth
information from various memory access initiator proximity domains.
The latency and bandwidth numbers represented in this structure
correspond to rated latency and bandwidth for the platform.
The software could use this information as hint for optimization.
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20191213011929.2520-6-tao3.xu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
(HMAT). The specification references below link:
http://www.uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf
It describes the memory attributes, such as memory side cache
attributes and bandwidth and latency details, related to the
Memory Proximity Domain. The software is
expected to use this information as hint for optimization.
This structure describes Memory Proximity Domain Attributes by memory
subsystem and its associativity with processor proximity domain as well as
hint for memory usage.
In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
the platform's HMAT tables.
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20191213011929.2520-5-tao3.xu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add -numa hmat-cache option to provide Memory Side Cache Information.
These memory attributes help to build Memory Side Cache Information
Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).
Before using hmat-cache option, enable HMAT with -machine hmat=on.
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20191213011929.2520-4-tao3.xu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT). Before using
hmat-lb option, enable HMAT with -machine hmat=on.
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20191213011929.2520-3-tao3.xu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
The initiator represents processor which access to memory. And in 5.2.27.3
Memory Proximity Domain Attributes Structure, the attached initiator is
defined as where the memory controller responsible for a memory proximity
domain. With attached initiator information, the topology of heterogeneous
memory can be described. Add new machine property 'hmat' to enable all
HMAT specific options.
Extend CLI of "-numa node" option to indicate the initiator numa node-id.
In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
the platform's HMAT tables. Before using initiator option, enable HMAT with
-machine hmat=on.
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20191213011929.2520-2-tao3.xu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Virtqueue notifications are not necessary during polling, so we disable
them. This allows the guest driver to avoid MMIO vmexits.
Unfortunately the virtio-blk and virtio-scsi handler functions re-enable
notifications, defeating this optimization.
Fix virtio-blk and virtio-scsi emulation so they leave notifications
disabled. The key thing to remember for correctness is that polling
always checks one last time after ending its loop, therefore it's safe
to lose the race when re-enabling notifications at the end of polling.
There is a measurable performance improvement of 5-10% with the null-co
block driver. Real-life storage configurations will see a smaller
improvement because the MMIO vmexit overhead contributes less to
latency.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20191209210957.65087-1-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch is to add standard commands defined in docs/interop/vhost-user.rst
For vhost-user-* program
Signed-off-by: Micky Yun Chan (michiboo) <chanmickyyun@gmail.com>
Message-Id: <20191209015331.5455-1-chanmickyyun@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently the SLOF firmware for pseries guests will disable/re-enable
a PCI device multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it tends to
work with a single device at a time at various stages like probing
and running block/network bootloaders without doing a full reset
in-between.
In QEMU, when PCI_COMMAND_MASTER is disabled we disable the
corresponding IOMMU memory region, so DMA accesses (including to vring
fields like idx/flags) will no longer undergo the necessary
translation. Normally we wouldn't expect this to happen since it would
be misbehavior on the driver side to continue driving DMA requests.
However, in the case of pseries, with iommu_platform=on, we trigger the
following sequence when tearing down the virtio-blk dataplane ioeventfd
in response to the guest unsetting PCI_COMMAND_MASTER:
#2 0x0000555555922651 in virtqueue_map_desc (vdev=vdev@entry=0x555556dbcfb0, p_num_sg=p_num_sg@entry=0x7fffe657e1a8, addr=addr@entry=0x7fffe657e240, iov=iov@entry=0x7fffe6580240, max_num_sg=max_num_sg@entry=1024, is_write=is_write@entry=false, pa=0, sz=0)
at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:757
#3 0x0000555555922a89 in virtqueue_pop (vq=vq@entry=0x555556dc8660, sz=sz@entry=184)
at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:950
#4 0x00005555558d3eca in virtio_blk_get_request (vq=0x555556dc8660, s=0x555556dbcfb0)
at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:255
#5 0x00005555558d3eca in virtio_blk_handle_vq (s=0x555556dbcfb0, vq=0x555556dc8660)
at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:776
#6 0x000055555591dd66 in virtio_queue_notify_aio_vq (vq=vq@entry=0x555556dc8660)
at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1550
#7 0x000055555591ecef in virtio_queue_notify_aio_vq (vq=0x555556dc8660)
at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1546
#8 0x000055555591ecef in virtio_queue_host_notifier_aio_poll (opaque=0x555556dc86c8)
at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:2527
#9 0x0000555555d02164 in run_poll_handlers_once (ctx=ctx@entry=0x55555688bfc0, timeout=timeout@entry=0x7fffe65844a8)
at /home/mdroth/w/qemu.git/util/aio-posix.c:520
#10 0x0000555555d02d1b in try_poll_mode (timeout=0x7fffe65844a8, ctx=0x55555688bfc0)
at /home/mdroth/w/qemu.git/util/aio-posix.c:607
#11 0x0000555555d02d1b in aio_poll (ctx=ctx@entry=0x55555688bfc0, blocking=blocking@entry=true)
at /home/mdroth/w/qemu.git/util/aio-posix.c:639
#12 0x0000555555d0004d in aio_wait_bh_oneshot (ctx=0x55555688bfc0, cb=cb@entry=0x5555558d5130 <virtio_blk_data_plane_stop_bh>, opaque=opaque@entry=0x555556de86f0)
at /home/mdroth/w/qemu.git/util/aio-wait.c:71
#13 0x00005555558d59bf in virtio_blk_data_plane_stop (vdev=<optimized out>)
at /home/mdroth/w/qemu.git/hw/block/dataplane/virtio-blk.c:288
#14 0x0000555555b906a1 in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38)
at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:245
#15 0x0000555555b90dbb in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38)
at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:237
#16 0x0000555555b92a8e in virtio_pci_stop_ioeventfd (proxy=0x555556db4e40)
at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:292
#17 0x0000555555b92a8e in virtio_write_config (pci_dev=0x555556db4e40, address=<optimized out>, val=1048832, len=<optimized out>)
at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:613
I.e. the calling code is only scheduling a one-shot BH for
virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
an additional virtqueue entry before we get there. This is likely due
to the following check in virtio_queue_host_notifier_aio_poll:
static bool virtio_queue_host_notifier_aio_poll(void *opaque)
{
EventNotifier *n = opaque;
VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
bool progress;
if (!vq->vring.desc || virtio_queue_empty(vq)) {
return false;
}
progress = virtio_queue_notify_aio_vq(vq);
namely the call to virtio_queue_empty(). In this case, since no new
requests have actually been issued, shadow_avail_idx == last_avail_idx,
so we actually try to access the vring via vring_avail_idx() to get
the latest non-shadowed idx:
int virtio_queue_empty(VirtQueue *vq)
{
bool empty;
...
if (vq->shadow_avail_idx != vq->last_avail_idx) {
return 0;
}
rcu_read_lock();
empty = vring_avail_idx(vq) == vq->last_avail_idx;
rcu_read_unlock();
return empty;
but since the IOMMU region has been disabled we get a bogus value (0
usually), which causes virtio_queue_empty() to falsely report that
there are entries to be processed, which causes errors such as:
"virtio: zero sized buffers are not allowed"
or
"virtio-blk missing headers"
and puts the device in an error state.
This patch works around the issue by introducing virtio_set_disabled(),
which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
when bus-mastering is disabled. Since we'd check this flag at all the
same sites as vdev->broken, we replace those checks with an inline
function which checks for either vdev->broken or vdev->disabled.
The 'disabled' flag is only migrated when set, which should be fairly
rare, but to maintain migration compatibility we disable it's use for
older machine types. Users requiring the use of the flag in conjunction
with older machine types can set it explicitly as a virtio-device
option.
NOTES:
- This leaves some other oddities in play, like the fact that
DRIVER_OK also gets unset in response to bus-mastering being
disabled, but not restored (however the device seems to continue
working)
- Similarly, we disable the host notifier via
virtio_bus_stop_ioeventfd(), which seems to move the handling out
of virtio-blk dataplane and back into the main IO thread, and it
ends up staying there till a reset (but otherwise continues working
normally)
Cc: David Gibson <david@gibson.dropbear.id.au>,
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20191120005003.27035-1-mdroth@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Some guests read back queue size after writing it.
Update the size immediatly upon write otherwise
they get confused.
In particular this is the case for seabios.
Reported-by: Roman Kagan <rkagan@virtuozzo.com>
Suggested-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Should directly read DMAR_RTADDR_REG but not using 's->root'.
Because 's->root' is modified in 'vtd_root_table_setup()' so
that the first 12 bits are omitted. This causes the guest
iommu debugfs cannot show pasid tables.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Message-Id: <20191205095439.29114-1-yi.y.sun@linux.intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
ivqs/ovqs/c_ivq/c_ovq is forgot to cleanup in
virtio_serial_device_unrealize, the memory leak stack is as bellow:
Direct leak of 1290240 byte(s) in 180 object(s) allocated from:
#0 0x7fc9bfc27560 in calloc (/usr/lib64/libasan.so.3+0xc7560)
#1 0x7fc9bed6f015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015)
#2 0x5650e02b83e7 in virtio_add_queue hw/virtio/virtio.c:2327
#3 0x5650e02847b5 in virtio_serial_device_realize hw/char/virtio-serial-bus.c:1089
#4 0x5650e02b56a7 in virtio_device_realize hw/virtio/virtio.c:3504
#5 0x5650e03bf031 in device_set_realized hw/core/qdev.c:876
#6 0x5650e0531efd in property_set_bool qom/object.c:2080
#7 0x5650e053650e in object_property_set_qobject qom/qom-qobject.c:26
#8 0x5650e0533e14 in object_property_set_bool qom/object.c:1338
#9 0x5650e04c0e37 in virtio_pci_realize hw/virtio/virtio-pci.c:1801
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Amit Shah <amit@kernel.org>
Cc: "Marc-André Lureau" <marcandre.lureau@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1575444716-17632-3-git-send-email-pannengyuan@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
ivq/dvq/svq/free_page_vq is forgot to cleanup in
virtio_balloon_device_unrealize, the memory leak stack is as follow:
Direct leak of 14336 byte(s) in 2 object(s) allocated from:
#0 0x7f99fd9d8560 in calloc (/usr/lib64/libasan.so.3+0xc7560)
#1 0x7f99fcb20015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015)
#2 0x557d90638437 in virtio_add_queue hw/virtio/virtio.c:2327
#3 0x557d9064401d in virtio_balloon_device_realize hw/virtio/virtio-balloon.c:793
#4 0x557d906356f7 in virtio_device_realize hw/virtio/virtio.c:3504
#5 0x557d9073f081 in device_set_realized hw/core/qdev.c:876
#6 0x557d908b1f4d in property_set_bool qom/object.c:2080
#7 0x557d908b655e in object_property_set_qobject qom/qom-qobject.c:26
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <1575444716-17632-2-git-send-email-pannengyuan@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Let's make sure calling this twice is harmless -
no known instances, but seems safer.
Suggested-by: Pan Nengyuan <pannengyuan@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Devices tend to maintain vq pointers, allow deleting them trough a vq pointer.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Don't attempt to remove /dev/fdset files.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>