Taking account of the new zone append write operation for zoned devices,
BLOCK_ACCT_ZONE_APPEND enum is introduced as other I/O request type (read,
write, flush).
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230508051916.178322-3-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch extends virtio-blk emulation to handle zoned device commands
by calling the new block layer APIs to perform zoned device I/O on
behalf of the guest. It supports Report Zone, four zone oparations (open,
close, finish, reset), and Append Zone.
The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does
support zoned block devices. Regular block devices(conventional zones)
will not be set.
The guest os can use blktests, fio to test those commands on zoned devices.
Furthermore, using zonefs to test zone append write is also supported.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230508051916.178322-2-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
There is no need for the AioContext lock in aio_wait_bh_oneshot().
It's easy to remove the lock from existing callers and then switch from
AIO_WAIT_WHILE() to AIO_WAIT_WHILE_UNLOCKED() in aio_wait_bh_oneshot().
Document that the AioContext lock should not be held across
aio_wait_bh_oneshot(). Holding a lock across aio_poll() can cause
deadlock so we don't want callers to do that.
This is a step towards getting rid of the AioContext lock.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230404153307.458883-1-stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This protects devices from bh->mmio reentrancy issues.
Thanks: Thomas Huth <thuth@redhat.com> for diagnosing OS X test failure.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230427211013.2994127-5-alxndr@bu.edu>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Bring the block files in line with the QEMU coding style, with spaces
for indentation. This patch partially resolves the issue 371.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/371
Signed-off-by: Yeqi Fu <fufuyqqqqqq@gmail.com>
Message-Id: <20230314095001.13801-1-fufuyqqqqqq@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This is phase 2, following on from the basic platform support which was
already merged.
• Add a simple single-tenant internal XenStore implementation
• Indirect Xen gnttab/evtchn/foreignmem/xenstore through operations table
• Provide emulated back ends for Xen operations
• Header cleanups to allow PV back ends to build without Xen itself
• Enable PV back ends in emulated mode
• Documentation update
Tested-by: Paul Durrant <paul@xen.org>
... on real Xen (master branch, 4.18) with a Debian guest.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMUsIrNDeSBEzpfKGm+mA/QrAFUQFAmQHu3wSHGR3bXdAYW1h
em9uLmNvLnVrAAoJEJvpgP0KwBVE5LYP/0VodDsQdP7Z4L+/IzgBSgEec7qmyQFB
KlBZS/PmvCZKb0DHLI3GhXIyzD+/fnLtGSRl0rYObnKP7im+MpEDGmn97f6nIITk
AzkdsVhNEBQFXCkLgQ9y8kTrTmsod9O4sqn0+naa2TX4FPcRN0MaNmpuLEubvaRS
+JuyHmwy9ZeeAnsU31uJ0nx4F1hW9IDaatNoDeFcFnKCXQp36rtdZUViMowUJvwu
Q+Xyg6dybusznaoiXd485tTPrTt+FK/wEARse3q2gRh9QblLu0r5BFb0rOfhYCTQ
jw+5lBsOX+UlffmB9IDakRpVe4RKhvvRQSkRvYkPCshsqud9zMGhaquKg1vKBgca
I31XSN0LCcon/ahHGtmVAxyZUpWdEnfzO1TbTNpz9oacROklgVgEYdw5Vwca71VD
SURl6uCt9Jb9WmsR4twus4i4qDjQIDOtOF0hcxpl7HGktkxlGxUVI4qVLXARtVCS
OTB6N0LlhJ2woj2wYK5BRTiOj03T2MkJEWaYhDdIrQREKWe2Sn4xTOH5kGbQQnOr
km93odjBZFRHsAUnzXHXW3+yHjMefH7KrHePbmvsO4foGF77bBxosuC2ehFfvNJ0
VM/H04NDtPYCBwdAr545PSN/q+WzEPQaquLZ0UuTBuPpMMOYd+Ff8YvQWJPyCM18
1mq9v6Xe9RQZ
=JGLX
-----END PGP SIGNATURE-----
Merge tag 'xenfv-2' of git://git.infradead.org/users/dwmw2/qemu into staging
Enable PV backends with Xen/KVM emulation
This is phase 2, following on from the basic platform support which was
already merged.
• Add a simple single-tenant internal XenStore implementation
• Indirect Xen gnttab/evtchn/foreignmem/xenstore through operations table
• Provide emulated back ends for Xen operations
• Header cleanups to allow PV back ends to build without Xen itself
• Enable PV back ends in emulated mode
• Documentation update
Tested-by: Paul Durrant <paul@xen.org>
... on real Xen (master branch, 4.18) with a Debian guest.
# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCgAwFiEEMUsIrNDeSBEzpfKGm+mA/QrAFUQFAmQHu3wSHGR3bXdAYW1h
# em9uLmNvLnVrAAoJEJvpgP0KwBVE5LYP/0VodDsQdP7Z4L+/IzgBSgEec7qmyQFB
# KlBZS/PmvCZKb0DHLI3GhXIyzD+/fnLtGSRl0rYObnKP7im+MpEDGmn97f6nIITk
# AzkdsVhNEBQFXCkLgQ9y8kTrTmsod9O4sqn0+naa2TX4FPcRN0MaNmpuLEubvaRS
# +JuyHmwy9ZeeAnsU31uJ0nx4F1hW9IDaatNoDeFcFnKCXQp36rtdZUViMowUJvwu
# Q+Xyg6dybusznaoiXd485tTPrTt+FK/wEARse3q2gRh9QblLu0r5BFb0rOfhYCTQ
# jw+5lBsOX+UlffmB9IDakRpVe4RKhvvRQSkRvYkPCshsqud9zMGhaquKg1vKBgca
# I31XSN0LCcon/ahHGtmVAxyZUpWdEnfzO1TbTNpz9oacROklgVgEYdw5Vwca71VD
# SURl6uCt9Jb9WmsR4twus4i4qDjQIDOtOF0hcxpl7HGktkxlGxUVI4qVLXARtVCS
# OTB6N0LlhJ2woj2wYK5BRTiOj03T2MkJEWaYhDdIrQREKWe2Sn4xTOH5kGbQQnOr
# km93odjBZFRHsAUnzXHXW3+yHjMefH7KrHePbmvsO4foGF77bBxosuC2ehFfvNJ0
# VM/H04NDtPYCBwdAr545PSN/q+WzEPQaquLZ0UuTBuPpMMOYd+Ff8YvQWJPyCM18
# 1mq9v6Xe9RQZ
# =JGLX
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 07 Mar 2023 22:32:28 GMT
# gpg: using RSA key 314B08ACD0DE481133A5F2869BE980FD0AC01544
# gpg: issuer "dwmw@amazon.co.uk"
# gpg: Good signature from "David Woodhouse <dwmw@amazon.co.uk>" [unknown]
# gpg: aka "David Woodhouse <dwmw@amazon.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 314B 08AC D0DE 4811 33A5 F286 9BE9 80FD 0AC0 1544
* tag 'xenfv-2' of git://git.infradead.org/users/dwmw2/qemu: (27 commits)
docs: Update Xen-on-KVM documentation for PV disk support
MAINTAINERS: Add entry for Xen on KVM emulation
i386/xen: Initialize Xen backends from pc_basic_device_init() for emulation
hw/xen: Implement soft reset for emulated gnttab
hw/xen: Map guest XENSTORE_PFN grant in emulated Xenstore
hw/xen: Add emulated implementation of XenStore operations
hw/xen: Add emulated implementation of grant table operations
hw/xen: Hook up emulated implementation for event channel operations
hw/xen: Only advertise ring-page-order for xen-block if gnttab supports it
hw/xen: Avoid crash when backend watch fires too early
hw/xen: Build PV backend drivers for CONFIG_XEN_BUS
hw/xen: Rename xen_common.h to xen_native.h
hw/xen: Use XEN_PAGE_SIZE in PV backend drivers
hw/xen: Move xenstore_store_pv_console_info to xen_console.c
hw/xen: Add xenstore operations to allow redirection to internal emulation
hw/xen: Add foreignmem operations to allow redirection to internal emulation
hw/xen: Pass grant ref to gnttab unmap operation
hw/xen: Add gnttab operations to allow redirection to internal emulation
hw/xen: Add evtchn operations to allow redirection to internal emulation
hw/xen: Create initial XenStore nodes
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Whem emulating Xen, multi-page grants are distinctly non-trivial and we
have elected not to support them for the time being. Don't advertise
them to the guest.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Now that we have the redirectable Xen backend operations we can build the
PV backends even without the Xen libraries.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
This header is now only for native Xen code, not PV backends that may be
used in Xen emulation. Since the toolstack libraries may depend on the
specific version of Xen headers that they pull in (and will set the
__XEN_TOOLS__ macro to enable internal definitions that they depend on),
the rule is that xen_native.h (and thus the toolstack library headers)
must be included *before* any of the headers in include/hw/xen/interface.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
XC_PAGE_SIZE comes from the actual Xen libraries, while XEN_PAGE_SIZE is
provided by QEMU itself in xen_backend_ops.h. For backends which may be
built for emulation mode, use the latter.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
The previous commit introduced redirectable gnttab operations fairly
much like-for-like, with the exception of the extra arguments to the
->open() call which were always NULL/0 anyway.
This *changes* the arguments to the ->unmap() operation to include the
original ref# that was mapped. Under real Xen it isn't necessary; all we
need to do from QEMU is munmap(), then the kernel will release the grant,
and Xen does the tracking/refcounting for the guest.
When we have emulated grant tables though, we need to do all that for
ourselves. So let's have the back ends keep track of what they mapped
and pass it in to the ->unmap() method for us.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Commit a4b15a8b introduced a new function blk_pread_nonzeroes(). Instead
of reading directly from the root node of the BlockBackend, it reads
from its 'file' child node. This can happen to mostly work for raw
images (as long as the 'raw' format driver is in use, but not actually
doing anything), but it breaks everything else.
Fix it to read from the root node instead.
Fixes: a4b15a8b9e
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230307140230.59158-1-kwolf@redhat.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Currently, when a block backend is attached to a m25p80 device and the
associated file size does not match the flash model, QEMU complains
with the error message "failed to read the initial flash content".
This is confusing for the user.
Instead, use helper blk_check_size_and_read_all() introduced by commit
06f1521795 ("pflash: Require backend size to match device, improve
errors").
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20221115151000.2080833-1-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230215161641.32663-4-philmd@linaro.org>
isa_get_dma() returns a DMA channel handler from an ISABus.
To emphasize this, rename it as isa_bus_get_dma().
Mechanical change using:
$ sed -i -e 's/isa_get_dma/isa_bus_get_dma/g' \
$(git grep -l isa_get_dma)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230215161641.32663-2-philmd@linaro.org>
virtio_blk_update_config() calls blk_get_geometry and blk_getlength,
and both functions eventually end up calling bdrv_poll_co when not
running in a coroutine:
- blk_getlength is a co_wrapper_mixed function
- blk_get_geometry calls bdrv_get_geometry -> bdrv_nb_sectors, a
co_wrapper_mixed function too
Since we are not running in a coroutine, we need to take s->blk
AioContext lock, otherwise bdrv_poll_co will inevitably call
AIO_WAIT_WHILE and therefore try to un unlock() an AioContext lock
that was never acquired.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2167838
Steps to reproduce the issue: simply boot a VM with
-object '{"qom-type":"iothread","id":"iothread1"}' \
-blockdev '{"driver":"file","filename":"$QCOW2","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage"}' \
-device virtio-blk-pci,iothread=iothread1,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on
and observe that it will fail not manage to boot with "qemu_mutex_unlock_impl: Operation not permitted"
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230208111148.1040083-1-eesposit@redhat.com>
Tracked down with the help of scripts/clean-includes.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230202133830.2152150-21-armbru@redhat.com>
Generated from hardware using the following command and then padding
with 0xff to fill out a power-of-2:
xxd -p /sys/bus/spi/devices/spi0.0/spi-nor/sfdp
Cc: Michael Walle <michael@walle.cc>
Cc: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-Id: <20221221122213.1458540-1-linux@roeck-us.net>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Currently we fill the VIRT_FLASH memory space with two 64MB NOR images
when using persistent UEFI variables on virt board. Actually we only use
a very small(non-zero) part of the memory while the rest significant
large(zero) part of memory is wasted.
So this patch checks the block status and only writes the non-zero part
into memory. This requires pflash devices to use sparse files for
backends.
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
[ kraxel: rebased to latest master ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20221220084246.1984871-1-kraxel@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
virtio_blk_dma_restart_cb() is tricky because the BH must deal with
virtio_blk_data_plane_start()/virtio_blk_data_plane_stop() being called.
There are two issues with the code:
1. virtio_blk_realize() should use qdev_add_vm_change_state_handler()
instead of qemu_add_vm_change_state_handler(). This ensures the
ordering with virtio_init()'s vm change state handler that calls
virtio_blk_data_plane_start()/virtio_blk_data_plane_stop() is
well-defined. Then blk's AioContext is guaranteed to be up-to-date in
virtio_blk_dma_restart_cb() and it's no longer necessary to have a
special case for virtio_blk_data_plane_start().
2. Only blk_drain() waits for virtio_blk_dma_restart_cb()'s
blk_inc_in_flight() to be decremented. The bdrv_drain() family of
functions do not wait for BlockBackend's in_flight counter to reach
zero. virtio_blk_data_plane_stop() relies on blk_set_aio_context()'s
implicit drain, but that's a bdrv_drain() and not a blk_drain().
Note that virtio_blk_reset() already correctly relies on blk_drain().
If virtio_blk_data_plane_stop() switches to blk_drain() then we can
properly wait for pending virtio_blk_dma_restart_bh() calls.
Once these issues are taken care of the code becomes simpler. This
change is in preparation for multiple IOThreads in virtio-blk where we
need to clean up the multi-threading behavior.
I ran the reproducer from commit 49b44549ac ("virtio-blk: On restart,
process queued requests in the proper context") to check that there is
no regression.
Cc: Sergio Lopez <slp@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-id: 20221102182337.252202-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We have two inclusion loops:
block/block.h
-> block/block-global-state.h
-> block/block-common.h
-> block/blockjob.h
-> block/block.h
block/block.h
-> block/block-io.h
-> block/block-common.h
-> block/blockjob.h
-> block/block.h
I believe these go back to Emanuele's reorganization of the block API,
merged a few months ago in commit d7e2fe4aac.
Fortunately, breaking them is merely a matter of deleting unnecessary
includes from headers, and adding them back in places where they are
now missing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221221133551.3967339-2-armbru@redhat.com>
The 'hwaddr' type is defined in "exec/hwaddr.h" as:
hwaddr is the type of a physical address
(its size can be different from 'target_ulong').
All definitions use the 'HWADDR_' prefix, except TARGET_FMT_plx:
$ fgrep define include/exec/hwaddr.h
#define HWADDR_H
#define HWADDR_BITS 64
#define HWADDR_MAX UINT64_MAX
#define TARGET_FMT_plx "%016" PRIx64
^^^^^^
#define HWADDR_PRId PRId64
#define HWADDR_PRIi PRIi64
#define HWADDR_PRIo PRIo64
#define HWADDR_PRIu PRIu64
#define HWADDR_PRIx PRIx64
#define HWADDR_PRIX PRIX64
Since hwaddr's size can be *different* from target_ulong, it is
very confusing to read one of its format using the 'TARGET_FMT_'
prefix, normally used for the target_long / target_ulong types:
$ fgrep TARGET_FMT_ include/exec/cpu-defs.h
#define TARGET_FMT_lx "%08x"
#define TARGET_FMT_ld "%d"
#define TARGET_FMT_lu "%u"
#define TARGET_FMT_lx "%016" PRIx64
#define TARGET_FMT_ld "%" PRId64
#define TARGET_FMT_lu "%" PRIu64
Apparently this format was missed during commit a8170e5e97
("Rename target_phys_addr_t to hwaddr"), so complete it by
doing a bulk-rename with:
$ sed -i -e s/TARGET_FMT_plx/HWADDR_FMT_plx/g $(git grep -l TARGET_FMT_plx)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230110212947.34557-1-philmd@linaro.org>
[thuth: Fix some warnings from checkpatch.pl along the way]
Signed-off-by: Thomas Huth <thuth@redhat.com>
..and use for both virtio-user-blk and virtio-user-gpio. This avoids
the circular close by deferring shutdown due to disconnection until a
later point. virtio-user-blk already had this mechanism in place so
generalise it as a vhost-user helper function and use for both blk and
gpio devices.
While we are at it we also fix up vhost-user-gpio to re-establish the
event handler after close down so we can reconnect later.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20221130112439.2527228-5-alex.bennee@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 02b61f38d3 ("hw/virtio: incorporate backend features in features")
properly negotiates VHOST_USER_F_PROTOCOL_FEATURES with the vhost-user
backend, but we forgot to enable vrings as specified in
docs/interop/vhost-user.rst:
If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring starts directly in the enabled state.
If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state and is enabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.
Some vhost-user front-ends already did this by calling
vhost_ops.vhost_set_vring_enable() directly:
- backends/cryptodev-vhost.c
- hw/net/virtio-net.c
- hw/virtio/vhost-user-gpio.c
But most didn't do that, so we would leave the vrings disabled and some
backends would not work. We observed this issue with the rust version of
virtiofsd [1], which uses the event loop [2] provided by the
vhost-user-backend crate where requests are not processed if vring is
not enabled.
Let's fix this issue by enabling the vrings in vhost_dev_start() for
vhost-user front-ends that don't already do this directly. Same thing
also in vhost_dev_stop() where we disable vrings.
[1] https://gitlab.com/virtio-fs/virtiofsd
[2] https://github.com/rust-vmm/vhost/blob/240fc2966/crates/vhost-user-backend/src/event_loop.rs#L217
Fixes: 02b61f38d3 ("hw/virtio: incorporate backend features in features")
Reported-by: German Maglione <gmaglione@redhat.com>
Tested-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20221123131630.52020-1-sgarzare@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221130112439.2527228-3-alex.bennee@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
enabled VIRTIO_F_RING_RESET by default for all virtio devices.
This feature is not currently emulated by QEMU, so for vhost and
vhost-user devices we need to make sure it is supported by the offloaded
device emulation (in-kernel or in another process).
To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
passed to vhost_get_features(). This way it will be masked if the device
does not support it.
This issue was initially discovered with vhost-vsock and vhost-user-vsock,
and then also tested with vhost-user-rng which confirmed the same issue.
They fail when sending features through VHOST_SET_FEATURES ioctl or
VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
by the guest (Linux >= v6.0), but not supported by the device.
Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221121101101.29400-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Commit 334c388f25 ("pflash_cfi: Error out if device length
isn't a power of two") aimed to finish the effort started by
commit 06f1521795 ("pflash: Require backend size to match device,
improve errors"), but unfortunately we are not quite there since
various machines are still ready to accept incomplete / oversized
pflash backend images, and now fail, i.e. on Debian bullseye:
$ qemu-system-x86_64 \
-drive \
if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd
qemu-system-x86_64: Device size must be a power of two.
where OVMF_CODE.fd comes from the ovmf package, which doesn't
pad the firmware images to the flash size:
$ ls -lh /usr/share/OVMF/
-rw-r--r-- 1 root root 3.5M Aug 19 2021 OVMF_CODE_4M.fd
-rw-r--r-- 1 root root 1.9M Aug 19 2021 OVMF_CODE.fd
-rw-r--r-- 1 root root 128K Aug 19 2021 OVMF_VARS.fd
Since we entered the freeze period to prepare the v7.2.0 release,
the safest is to revert commit 334c388f25.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1294
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221108175755.95141-1-philmd@linaro.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221108172633.860700-1-danielhb413@gmail.com>
The previous fix to virtio_device_started revealed a problem in its
use by both the core and the device code. The core code should be able
to handle the device "starting" while the VM isn't running to handle
the restoration of migration state. To solve this duel use introduce a
new helper for use by the vhost-user backends who all use it to feed a
should_start variable.
We can also pick up a change vhost_user_blk_set_status while we are at
it which follows the same pattern.
Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Message-Id: <20221107121407.1010913-1-alex.bennee@linaro.org>
This patch is part of adding vhost-user vhost_dev_start support. The
motivation is to improve backend configuration speed and reduce live
migration VM downtime.
Moving the device start routines after finishing all the necessary device
and VQ configuration, further aligning to the virtio specification for
"device initialization sequence".
Following patch will add vhost-user vhost_dev_start support.
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Parav Pandit <parav@nvidia.com>
Message-Id: <20221017064452.1226514-2-yajunw@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This queue has the second part of the ppc4xx_sdram cleanups, doorbell
instructions for POWER8, new pflash handling for the e500 machine and a
Radix MMU regression fix.
It also has a lot of performance optimizations in the PowerPC emulation
done by the researchers of the Eldorado institute. Between using gvec
for VMX/VSX instructions, a full rework of the interrupt model and PMU
optimizations, they managed to drastically speed up the emulation of
powernv8/9/10 machines. Here's an example with avocado tests:
- with master:
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8:
PASS (38.89 s)
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9:
PASS (43.89 s)
- with this queue applied:
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8:
PASS (21.23 s)
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9:
PASS (22.58 s)
Other ppc machines, like pseries, also had a noticeable performance
boost.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQX6/+ZI9AYAK8oOBk82cqW3gMxZAUCY10J/gAKCRA82cqW3gMx
ZAbjAPwKNbE1wE2POJbMALBQAM5MewwLMV/UKGjE6jA7HAbb/AEA9e3o11FoUmSJ
rZkmTvMzBQZ81mMGRlS0cnqbrr4ADgc=
=gnKY
-----END PGP SIGNATURE-----
Merge tag 'pull-ppc-20221029' of https://gitlab.com/danielhb/qemu into staging
ppc patch queue for 2022-10-29:
This queue has the second part of the ppc4xx_sdram cleanups, doorbell
instructions for POWER8, new pflash handling for the e500 machine and a
Radix MMU regression fix.
It also has a lot of performance optimizations in the PowerPC emulation
done by the researchers of the Eldorado institute. Between using gvec
for VMX/VSX instructions, a full rework of the interrupt model and PMU
optimizations, they managed to drastically speed up the emulation of
powernv8/9/10 machines. Here's an example with avocado tests:
- with master:
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8:
PASS (38.89 s)
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9:
PASS (43.89 s)
- with this queue applied:
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv8:
PASS (21.23 s)
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_ppc_powernv9:
PASS (22.58 s)
Other ppc machines, like pseries, also had a noticeable performance
boost.
# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQQX6/+ZI9AYAK8oOBk82cqW3gMxZAUCY10J/gAKCRA82cqW3gMx
# ZAbjAPwKNbE1wE2POJbMALBQAM5MewwLMV/UKGjE6jA7HAbb/AEA9e3o11FoUmSJ
# rZkmTvMzBQZ81mMGRlS0cnqbrr4ADgc=
# =gnKY
# -----END PGP SIGNATURE-----
# gpg: Signature made Sat 29 Oct 2022 07:09:50 EDT
# gpg: using EDDSA key 17EBFF9923D01800AF2838193CD9CA96DE033164
# gpg: Good signature from "Daniel Henrique Barboza <danielhb413@gmail.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 17EB FF99 23D0 1800 AF28 3819 3CD9 CA96 DE03 3164
* tag 'pull-ppc-20221029' of https://gitlab.com/danielhb/qemu: (63 commits)
target/ppc: Fix regression in Radix MMU
hw/ppc/e500: Implement pflash handling
hw/sd/sdhci: Rename ESDHC_* defines to USDHC_*
hw/sd/sdhci-internal: Unexport ESDHC defines
hw/block/pflash_cfi0{1, 2}: Error out if device length isn't a power of two
docs/system/ppc/ppce500: Use qemu-system-ppc64 across the board(s)
target/ppc: Increment PMC5 with inline insns
target/ppc: Add new PMC HFLAGS
ppc4xx_sdram: Add errp parameter to ppc4xx_sdram_banks()
ppc4xx_sdram: Convert DDR SDRAM controller to new bank handling
ppc4xx_sdram: Generalise bank setup
ppc4xx_sdram: Rename local state variable for brevity
ppc4xx_sdram: Use hwaddr for memory bank size
ppc4xx_sdram: Move ppc4xx_sdram_banks() to ppc4xx_sdram.c
ppc4xx_devs.c: Move DDR SDRAM controller model to ppc4xx_sdram.c
ppc440_uc.c: Move DDR2 SDRAM controller model to ppc4xx_sdram.c
target/ppc: move the p*_interrupt_powersave methods to excp_helper.c
target/ppc: unify cpu->has_work based on cs->interrupt_request
target/ppc: introduce ppc_maybe_interrupt
target/ppc: remove ppc_store_lpcr from CONFIG_USER_ONLY builds
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
According to the JEDEC standard the device length is communicated to an
OS as an exponent (power of two).
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20221018210146.193159-3-shentey@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
If the os is not installed and doesn't have the virtio guest driver,
the vhost dev isn't started, so the dev->vdev is NULL.
Reproduce: mount a Win 2019 iso, go into the install ui, then resize
the virtio-blk device, qemu crash.
Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <20220919121816.3252223-1-fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Register guest RAM using BlockRAMRegistrar and set the
BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
accesses in I/O requests.
This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely
on DMA mapping/unmapping.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20221013185908.1297568-14-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Generated from hardware using the following command and then padding
with 0xff to fill out a power-of-2:
hexdump -v -e '8/1 "0x%02x, " "\n"' sfdp`
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
[ clg: removed extern ]
Message-Id: <20221006224424.3556372-1-patrick@stwcx.xyz>
Message-Id: <20221013161241.2805140-10-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The SFDP table size is 0x100 bytes long. The mandatory table for basic
features is available at byte 0x80 and two extra Winbond specifics
table are available at 0xC0 and 0xF0.
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-Id: <20220722063602.128144-8-clg@kaod.org>
Message-Id: <20221013161241.2805140-9-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The SFDP table size is 0x100 bytes long. Only the mandatory table for
basic features is available at byte 0x80.
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-Id: <20220722063602.128144-7-clg@kaod.org>
Message-Id: <20221013161241.2805140-8-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The SFDP table size is 0x200 bytes long. The mandatory table for basic
features is available at byte 0x30 plus some more Macronix specific
tables.
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-Id: <20220722063602.128144-6-clg@kaod.org>
Message-Id: <20221013161241.2805140-7-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The mx25l25635e and mx25l25635f chips have the same JEDEC id but the
mx25l25635f has more capabilities reported in the SFDP table. Support
for 4B opcodes is of interest because it is exploited by the Linux
kernel.
The SFDP table size is 0x200 bytes long. The mandatory table for basic
features is available at byte 0x30 and an extra Macronix specific
table is available at 0x60.
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-Id: <20220722063602.128144-5-clg@kaod.org>
Message-Id: <20221013161241.2805140-6-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The SFDP table is 0x80 bytes long. The mandatory table for basic
features is available at byte 0x30 and an extra Macronix specific
table is available at 0x60.
4B opcodes are not supported.
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-Id: <20220722063602.128144-4-clg@kaod.org>
Message-Id: <20221013161241.2805140-5-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The same values were collected on 4 differents OpenPower systems,
palmettos, romulus and tacoma.
The SFDP table size is defined as being 0x100 bytes but it could be
bigger. Only the mandatory table for basic features is available at
byte 0x30.
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-Id: <20220722063602.128144-3-clg@kaod.org>
Message-Id: <20221013161241.2805140-3-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
JEDEC STANDARD JESD216 for Serial Flash Discovery Parameters (SFDP)
provides a mean to describe the features of a serial flash device
using a set of internal parameter tables.
This is the initial framework for the RDSFDP command giving access to
a private SFDP area under the flash. This area now needs to be
populated with the flash device characteristics, using a new
'sfdp_read' handler under FlashPartInfo.
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-Id: <20220722063602.128144-2-clg@kaod.org>
Message-Id: <20221013161241.2805140-2-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Make vhost-user-blk backwards compatible when migrating from older VMs
running with modern features turned off, the same way it was done for
virtio-blk in 20764be042 ("virtio-blk: set config size depending on the features enabled")
It's currently impossible to migrate from an older VM with
vhost-user-blk (with disable-legacy=off) because of errors like this:
qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: 41 device: 1 cmask: ff wmask: 80 w1cmask:0
qemu-system-x86_64: Failed to load PCIDevice:config
qemu-system-x86_64: Failed to load virtio-blk:virtio
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0:00.0:02.0/virtio-blk'
qemu-system-x86_64: load of migration failed: Invalid argument
This is caused by the newer (destination) VM requiring a bigger BAR0
alignment because it has to cover a bigger configuration space, which
isn't actually needed since those additional config fields are not
active (write-zeroes/discard).
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220906073111.353245-6-d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
No reason to have this be a separate field. This also makes it more akin
to what the virtio-blk device does.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220906073111.353245-5-d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It is useful to have the ability to disable these features for
compatibility with older VMs that don't have these implemented.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220906073111.353245-4-d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This way we can reuse it for other virtio-blk devices, e.g
vhost-user-blk, which currently does not control its config space size
dynamically.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220906073111.353245-3-d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This is the first step towards moving all device config size calculation
logic into the virtio core code. In particular, this adds a struct that
contains all the necessary information for common virtio code to be able
to calculate the final config size for a device. This is expected to be
used with the new virtio_get_config_size helper, which calculates the
final length based on the provided host features.
This builds on top of already existing code like VirtIOFeature and
virtio_feature_get_config_size(), but adds additional fields, as well as
sanity checking so that device-specifc code doesn't have to duplicate it.
An example usage would be:
static const VirtIOFeature dev_features[] = {
{.flags = 1ULL << FEATURE_1_BIT,
.end = endof(struct virtio_dev_config, feature_1)},
{.flags = 1ULL << FEATURE_2_BIT,
.end = endof(struct virtio_dev_config, feature_2)},
{}
};
static const VirtIOConfigSizeParams dev_cfg_size_params = {
.min_size = DEV_BASE_CONFIG_SIZE,
.max_size = sizeof(struct virtio_dev_config),
.feature_sizes = dev_features
};
// code inside my_dev_device_realize()
size_t config_size = virtio_get_config_size(&dev_cfg_size_params,
host_features);
virtio_init(vdev, VIRTIO_ID_MYDEV, config_size);
Currently every device is expected to write its own boilerplate from the
example above in device_realize(), however, the next step of this
transition is moving VirtIOConfigSizeParams into VirtioDeviceClass,
so that it can be done automatically by the virtio initialization code.
All of the users of virtio_feature_get_config_size have been converted
to use virtio_get_config_size so it's no longer needed and is removed
with this commit.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Message-Id: <20220906073111.353245-2-d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The `started` field is manipulated internally within the vhost code
except for one place, vhost-user-blk via f5b22d06fb (vhost: recheck
dev state in the vhost_migration_log routine). Mark that as a FIXME
because it introduces a potential race. I think the referenced fix
should be tracking its state locally.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-12-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwittz@nutanix.com>