A P2P bridge will attempt to handle the hotplug with SHPC, which doesn't
work in the PAPR environment. Instead we want to direct all PCI hotplug
actions to the PAPR specific host bridge which will use the PAPR hotplug
mechanism.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
DRC ids are more or less arbitrary, as long as they're consistent. For
PCI, we notionally build them from the phb's index along with PCI bus
number, slot and function number.
Using bus number is broken, however, because it can change if the guest
re-enumerates the PCI topology for whatever reason (e.g. due to hotplug
of a bridge, which we don't support yet but want to).
Fortunately, there's an alternative. Bridges are required to have a unique
non-zero "chassis number" that we can use instead. Adjust the code to
use that instead.
This looks like it would introduce a guest visible breaking change, but
in fact it does not because we don't yet ever use non-zero bus numbers.
Both chassis and bus number are always 0 for the root bus, so there's no
change for the existing cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
spapr_pci.c currently has several confusingly similarly named functions for
various conversions between representations of DRCs. Make things clearer
by renaming things in a more consistent XXX_from_YYY() manner and remove
some called-only-once variants in favour of open coding.
While we're at it, move this code together in the file to avoid some extra
forward references, and split out construction and removal of DRCs for the
host bridge into helper functions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
This makes some minor cleanups to spapr_drc_populate_dt(), renaming it to
the shorter and more idiomatic spapr_dt_drc() along the way.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Device nodes for PCI bridges (both host and P2P) describe both the bridge
device itself and the bus hanging off it, handling of this is a bit of a
mess.
spapr_dt_pci_device() has a few things it only adds for non-bridges, but
always adds #address-cells and #size-cells which should only appear for
bridges. But the walking down the subordinate PCI bus is done in one of
its callers spapr_populate_pci_devices_dt(). The PHB dt creation in
spapr_populate_pci_dt() open codes some similar logic to the bridge case.
This patch consolidates things in a bunch of ways:
* Bus specific dt info is now created in spapr_dt_pci_bus() used for both
P2P bridges and the host bridge. This includes walking subordinate
devices
* spapr_dt_pci_device() now calls spapr_dt_pci_bus() when called on a
P2P bridge
* We do detection of bridges with the is_bridge field of the device class,
rather than checking PCI config space directly, for consistency with
qemu's core PCI code.
* Several things are renamed for brevity and clarity
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
spapr_create_pci_child_dt() is a trivial wrapper around
spapr_populate_pci_child_dt(), but is the latter's only caller. So fold
them together into spapr_dt_pci_device(), which closer matches our modern
naming convention.
While there, make a number of cleanups to the function itself. This is
mostly using more temporary locals to avoid awkwardly long lines, and in
some cases avoiding double reads of PCI config space variables.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
spapr_populate_pci_child_dt() adds a 'name' property to the device tree
node for PCI devices. This is never necessary for a flattened device tree,
it is implicit in the name added when the node is constructed. In fact
anything we do add to a 'name' property will be overwritten with something
derived from the structural name in the guest firmware (but in fact it is
exactly the same bytes).
So, remove that. In addition, pci_get_node_name() is very simple, so fold
it into its (also simple) sole caller spapr_create_pci_child_dt().
While we're there rename pci_find_device_name() to the shorter and more
accurate dt_name_from_class().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Every PHB must have a unique index. This is checked at realize but when
a duplicate index is detected, an error message mentioning BUIDs is
printed. This doesn't help much, especially since BUID is an internal
concept that is no longer exposed to the user.
Fix the message to mention the index property instead of BUID. As a bonus
print a list of indexes already in use.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155915010892.2061314.10485622810149098411.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cleanup in the boilerplate that each target must define.
Replace xtensa_env_get_cpu with env_archcpu. The combination
CPU(xtensa_env_get_cpu) should have used ENV_GET_CPU to begin;
use env_cpu now.
Move cpu_get_tb_cpu_state below the include of "exec/cpu-all.h"
so that the definition of env_cpu is available.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Cleanup in the boilerplate that each target must define.
Replace uc32_env_get_cpu with env_archcpu. The combination
CPU(uc32_env_get_cpu) should have used ENV_GET_CPU to begin;
use env_cpu now.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Cleanup in the boilerplate that each target must define.
Replace sparc_env_get_cpu with env_archcpu. The combination
CPU(sparc_env_get_cpu) should have used ENV_GET_CPU to begin;
use env_cpu now.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Cleanup in the boilerplate that each target must define.
Replace ppc_env_get_cpu with env_archcpu. The combination
CPU(ppc_env_get_cpu) should have used ENV_GET_CPU to begin;
use env_cpu now.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Cleanup in the boilerplate that each target must define.
Replace nios2_env_get_cpu with env_archcpu. The combination
CPU(nios2_env_get_cpu) should have used ENV_GET_CPU to begin;
use env_cpu now.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Cleanup in the boilerplate that each target must define.
Replace mips_env_get_cpu with env_archcpu. The combination
CPU(mips_env_get_cpu) should have used ENV_GET_CPU to begin;
use env_cpu now.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Cleanup in the boilerplate that each target must define.
Replace x86_env_get_cpu with env_archcpu. The combination
CPU(x86_env_get_cpu) should have used ENV_GET_CPU to begin;
use env_cpu now.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Now that we have both ArchCPU and CPUArchState, we can define
this generically instead of via macro in each target's cpu.h.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There's no functional change but the flow is (hopefully)
more consistent for both file and folder object types.
Signed-off-by: Bandan Das <bsd@redhat.com>
Message-Id: <20190401211712.19012-4-bsd@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
We don't care about the other two missing base features:
- S390_FEAT_DFP_PACKED_CONVERSION
- S390_FEAT_GROUP_GEN13_PTFF
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Fill the new QemuDmaBuf->modifier field properly from plane info.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-id: 20190529072144.26737-3-kraxel@redhat.com
stricter rules for acpi tables: we now fail
on any difference that isn't whitelisted.
vhost-scsi migration.
some cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJc+B4YAAoJECgfDbjSjVRpq1EIAJR7tCxcpu9GggVlinmUA8G4
tmSAe06IryH7+nF3RsnINuGu7ius9qC2/E2y0uJUHhTqiU/RWOfWZ7PPM0EcYZaA
TLPaCe2NUF6/8afeqmvE9Usk7VspI5TDZRms+bonmZz2xP1lHIMN0qW4s7HHLWr8
sZKDtCJ+9cYII93VQwtlR0qiHgv5f0kzcuZeJaZHsAHH6XZGqRuQjI6txcFa4o53
lkdLCEwTnRuwu2wyL84eL5p+E8SzOgR/x1QI+nffrJfsvnmiT7lnOrkjnQlWAp5G
xqwqsUrUxUCuQ+zitwJqmv+H6nx79MwAM7fTHAETCWX703N5o9tZxAnHHqLoa8I=
=cQNg
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio, pci, pc: cleanups, features
stricter rules for acpi tables: we now fail
on any difference that isn't whitelisted.
vhost-scsi migration.
some cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Wed 05 Jun 2019 20:55:04 BST
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
bios-tables-test: ignore identical binaries
tests: acpi: add simple arm/virt testcase
tests: add expected ACPI tables for arm/virt board
bios-tables-test: list all tables that differ
vhost-scsi: Allow user to enable migration
vhost-scsi: Add VMState descriptor
vhost-scsi: The vhost backend should be stopped when the VM is not running
bios-tables-test: add diff allowed list
vhost: fix memory leak in vhost_user_scsi_realize
vhost: fix incorrect print type
vhost: remove the dead code
docs: smbios: remove family=x from type2 entry description
pci: Fold pci_get_bus_devfn() into its sole caller
pci: Make is_bridge a bool
pcie: Simplify pci_adjust_config_limit()
acpi: pci: use build_append_foo() API to construct MCFG
hw/acpi: Consolidate build_mcfg to pci.c
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Rather than looking inside the definition of a DeviceState with
"s->qdev", use the QOM prefered style: "DEVICE(s)".
This patch was generated using the following Coccinelle script:
// Use DEVICE() macros to access DeviceState.qdev
@use_device_macro_to_access_qdev@
expression obj;
identifier dev;
@@
-&obj->dev.qdev
+DEVICE(obj)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20190528164020.32250-11-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Rather than looking inside the definition of a BusState with "s->bus.qbus",
use the QOM prefered style: "BUS(&s->bus)".
This patch was generated using the following Coccinelle script:
// Use BUS() macros to access BusState.qbus
@use_bus_macro_to_access_qbus@
expression obj;
identifier bus;
@@
-&obj->bus.qbus
+BUS(&obj->bus)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20190528164020.32250-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Rather than looking inside the definition of a BusState with "s->bus.qbus",
use the QOM prefered style: "BUS(&s->bus)".
This patch was generated using the following Coccinelle script:
// Use BUS() macros to access BusState.qbus
@use_bus_macro_to_access_qbus@
expression obj;
identifier bus;
@@
-&obj->bus.qbus
+BUS(&obj->bus)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20190528164020.32250-6-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Rather than looking inside the definition of a DeviceState with
"s->qdev", use the QOM prefered style: "DEVICE(s)".
This patch was generated using the following Coccinelle script
(with a bit of manual fix-up, removing an extra space to please
checkpatch.pl):
// Use DEVICE() macros to access DeviceState.qdev
@use_device_macro_to_access_qdev@
expression obj;
identifier dev;
@@
-&obj->dev.qdev
+DEVICE(obj)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>.
Message-Id: <20190528164020.32250-7-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Rather than looking inside the definition of a DeviceState with
"s->qdev", use the QOM prefered style: "DEVICE(s)".
This patch was generated using the following Coccinelle script:
// Use DEVICE() macros to access DeviceState.qdev
@use_device_macro_to_access_qdev@
expression obj;
identifier dev;
@@
-&obj->dev.qdev
+DEVICE(obj)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190528164020.32250-10-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Rather than looking inside the definition of a DeviceState with
"s->qdev", use the QOM prefered style: "DEVICE(s)".
This patch was generated using the following Coccinelle script:
// Use DEVICE() macros to access DeviceState.qdev
@use_device_macro_to_access_qdev@
expression obj;
identifier dev;
@@
-&obj->dev.qdev
+DEVICE(obj)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20190528164020.32250-9-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Rather than looking inside the definition of a DeviceState with
"s->qdev", use the QOM prefered style: "DEVICE(s)".
This patch was generated using the following Coccinelle script:
// Use DEVICE() macros to access DeviceState.qdev
@use_device_macro_to_access_qdev@
expression obj;
identifier dev;
@@
-&obj->dev.qdev
+DEVICE(obj)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Message-Id: <20190528164020.32250-8-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Rather than looking inside the definition of a BusState with "s->bus.qbus",
use the QOM prefered style: "BUS(&s->bus)".
This patch was generated using the following Coccinelle script:
// Use BUS() macros to access BusState.qbus
@use_bus_macro_to_access_qbus@
expression obj;
identifier bus;
@@
-&obj->bus.qbus
+BUS(&obj->bus)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190528164020.32250-5-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Rather than looking inside the definition of a BusState with "s->bus.qbus",
use the QOM prefered style: "BUS(&s->bus)".
This patch was generated using the following Coccinelle script:
// Use BUS() macros to access BusState.qbus
@use_bus_macro_to_access_qbus@
expression obj;
identifier bus;
@@
-&obj->bus.qbus
+BUS(&obj->bus)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Message-Id: <20190528164020.32250-4-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Since the BusState is accesible from the SCSIBus object,
it is pointless to use qbus_reset_all_fn.
Use qbus_reset_all() directly.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fleytman <dmitry.fleytman@gmail.com>
Message-Id: <20190528164020.32250-2-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
fix incorrect print type in vhost_virtqueue_stop
Signed-off-by: Jie Wang <wangjie88@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <1556605773-42019-1-git-send-email-wangjie88@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
This is a trivial cleanup patch.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Kamal Heib <kheib@redhat.com>
Message-Id: <20190505105112.22691-1-yuval.shaia@oracle.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
This makes use of qdev_prop_drive_iothread for scsi-disk so that the
disk can be attached to a node that is already in the target AioContext.
We need to check that the HBA actually supports iothreads, otherwise
scsi-disk must make sure that the node is already in the main
AioContext.
This changes the error message for conflicting iothread settings.
Previously, virtio-scsi produced the error message, now it comes from
blk_set_aio_context(). Update a test case accordingly.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some qdev block devices have support for iothreads and take care of the
AioContext they are running in, but most devices don't know about any of
this. For the latter category, the qdev drive property must make sure
that their BlockBackend is in the main AioContext.
Unfortunately, while the current code just does the same thing for
devices that do support iothreads, this is not correct and it would show
as soon as we actually try to keep a consistent AioContext assignment
across all nodes and users of a block graph subtree: If a node is
already in a non-default AioContext because of one of its users,
attaching a new device should still be possible if that device can work
in the same AioContext. Switching the node back to the main context
first and only then into the device AioContext causes failure (because
the existing user wouldn't allow the switch to the main context).
So devices that support iothreads need a different kind of drive
property that leaves the node in its current AioContext, but by using
this type, the device promises to check later that it can work with this
context.
This patch adds the qdev infrastructure that allows devices to signal
that they handle iothreads and qdev should leave the AioContext alone.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds a new parameter to blk_new() which requires its callers to
declare from which AioContext this BlockBackend is going to be used (or
the locks of which AioContext need to be taken anyway).
The given context is only stored and kept up to date when changing
AioContexts. Actually applying the stored AioContext to the root node
is saved for another commit.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add an Error parameter to blk_set_aio_context() and use
bdrv_child_try_set_aio_context() internally to check whether all
involved nodes can actually support the AioContext switch.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kenneth Heitke <kenneth.heitke@intel.com>
Reviewed-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit b2fc91db84 ("q35: set split kernel irqchip as default") changed
the default for the pc-q35-4.0 machine type to use split irqchip, which
turned out to have disasterous effects on vfio-pci INTx support. KVM
resampling irqfds are registered for handling these interrupts, but
these are non-functional in split irqchip mode. We can't simply test
for split irqchip in QEMU as userspace handling of this interrupt is a
significant performance regression versus KVM handling (GeForce GPUs
assigned to Windows VMs are non-functional without forcing MSI mode or
re-enabling kernel irqchip).
The resolution is to revert the change in default irqchip mode in the
pc-q35-4.1 machine and create a pc-q35-4.0.1 machine for the 4.0-stable
branch. The qemu-q35-4.0 machine type should not be used in vfio-pci
configurations for devices requiring legacy INTx support without
explicitly modifying the VM configuration to use kernel irqchip.
Link: https://bugs.launchpad.net/qemu/+bug/1826422
Fixes: b2fc91db84 ("q35: set split kernel irqchip as default")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <155786484688.13873.6037015630912983760.stgit@gimli.home>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no need to have a test device created by the board.
Instead, create it in the qtest so that we will be able to run
it on other boards too.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The dma related variable dma.dst/src/cnt is dma_addr_t, it is
uint64_t in x64 platform. Change these usage from uint32_to
uint64_t to avoid trancation in edu_dma_timer.
Signed-off-by: Li Qiang <liq3ea@163.com>
Message-Id: <20190510164349.81507-4-liq3ea@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The edu spec says when address >= 0x80, the MMIO area can
be accessed by 64-bit.
Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Message-Id: <20190510164349.81507-3-liq3ea@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The edu spec says the MMIO area can be accessed by 64-bit.
However currently the 'max_access_size' is not so the MMIO
access dispatch can only access 32-bit one time. This patch fixes
this to respect the spec.
Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Message-Id: <20190510164349.81507-2-liq3ea@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In order to perform a valid migration of a vhost-scsi device,
the following requirements must be met:
(1) The virtio-scsi device state needs to be saved & loaded.
(2) The vhost backend must be stopped before virtio-scsi device state
is saved:
(2.1) Sync vhost backend state to virtio-scsi device state.
(2.2) No further I/O requests are made by vhost backend to target
SCSI device.
(2.3) No further guest memory access takes place after VM is stopped.
(3) Requests in-flight to target SCSI device are completed before
migration handover.
(4) Target SCSI device state needs to be saved & loaded into the
destination host target SCSI device.
Previous commit ("vhost-scsi: Add VMState descriptor")
add support to save & load the device state using VMState.
This meets requirement (1).
When VM is stopped by migration thread (On Pre-Copy complete), the
following code path is executed:
migration_completion() -> vm_stop_force_state() -> vm_stop() ->
do_vm_stop().
do_vm_stop() calls first pause_all_vcpus() which pause all guest
vCPUs and then call vm_state_notify().
In case of vhost-scsi device, this will lead to the following code path
to be executed:
vm_state_notify() -> virtio_vmstate_change() ->
virtio_set_status() -> vhost_scsi_set_status() -> vhost_scsi_stop().
vhost_scsi_stop() then calls vhost_scsi_clear_endpoint() and
vhost_scsi_common_stop().
vhost_scsi_clear_endpoint() sends VHOST_SCSI_CLEAR_ENDPOINT ioctl to
vhost backend which will reach kernel's vhost_scsi_clear_endpoint()
which process all pending I/O requests and wait for them to complete
(vhost_scsi_flush()). This meets requirement (3).
vhost_scsi_common_stop() will stop the vhost backend.
As part of this stop, dirty-bitmap is synced and vhost backend state is
synced with virtio-scsi device state. As at this point guest vCPUs are
already paused, this meets requirement (2).
At this point we are left with requirement (4) which is target SCSI
device specific and therefore cannot be done by QEMU. Which is the main
reason why vhost-scsi adds a migration blocker.
However, as this can be handled either by an external orchestrator or
by using shared-storage (i.e. iSCSI), there is no reason to limit the
orchestrator from being able to explictly specify it wish to enable
migration even when VM have a vhost-scsi device.
Considering all the above, this commit allows orchestrator to explictly
specify that it is responsbile for taking care of requirement (4) and
therefore vhost-scsi should not add a migration blocker.
Reviewed-by: Nir Weiner <nir.weiner@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Message-Id: <20190416125912.44001-4-liran.alon@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
As preparation of enabling migration of vhost-scsi device,
define it’s VMState. Note, we keep the convention of
verifying in the pre_save() method that the vhost backend
must be stopped before attempting to save the device
state. Similar to how it is done for vhost-vsock.
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Nir Weiner <nir.weiner@oracle.com>
Message-Id: <20190416125912.44001-3-liran.alon@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
vhost-scsi doesn’t takes into account whether the VM is running or not in
order to decide if it should start/stop vhost backend.
This would lead to vhost backend still being active when VM's RunState
suddenly change to stopped.
An example of when this issue is encountered is when Live-Migration Pre-Copy
phase completes. As in this case, VM state will be changed to stopped (while
vhost backend is still active), which will result in
virtio_vmstate_change() -> virtio_set_status() -> vhost_scsi_set_status()
executed but vhost_scsi_set_status() will just return without stopping
vhost backend.
To handle this, change code to consider that vhost processing should be
stopped when VM is not running. Similar to how it is done in vhost-vsock
device at vhost_vsock_set_status().
Fixes: 5e9be92d77 ("vhost-scsi: new device supporting the tcm_vhost Linux kernel module”)
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Nir Weiner <nir.weiner@oracle.com>
Message-Id: <20190416125912.44001-2-liran.alon@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Next pull request against qemu-4.1. Highlights:
* KVM accelerated support for the XIVE interrupt controller in PAPR
guests
* A number of TCG vector fixes
* Fixes for the PReP / 40p machine
* Improvements to make check-tcg test coverage
Other than that it's just a bunch of assorted fixes, cleanups and
minor improvements.
This supersedes both the pull request dated 2019-05-21 and the one
dated 2019-05-22. I've dropped one hunk which I think may have caused
the check-tcg failure that Peter saw (by enabling the ppc64abi32
build, which I think has been broken for ages). I'm not entirely
certain, since I haven't reproduced exactly the same failure.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlzuK2AACgkQbDjKyiDZ
s5LFfxAAuvBI2d5gYDSDiniJPMrEzM8ANynf8fYMGSySRNpeKz5PnMhNQieoxaEt
pS9qJnnaCDrpV09jJo6QWStUaqFqnLPOYdWvRsnb7mx0yXe5eWUyYPp0TRAqKj8S
Ainv9ma8WfhVphsH3E01KR6evdC6BDC0F2afDToFGMKcDKXafmnSOEV9ZtFAzFXO
xqh/Az+Y2ATwDmt92uSq7JBS5YRUvhYQORoKslxnrJswKkN+Uwi5+a2FzOHk3Jwe
BlV6soEAVqb9ItFtgwcArclryCMMVxrqzs2VTWOYbhznFX0X1xUNeSQ8H+7F+IVy
Xu1e2fnwufvilvWSsjtYvdYnnCbNvwgWjYfZNMrQ2hmSDtCQnRKyVIYwiU08Qj2y
LmVlQzWN3WYHIRBTACLMDf5VHa9P01QZeJEoVIV6i4m4PCxbSmlzI62eRKNhW917
2d3h8dGIxSDm9/WpXefKMMrt2P7fAqkiz5ZUZIjkspcHaPPmk7qQp0ngFjeEuyFk
tJMd87hgemm9gg+mcF9XQ8yZGkR3oTq7nwDGwZHrp8S0GyRvNwhTbT2iKzAG2cxe
kfWRFswxn1zYPShqkcj3rwNsg8LnC3b22Og/obHYVjQ8ONx4ZB0q8xJSkUpvsQf5
HEUHLHtstBmrInFMf+2KbViUIpobmn4woojjNsqZ32W7OZv6Yk4=
=2q3B
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190529' into staging
ppc patch queue 2019-05-29
Next pull request against qemu-4.1. Highlights:
* KVM accelerated support for the XIVE interrupt controller in PAPR
guests
* A number of TCG vector fixes
* Fixes for the PReP / 40p machine
* Improvements to make check-tcg test coverage
Other than that it's just a bunch of assorted fixes, cleanups and
minor improvements.
This supersedes both the pull request dated 2019-05-21 and the one
dated 2019-05-22. I've dropped one hunk which I think may have caused
the check-tcg failure that Peter saw (by enabling the ppc64abi32
build, which I think has been broken for ages). I'm not entirely
certain, since I haven't reproduced exactly the same failure.
# gpg: Signature made Wed 29 May 2019 07:49:04 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-4.1-20190529: (44 commits)
ppc/pnv: add dummy XSCOM registers for PRD initialization
ppc/pnv: introduce new skiboot platform properties
spapr: Don't migrate the hpt_maxpagesize cap to older machine types
spapr: change default interrupt mode to 'dual'
spapr/xive: fix multiple resets when using the 'dual' interrupt mode
docs: provide documentation on the POWER9 XIVE interrupt controller
spapr/irq: add KVM support to the 'dual' machine
ppc/xics: fix irq priority in ics_set_irq_type()
spapr/irq: initialize the IRQ device only once
spapr/irq: introduce a spapr_irq_init_device() helper
spapr: check for the activation of the KVM IRQ device
spapr: introduce routines to delete the KVM IRQ device
sysbus: add a sysbus_mmio_unmap() helper
spapr/xive: activate KVM support
spapr/xive: add migration support for KVM
spapr/xive: introduce a VM state change handler
spapr/xive: add state synchronization with KVM
spapr/xive: add hcall support when under KVM
spapr/xive: add KVM support
spapr: Print out extra hints when CAS negotiation of interrupt mode fails
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>