- qcow2: Use threads for encrypted I/O
- qemu-img rebase: Optimizations
- backup job: Allow any source node, and some refactoring
- Some general simplifications in the block layer
-----BEGIN PGP SIGNATURE-----
iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAlzti4ASHG1yZWl0ekBy
ZWRoYXQuY29tAAoJEPQH2wBh1c9Av6YH/1mCu9KsLTGaFruJgnx4MXNUII0tsmwG
S+ZPiGTVLpJf1Na5wuFhOvGS1aVp1YKCzwXOP3TA2h/IJ/RByx+4fSc7GjJYuSJ4
YPoB9tTCFk/V7LehtWxmm5SlKkPEcTNPP2odnmjICL5PBBQr6w+3vthYH3BHLQtV
rLkNjswB9YBCciEESF6YZJyXKArIpLr7l95olatpDhGYagRA4/wcUdr2VUWboMRQ
9+8yY9CPElZcTmeQjqXqOszYSxKAWYSLfBnkdLMuE1XJRwhL91+JCezuw5WY/o9A
QgLl9p854/sEkN0hURfLSBzLljkGESdVw6WYa1DBuqxsUPEArzd0FRk=
=wut7
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-05-28' into staging
Block patches:
- qcow2: Use threads for encrypted I/O
- qemu-img rebase: Optimizations
- backup job: Allow any source node, and some refactoring
- Some general simplifications in the block layer
# gpg: Signature made Tue 28 May 2019 20:26:56 BST
# gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg: issuer "mreitz@redhat.com"
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* remotes/maxreitz/tags/pull-block-2019-05-28: (21 commits)
blockdev: loosen restrictions on drive-backup source node
qcow2-bitmap: initialize bitmap directory alignment
qcow2: skip writing zero buffers to empty COW areas
qemu-img: rebase: Reuse in-chain BlockDriverState
qemu-img: rebase: Reduce reads on in-chain rebase
qemu-img: rebase: Reuse parent BlockDriverState
block: Make bdrv_root_attach_child() unref child_bs on failure
block: Use bdrv_unref_child() for all children in bdrv_close()
block/backup: refactor: split out backup_calculate_cluster_size
block/backup: unify different modes code path
block/backup: refactor and tolerate unallocated cluster skipping
block/backup: move to copy_bitmap with granularity
block/backup: simplify backup_incremental_init_copy_bitmap
qcow2: do encryption in threads
qcow2: bdrv_co_pwritev: move encryption code out of the lock
qcow2: qcow2_co_preadv: improve locking
qcow2-threads: split out generic path
qcow2-threads: qcow2_co_do_compress: protect queuing by mutex
qcow2-threads: use thread_pool_submit_co
qcow2: add separate file for threaded data processing functions
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Expected table change is then handled like this:
1. add table to diff allowed list
2. change generating code (can be combined with 1)
3. maintainer runs a script to update expected +
blows away allowed diff list
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
fix memory leak in vhost_user_scsi_realize
Signed-off-by: Jie Wang <wangjie88@huawei.com>
Message-Id: <1556608500-12183-1-git-send-email-wangjie88@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
fix incorrect print type in vhost_virtqueue_stop
Signed-off-by: Jie Wang <wangjie88@huawei.com>
Message-Id: <1556605773-42019-1-git-send-email-wangjie88@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
remove the dead code
Signed-off-by: Jie Wang <wangjie88@huawei.com>
Message-Id: <1556604614-32081-1-git-send-email-wangjie88@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
'family' option is not part of type 2 table and if user tries to use it
as such QEMU will error out with an unknow option error.
Drop it from docs lest it confuse users.
Fixes: b155eb1d04 ("smbios: document cmdline options for smbios type 2-4, 17 structures")
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1558448611-315074-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The only remaining caller of pci_get_bus_devfn() is pci_nic_init_nofail(),
itself an old compatibility function. Fold the two together to avoid
re-using the stale interface.
While we're there replace the explicit fprintf()s with error_report().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513061939.3464-6-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
The is_bridge field in PCIDevice acts as a bool, but is declared as an int.
Declare it as a bool for clarity, and change everything that writes it to
use true/false instead of 0/1 to match.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20190513061939.3464-5-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Since c2077e2c "pci: Adjust PCI config limit based on bus topology",
pci_adjust_config_limit() has been used in the config space read and write
paths to only permit access to extended config space on buses which permit
it. Specifically it prevents access on devices below a vanilla-PCI bus via
some combination of bridges, even if both the host bridge and the device
itself are PCI-E.
It accomplishes this with a somewhat complex call up the chain of bridges
to see if any of them prohibit extended config space access. This is
overly complex, since we can always know if the bus will support such
access at the point it is constructed.
This patch simplifies the test by using a flag in the PCIBus instance
indicating whether extended configuration space is accessible. It is
false for vanilla PCI buses. For PCI-E buses, it is true for root
buses and equal to the parent bus's's capability otherwise.
For the special case of sPAPR's paravirtualized PCI root bus, which
acts mostly like vanilla PCI, but does allow extended config space
access, we override the default value of the flag from the host bridge
code.
This should cause no behavioural change.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20190513061939.3464-4-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
build_append_foo() API doesn't need explicit endianness conversions
which eliminates a source of errors and it makes build_mcfg() look like
declarative definition of MCFG table in ACPI spec, which makes it easy
to review.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
v3:
* add some comment on the Configuration Space base address allocation
structure
v2:
* miss the reserved[8] of MCFG in last version, add it back
* drop SOBs and make sure bios-tables-test all OK
Message-Id: <20190521062836.6541-3-richardw.yang@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Now we have two identical build_mcfg functions.
Consolidate them in acpi/pci.c.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
v4:
* ACPI_PCI depends on both ACPI and PCI
* rebase on latest master, adjust arm Kconfig
v3:
* adjust changelog based on Igor's suggestion
Message-Id: <20190521062836.6541-2-richardw.yang@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
s/kbd/tablet/, fixes cut+paste bug.
Cc: qemu-stable@nongnu.org
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190520081805.15019-1-kraxel@redhat.com
Add support for per port power switching.
Virtual power of course ;)
Use port-power=on property to enable this.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190524070310.4952-6-kraxel@redhat.com
Helper function to update port status bits which depends on the
connected device. We need the same logic for device attach and
port reset, so factor it out.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190524070310.4952-5-kraxel@redhat.com
Add usb_hub_port_set() and usb_hub_port_clear() helpers which care about
updating the change bits (port->wPortChange) properly, so we don't need
to have that logic sprinkled all over the place ;)
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190524070310.4952-4-kraxel@redhat.com
Add num_ports property which allows configure the number of downstream
ports. Valid range is 1-8, default is 8.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190524070310.4952-3-kraxel@redhat.com
Add dashes, so they don't look like two separate things when printed.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190524070310.4952-2-kraxel@redhat.com
Seems some devices become confused when we call
libusb_set_configuration(). So before calling the function check
whenever the device has multiple configurations in the first place, and
in case it hasn't (which is the case for the majority of devices) simply
skip the call as it will have no effect anyway.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190522094702.17619-4-kraxel@redhat.com
If the guest didn't talk to the device yet, skip the reset.
Without this usb-host devices get resetted a number of times
at boot time for no good reason.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190522094702.17619-3-kraxel@redhat.com
That way the device reset handler can see what
the before-reset state of the device is.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190522094702.17619-2-kraxel@redhat.com
Add new virtio-gpu devices with a "vhost-user" property. The
associated vhost-user backend is used to handle the virtio rings and
provide rendering results thanks to the vhost-user-gpu protocol.
Example usage:
-object vhost-user-backend,id=vug,cmd="./vhost-user-gpu"
-device vhost-user-vga,vhost-user=vug
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20190524130946.31736-10-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add base classes that are common to vhost-user-gpu-pci and
vhost-user-vga.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20190524130946.31736-9-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add a base class that is common to virtio-gpu and vhost-user-gpu
devices.
The VirtIOGPUBase base class provides common functionalities necessary
for both virtio-gpu and vhost-user-gpu:
- common configuration (max-outputs, initial resolution, flags)
- virtio device initialization, including queue setup
- device pre-conditions checks (iommu)
- migration blocker
- virtio device callbacks
- hooking up to qemu display subsystem
- a few common helper functions to reset the device, retrieve display
informations
- a class callback to unblock the rendering (for GL updates)
What is left to the virtio-gpu subdevice to take care of, in short,
are all the virtio queues handling, command processing and migration.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20190524130946.31736-8-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add a vhost-user gpu backend, based on virtio-gpu/3d device. It is
associated with a vhost-user-gpu device.
Various TODO and nice to have items:
- multi-head support
- crash & resume handling
- accelerated rendering/display that avoids the waiting round trips
- edid support
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20190524130946.31736-6-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
OpenGL isn't required to use DRM rendernodes. The following patches
uses it for 2d resources for ex.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20190524130946.31736-5-marcandre.lureau@redhat.com
[ kraxel s/LINUX/POSIX/ (fixes openbsd build failure) ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This will allow to share the format conversion function with
vhost-user-gpu.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20190524130946.31736-4-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The helper functions are useful to build the vhost-user-gpu backend.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20190524130946.31736-3-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add a new vhost-user message to give a unix socket to a vhost-user
backend for GPU display updates.
Back when I started that work, I added a new GPU channel because the
vhost-user protocol wasn't bidirectional. Since then, there is a
vhost-user-slave channel for the slave to send requests to the master.
We could extend it with GPU messages. However, the GPU protocol is
quite orthogonal to vhost-user, thus I chose to have a new dedicated
channel.
See vhost-user-gpu.rst for the protocol details.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20190524130946.31736-2-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
PRD (Processor recovery diagnostics) is a service available on
OpenPower systems. The opal-prd daemon initializes the PowerPC
Processor through the XSCOM bus and then waits for hardware diagnostic
events.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190527071722.31424-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Newer skiboots (after 6.3) support QEMU platforms that have
characteristics closer to real OpenPOWER systems. The CPU type is used
to define the BMC drivers: Aspeed AST2400 for POWER8 processors and
AST2500 for POWER9s.
Advertise the new platform property names, "qemu,powernv8" and
"qemu,powernv9", using the CPU type chosen for the QEMU PowerNV
machine. Also, advertise the original platform name "qemu,powernv" in
case of POWER8 processors for compatibility with older skiboots.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190527071749.31499-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 0b8c89be7f7b added the hpt_maxpagesize capability to the migration
stream. This is okay for new machine types but it breaks backward migration
to older QEMUs, which don't expect the extra subsection.
Add a compatibility boolean flag to the sPAPR machine class and use it to
skip migration of the capability for machine types 4.0 and older. This
fixes migration to an older QEMU. Note that the destination will emit a
warning:
qemu-system-ppc64: warning: cap-hpt-max-page-size lower level (16) in incoming stream than on destination (24)
This is expected and harmless though. It is okay to migrate from a lower
HPT maximum page size (64k) to a greater one (16M).
Fixes: 0b8c89be7f7b "spapr: Add forgotten capability to migration stream"
Based-on: <20190522074016.10521-3-clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155853262675.1158324.17301777846476373459.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that XIVE support is complete (QEMU emulated and KVM devices),
change the pseries machine to advertise both interrupt modes: XICS
(P7/P8) and XIVE (P9).
The machine default interrupt modes depends on the version. Current
settings are:
pseries default interrupt mode
4.1 dual
4.0 xics
3.1 xics
3.0 legacy xics (different IRQ number space layout)
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190522074016.10521-3-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, when a reset occurs on a pseries machine using the 'dual'
interrupt mode, the KVM devices are released and recreated depending
on the interrupt mode selected by CAS. If XIVE is selected, the SysBus
memory regions of the SpaprXive model are initialized by the KVM
backend initialization routine each time a reset occurs. This leads to
a crash after a couple of resets because the machine reaches the
QDEV_MAX_MMIO limit of SysBusDevice :
qemu-system-ppc64: hw/core/sysbus.c:193: sysbus_init_mmio: Assertion `dev->num_mmio < QDEV_MAX_MMIO' failed.
To fix, initialize the SysBus memory regions in spapr_xive_realize()
called only once and remove the same inits from the QEMU and KVM
backend initialization routines which are called at each reset.
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190522074016.10521-2-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This documents the overall XIVE architecture and the XIVE support for
sPAPR guest machines (pseries).
It also provides documentation on the 'info pic' command.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190521082411.24719-1-clg@kaod.org>
Reviewed-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The interrupt mode is chosen by the CAS negotiation process and
activated after a reset to take into account the required changes in
the machine. This brings new constraints on how the associated KVM IRQ
device is initialized.
Currently, each model takes care of the initialization of the KVM
device in their realize method but this is not possible anymore as the
initialization needs to be done globaly when the interrupt mode is
known, i.e. when machine is reseted. It also means that we need a way
to delete a KVM device when another mode is chosen.
Also, to support migration, the QEMU objects holding the state to
transfer should always be available but not necessarily activated.
The overall approach of this proposal is to initialize both interrupt
mode at the QEMU level to keep the IRQ number space in sync and to
allow switching from one mode to another. For the KVM side of things,
the whole initialization of the KVM device, sources and presenters, is
grouped in a single routine. The XICS and XIVE sPAPR IRQ reset
handlers are modified accordingly to handle the init and the delete
sequences of the KVM device.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-15-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Recent commits changed the behavior of ics_set_irq_type() to
initialize correctly LSIs at the KVM level. ics_set_irq_type() is also
called by the realize routine of the different devices of the machine
when initial interrupts are claimed, before the ICSState device is
reseted.
In the case, the ICSIRQState priority is 0x0 and the call to
ics_set_irq_type() results in configuring the target of the
interrupt. On P9, when using the KVM XICS-on-XIVE device, the target
is configured to be server 0, priority 0 and the event queue 0 is
created automatically by KVM.
With the dual interrupt mode creating the KVM device at reset, it
leads to unexpected effects on the guest, mostly blocking IPIs. This
is wrong, fix it by reseting the ICSIRQState structure when
ics_set_irq_type() is called.
Fixes: commit 6cead90c5c ("xics: Write source state to KVM at claim time")
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190513084245.25755-14-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add a check to make sure that the routine initializing the emulated
IRQ device is called once. We don't have much to test on the XICS
side, so we introduce a 'init' boolean under ICSState.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190513084245.25755-13-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The way the XICS and the XIVE devices are initialized follows the same
pattern. First, try to connect to the KVM device and if not possible
fallback on the emulated device, unless a kernel_irqchip is required.
The spapr_irq_init_device() routine implements this sequence in
generic way using new sPAPR IRQ handlers ->init_emu() and ->init_kvm().
The XIVE init sequence is moved under the associated sPAPR IRQ
->init() handler. This will change again when KVM support is added for
the dual interrupt mode.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-12-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The activation of the KVM IRQ device depends on the interrupt mode
chosen at CAS time by the machine and some methods used at reset or by
the migration need to be protected.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190513084245.25755-11-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If a new interrupt mode is chosen by CAS, the machine generates a
reset to reconfigure. At this point, the connection with the previous
KVM device needs to be closed and a new connection needs to opened
with the KVM device operating the chosen interrupt mode.
New routines are introduced to destroy the XICS and the XIVE KVM
devices. They make use of a new KVM device ioctl which destroys the
device and also disconnects the IRQ presenters from the vCPUs.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-10-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This will be used to remove the MMIO regions of the POWER9 XIVE
interrupt controller when the sPAPR machine is reseted.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
All is in place for KVM now. State synchronization and migration will
come next.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When the VM is stopped, the VM state handler stabilizes the XIVE IC
and marks the EQ pages dirty. These are then transferred to destination
before the transfer of the device vmstates starts.
The SpaprXive interrupt controller model captures the XIVE internal
tables, EAT and ENDT and the XiveTCTX model does the same for the
thread interrupt context registers.
At restart, the SpaprXive 'post_load' method restores all the XIVE
states. It is called by the sPAPR machine 'post_load' method, when all
XIVE states have been transferred and loaded.
Finally, the source states are restored in the VM change state handler
when the machine reaches the running state.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This handler is in charge of stabilizing the flow of event notifications
in the XIVE controller before migrating a guest. This is a requirement
before transferring the guest EQ pages to a destination.
When the VM is stopped, the handler sets the source PQs to PENDING to
stop the flow of events and to possibly catch a triggered interrupt
occuring while the VM is stopped. Their previous state is saved. The
XIVE controller is then synced through KVM to flush any in-flight
event notification and to stabilize the EQs. At this stage, the EQ
pages are marked dirty to make sure the EQ pages are transferred if a
migration sequence is in progress.
The previous configuration of the sources is restored when the VM
resumes, after a migration or a stop. If an interrupt was queued while
the VM was stopped, the handler simply generates the missing trigger.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This extends the KVM XIVE device backend with 'synchronize_state'
methods used to retrieve the state from KVM. The HW state of the
sources, the KVM device and the thread interrupt contexts are
collected for the monitor usage and also migration.
These get operations rely on their KVM counterpart in the host kernel
which acts as a proxy for OPAL, the host firmware. The set operations
will be added for migration support later.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190513084245.25755-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
XIVE hcalls are all redirected to QEMU as none are on a fast path.
When necessary, QEMU invokes KVM through specific ioctls to perform
host operations. QEMU should have done the necessary checks before
calling KVM and, in case of failure, H_HARDWARE is simply returned.
H_INT_ESB is a special case that could have been handled under KVM
but the impact on performance was low when under QEMU. Here are some
figures :
kernel irqchip OFF ON
H_INT_ESB KVM QEMU
rtl8139 (LSI ) 1.19 1.24 1.23 Gbits/sec
virtio 31.80 42.30 -- Gbits/sec
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>