The exec transport backend for the 'migrate'/'migrate-incoming' QAPIs
accepts the new wire protocol of the MigrateAddress struct.
This is achieved by parsing the 'uri' string and storing the migration
parameters required for an exec connection in a strList struct.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-8-farosas@suse.de>
The RDMA-based transport backend for the 'migrate'/'migrate-incoming' QAPIs
accepts the new wire protocol of the MigrateAddress struct.
This is achieved by parsing the 'uri' string and storing the migration
parameters required for an RDMA connection in the well-defined
InetSocketAddress struct.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-7-farosas@suse.de>
The socket transport backend for the 'migrate'/'migrate-incoming' QAPIs
accepts the new wire protocol of the MigrateAddress struct.
This is achieved by parsing the 'uri' string and storing the migration
parameters required for a socket connection in the well-defined
SocketAddress struct.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-6-farosas@suse.de>
This patch parses the 'uri' string of the 'migrate' and 'migrate-incoming'
QAPIs, which contains the migration connection information,
and stores that information in the well-defined 'MigrateAddress' struct.
Fabiano fixed for "file" transport.
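As a rough illustration of the parsing step described above, the sketch
below splits a legacy 'uri' string once and stores the result in a typed
structure. It is written in Python for brevity and is not QEMU's
implementation: the class names, the field names and the "/bin/sh -c"
wrapping of exec commands are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SocketAddr:          # hypothetical stand-in for a socket address
    kind: str              # "inet" or "unix"
    host: str = ""
    port: str = ""
    path: str = ""


@dataclass
class ExecAddr:            # hypothetical stand-in for an exec command line
    args: List[str]


@dataclass
class FileAddr:            # hypothetical stand-in for a file target
    filename: str


Address = Union[SocketAddr, ExecAddr, FileAddr]


def parse_uri(uri: str) -> Address:
    """Parse a legacy 'uri' string exactly once into a typed address."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not rest:
        raise ValueError(f"malformed uri: {uri!r}")
    if scheme == "tcp":
        # Split on the last ':' so bracketed IPv6 hosts stay intact.
        host, sep, port = rest.rpartition(":")
        if not sep or not port:
            raise ValueError(f"missing port in: {uri!r}")
        return SocketAddr("inet", host=host, port=port)
    if scheme == "unix":
        return SocketAddr("unix", path=rest)
    if scheme == "exec":
        # Assumption: the command is handed to a shell as one argument.
        return ExecAddr(["/bin/sh", "-c", rest])
    if scheme == "file":
        return FileAddr(rest)
    raise ValueError(f"unknown transport: {scheme!r}")
```

Once the string has been parsed, the rest of the code only deals with
typed fields, which is the point of avoiding a second level of encoding.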
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-4-farosas@suse.de>
Message-ID: <20231023182053.8711-5-farosas@suse.de>
This patch introduces the well-defined MigrateAddress struct
and its related child objects.
The existing argument of the 'migrate' and 'migrate-incoming' QAPIs,
'uri', is of type string. The current implementation follows a
double encoding scheme for fetching migration parameters such as
'uri', which is not an ideal design.
The motive for introducing a struct-level design is to prevent double
encoding of QAPI arguments, as QEMU should be able to use the QAPI
arguments directly, without any level of encoding.
Note: this commit only adds the type; the actual uses come
in later commits.
Fabiano fixed for "file" transport.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-2-farosas@suse.de>
Message-Id: <20231023182053.8711-3-farosas@suse.de>
Now that we have an Error ** passed into the return path thread stack, which
is even clearer than an int retval, change ram_dirty_bitmap_reload() and its
callers to return a bool instead of errnos.
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-5-peterx@redhat.com>
To do so, create two paired sockets that never provide real data.
Feed those fake sockets to the src/dst QEMUs for recovery so that they enter
the RECOVER stage and stay there. Test that we can always kick them out and
recover again with the right ports.
This patch is based on Fabiano's version here:
https://lore.kernel.org/r/877cowmdu0.fsf@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
[peterx: write commit message, remove case 1, fix bugs, and more]
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-4-peterx@redhat.com>
Normally the postcopy recover phase should only exist for a very short
period: the time during which QEMU is trying to recover from an
interrupted postcopy migration, when a handshake is carried out to
continue the procedure with the state changes PAUSED -> RECOVER ->
POSTCOPY_ACTIVE.
The RECOVER phase should be very short; it starts right after the
admin has specified a new, working network link for QEMU to reconnect to
the dest QEMU.
However, the channel can still break within this small
RECOVER window.
If that happens, with the current code there is no way to kick the src QEMU
out of the RECOVER stage, nor any way to retry the recovery on another
channel once one is established.
This patch allows the RECOVER phase itself to fail. We are mostly
ready; only some small things are missing, e.g. properly kicking the main
migration thread out when it is sleeping on rp_sem and we find that we are
at the RECOVER stage. When this happens, the RECOVER itself fails and the
state rolls back to PAUSED. The user can then retry another round of
recovery.
To make this more robust, teach the QMP command migrate-pause to explicitly
kick the src/dst QEMU out when needed, so that even if for some reason the
migration thread has not already been kicked out by a failing return-path
thread, the admin can still kick it out.
This is an extreme corner case, but still try to cover it.
One can try to test this with two proxy channels for migration:
(a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000
(b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock
So the migration channel will be:
                    (a)          (b)
src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst
Then to make QEMU hang at RECOVER stage, one can do below:
(1) stop the postcopy using QMP command postcopy-pause
(2) kill the 2nd proxy (b)
(3) try to recover the postcopy using /tmp/src.sock on src
(4) src QEMU will go into RECOVER stage but won't be able to continue
from there, because the channel is actually broken at (b)
Before this patch, step (4) leaves the src QEMU stuck in the RECOVER stage,
with no way to kick QEMU out or to continue the postcopy. After
this patch, (4) quickly fails and QEMU bounces back to the PAUSED stage.
The admin can also kick QEMU from (4) into PAUSED when needed using
migrate-pause.
After bouncing back to the PAUSED stage, one can recover again.
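The state transitions described above can be sketched as a tiny state
machine. This is a simplified Python model for illustration only, not
QEMU code: the function names and the channel_ok flag are hypothetical,
and only the three states named in this message are modelled.

```python
from enum import Enum, auto


class State(Enum):
    PAUSED = auto()
    RECOVER = auto()
    POSTCOPY_ACTIVE = auto()


def try_recover(state: State, channel_ok: bool) -> State:
    """One recovery attempt: PAUSED -> RECOVER -> POSTCOPY_ACTIVE or PAUSED."""
    if state is not State.PAUSED:
        raise ValueError("recovery can only start from PAUSED")
    state = State.RECOVER
    if channel_ok:
        # Handshake completed over a working channel.
        return State.POSTCOPY_ACTIVE
    # The channel broke during the RECOVER window: fail the recovery
    # itself and roll back so that the user can retry.
    return State.PAUSED


def pause(state: State) -> State:
    """Model of an explicit pause request kicking QEMU back to PAUSED."""
    if state in (State.RECOVER, State.POSTCOPY_ACTIVE):
        return State.PAUSED
    return state
```

In this model a failed attempt always ends in PAUSED, so a second call to
try_recover() with a working channel can still succeed.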
Reported-by: Xiaohui Li <xiaohli@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-3-peterx@redhat.com>
rp_state.error was a boolean used to show that an error happened in the
return path thread. That not only duplicates error reporting
(migrate_set_error), but is also insufficient: we only call error_report()
and set the flag to true, so we can never keep a history of the exact error
and show it in query-migrate.
To make this better, a few things are done:
- Use error_setg() rather than error_report() across the whole lifecycle
  of the return path thread, keeping the error in an Error*.
- With the above, mark_source_rp_bad() is no longer needed; remove it,
  along with rp_state.error itself.
- Use migrate_set_error() to apply the captured error to the global
  migration object when an error occurs in this thread.
- Do the same when a qemufile error is detected in the source return path.
We need to re-export qemu_file_get_error_obj() to do the last one.
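The difference between a bare error flag and a captured error can be
sketched as follows. This is a Python illustration with hypothetical
names (MigrationState, return_path_step, the message type numbers); it
also assumes that the first recorded error is the one that is kept, which
is a choice made for this sketch rather than a statement about QEMU.

```python
from typing import Iterable, Optional


class MigrationState:
    """Hypothetical stand-in for the global migration object."""

    def __init__(self) -> None:
        self.error: Optional[str] = None

    def set_error(self, message: str) -> None:
        # Assumption for this sketch: keep the first recorded error.
        if self.error is None:
            self.error = message


def return_path_step(msg_type: int) -> Optional[str]:
    """Return None on success, or a descriptive error instead of a flag."""
    known_types = {1, 2, 3}  # hypothetical message types
    if msg_type not in known_types:
        return f"return path: unknown message type {msg_type}"
    return None


def run_return_path(state: MigrationState, messages: Iterable[int]) -> bool:
    """Process messages; on failure, store the exact error and stop."""
    for msg_type in messages:
        err = return_path_step(msg_type)
        if err is not None:
            state.set_error(err)
            return False
    return True
```

A later status query can then report state.error verbatim, which a plain
boolean could not provide.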
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-2-peterx@redhat.com>
[ Maintainer note:
I put the test as flaky because our CI has problems with shared
memory. We will remove the flaky bits as soon as we get a solution.
]
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1698263069-406971-7-git-send-email-steven.sistare@oracle.com>
Add the cpr-reboot migration mode. Usage:
$ qemu-system-$arch -monitor stdio ...
QEMU 8.1.50 monitor - type 'help' for more information
(qemu) migrate_set_capability x-ignore-shared on
(qemu) migrate_set_parameter mode cpr-reboot
(qemu) migrate -d file:vm.state
(qemu) info status
VM status: paused (postmigrate)
(qemu) quit
$ qemu-system-$arch -monitor stdio -incoming defer ...
QEMU 8.1.50 monitor - type 'help' for more information
(qemu) migrate_set_capability x-ignore-shared on
(qemu) migrate_set_parameter mode cpr-reboot
(qemu) migrate_incoming file:vm.state
(qemu) info status
VM status: running
In this mode, the migrate command saves state to a file, allowing one
to quit qemu, reboot to an updated kernel, and restart an updated version
of qemu. The caller must specify a migration URI that writes to and reads
from a file. Unlike normal mode, the use of certain local storage options
does not block the migration, but the caller must not modify guest block
devices between the quit and restart. To avoid saving guest RAM to the
file, the memory backend must be shared, and the @x-ignore-shared migration
capability must be set. Guest RAM must be non-volatile across reboot, such
as by backing it with a dax device, but this is not enforced. The restarted
qemu arguments must match those used to initially start qemu, plus the
-incoming option.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1698263069-406971-6-git-send-email-steven.sistare@oracle.com>
vhost blocks migration if logging is not supported to track dirty
memory, and vhost-user blocks it if the log cannot be saved to a shm fd.
vhost-vdpa blocks migration if both hosts do not support all the device's
features using a shadow VQ, for tracking requests and dirty memory.
vhost-scsi blocks migration if storage cannot be shared across hosts,
or if state cannot be migrated.
None of these conditions apply if the old and new qemu processes do
not run concurrently, and if new qemu starts on the same host as old,
which is the case for cpr.
Narrow the scope of these blockers so they only apply to normal mode.
They will not block cpr modes when they are added in subsequent patches.
No functional change until a new mode is added.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1698263069-406971-5-git-send-email-steven.sistare@oracle.com>
Some blockdevs block migration because they do not support sharing across
hosts and/or do not support dirty bitmaps. These prohibitions do not apply
if the old and new qemu processes do not run concurrently, and if new qemu
starts on the same host as old, which is the case for cpr. Narrow the scope
of these blockers so they only apply to normal mode. They will not block
cpr modes when they are added in subsequent patches.
No functional change until a new mode is added.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1698263069-406971-4-git-send-email-steven.sistare@oracle.com>
Extend the blocker interface so that a blocker can be registered for
one or more migration modes. The existing interfaces register a
blocker for all modes, and the new interfaces take a varargs list
of modes.
Internally, maintain a separate blocker list per mode. The same Error
object may be added to multiple lists. When a blocker is deleted, it is
removed from every list, and the Error is freed.
No functional change until a new mode is added.
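The per-mode bookkeeping described above can be sketched like this. It is
a Python model with hypothetical names (Reason, BlockerLists), not the
actual interface; the mode names are the ones used in this series, and
object identity stands in for the shared Error pointer.

```python
MODES = ("normal", "cpr-reboot")  # mode names taken from this series


class Reason:
    """Hypothetical stand-in for an Error object; compared by identity."""

    def __init__(self, text: str) -> None:
        self.text = text


class BlockerLists:
    """One blocker list per mode; one reason may sit in several lists."""

    def __init__(self) -> None:
        self._by_mode = {mode: [] for mode in MODES}

    def add(self, reason: Reason, *modes: str) -> None:
        """Register reason for the given modes, or all modes if none given."""
        for mode in modes or MODES:
            self._by_mode[mode].append(reason)

    def delete(self, reason: Reason) -> None:
        """Remove reason from every list it was added to."""
        for blockers in self._by_mode.values():
            blockers[:] = [r for r in blockers if r is not reason]

    def is_blocked(self, mode: str) -> bool:
        return bool(self._by_mode[mode])
```

Calling add() without modes mirrors the existing interfaces that block all
modes, while passing modes mirrors the new varargs variants.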
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1698263069-406971-3-git-send-email-steven.sistare@oracle.com>
Create a mode migration parameter that can be used to select alternate
migration algorithms. The default mode is normal, representing the
current migration algorithm, and does not need to be explicitly set.
No functional change until a new mode is added, except that the mode is
shown by the 'info migrate' command.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1698263069-406971-2-git-send-email-steven.sistare@oracle.com>
This patch is inspired by Joao Martins's patch here:
https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com
Add tracepoints for major downtime checkpoints on both src and dst. They
share the same tracepoint with a string showing its stage.
Besides the checkpoints in the previous patch, this patch also added
destination checkpoints.
On src, we have these checkpoints added:
- src-downtime-start: right before vm stops on src
- src-vm-stopped: after vm is fully stopped
- src-iterable-saved: after all iterables saved (END sections)
- src-non-iterable-saved: after all non-iterable saved (FULL sections)
- src-downtime-stop: migration fully completed
On dst, we have these checkpoints added:
- dst-precopy-loadvm-completes: after loadvm all done for precopy
- dst-precopy-bh-*: record BH steps to resume VM for precopy
- dst-postcopy-bh-*: record BH steps to resume VM for postcopy
On dst side, we don't have a good way to trace total time consumed by
iterable or non-iterable for now. We can mark it by 1st time receiving a
FULL / END section, but rather than that let's just rely on the other
tracepoints added for vmstates to back up the information.
With this patch, one can enable "vmstate_downtime*" tracepoints and it'll
enable all tracepoints for downtime measurements necessary.
Drop the loadvm_postcopy_handle_run_bh() tracepoints along the way, because
they serve the same purpose, which was only for postcopy. We then have a
unified prefix for all downtime-relevant tracepoints.
Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-6-peterx@redhat.com>
Provide a helper for non-COLO use case of migration to stop a VM. This
prepares for adding some downtime relevant tracepoints to migration, where
they may or may not apply to COLO.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-5-peterx@redhat.com>
We have a bunch of savevm_section* tracepoints. They are good for analyzing
the migration stream, but not always suitable for analyzing
the migration downtime. Two major problems:
- savevm_section* tracepoints dump all sections, while we only care
  about the sections that contribute to the downtime.
- They don't carry an identifier showing the type of section, so there is
  no easy way to filter out downtime information either.
We could add the type to the tracepoints, but this patch instead
keeps them untouched and adds a bunch of downtime-specific
tracepoints, so one can enable the "vmstate_downtime*" tracepoints and get a
full picture of how the downtime is distributed across iterative and
non-iterative vmstate save/load.
Note that here both save() and load() need to be traced, because both of
them may contribute to the downtime. The contribution is not a simple "add
them together", though: consider when the src is doing a save() of device1
while the dest can be load()ing for device2, so they can happen
concurrently.
Tracking both sides makes sense because device load() and save() can be
imbalanced: one device can save() very fast but load() very slowly, or vice
versa. We can't figure that out without tracing both.
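To illustrate why the contributions cannot simply be added together, the
Python sketch below computes the wall-clock time covered by a set of
possibly overlapping (start, end) intervals. The helper name and the
example timestamps are made up for illustration and are not measurements.

```python
from typing import Iterable, Tuple


def wall_clock(intervals: Iterable[Tuple[int, int]]) -> int:
    """Total time covered by at least one (start, end) interval."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged span.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged span.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical timestamps in milliseconds.
save_device1 = (0, 30)    # src save() of device1
load_device2 = (10, 50)   # dst load() of device2, partly concurrent
naive_sum = sum(end - start for start, end in (save_device1, load_device2))
covered = wall_clock([save_device1, load_device2])
```

With these numbers the naive sum is 70 ms while the covered wall-clock
time is 50 ms, because 20 ms of the two operations overlap.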
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-4-peterx@redhat.com>
Unify the three users on recording downtimes with the same pair of helpers.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-3-peterx@redhat.com>
Postcopy calculates its downtime separately. It always sets
MigrationState.downtime properly, but not MigrationState.downtime_start.
Make postcopy do the same as other modes on properly recording the
timestamp when the VM is going to be stopped. Drop the temporary variable
in postcopy_start() along the way.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-2-peterx@redhat.com>
I have no idea if we can have more than one vmware_vga device, so play
it safe.
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-14-quintela@redhat.com>
We can have more than one eeprom93xx.
For instance:
e100_nic_realize() -> eeprom93xx_new()
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-13-quintela@redhat.com>
We can have more than one audio backend.
    void audio_init_audiodevs(void)
    {
        AudiodevListEntry *e;

        QSIMPLEQ_FOREACH(e, &audiodevs, next) {
            audio_init(e->dev, &error_fatal);
        }
    }
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-12-quintela@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-11-quintela@redhat.com>
Before finally registering a SaveStateEntry, check for duplicate
entries. This notifies us as early as possible instead of leaving us with
silent migration failures, which can be hard to diagnose.
For example, this patch generates a message like the one below (without the
previous fixes for x2apic) as soon as we boot a VM instance
with "-smp 200,maxcpus=288,sockets=2,cores=72,threads=2", and QEMU
bails out even before the VM starts:
savevm_state_handler_insert: Detected duplicate SaveStateEntry: id=apic, instance_id=0x0
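The check itself amounts to refusing a second entry with the same
(id, instance_id) pair. A minimal Python sketch of that idea follows; the
class and exception names are hypothetical and only the message text is
modelled on the output shown above.

```python
class DuplicateEntry(Exception):
    """Raised when an (idstr, instance_id) pair is registered twice."""


class SaveStateRegistry:
    """Hypothetical registry keyed by (idstr, instance_id)."""

    def __init__(self) -> None:
        self._entries = set()

    def insert(self, idstr: str, instance_id: int) -> None:
        key = (idstr, instance_id)
        if key in self._entries:
            # Fail loudly at registration time instead of at migration time.
            raise DuplicateEntry(
                "Detected duplicate SaveStateEntry: "
                f"id={idstr}, instance_id={instance_id:#x}"
            )
        self._entries.add(key)
```

Registering ("apic", 0) twice raises immediately, whereas ("apic", 0) and
("apic", 1) can coexist.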
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-10-quintela@redhat.com>
Current code does:
- register pre_2_10_vmstate_dummy_icp with "icp/server" and an instance
  depending on the cpu number
- for newer machines, it registers vmstate_icp with the "icp/server" name
  and instance 0
- now it unregisters "icp/server" for the 1st instance.
This is wrong on many levels:
- we shouldn't have two VMStateDescriptions with the same name
- In case this is the only solution that we can come up with, it needs to
  be:
  * register pre_2_10_vmstate_dummy_icp
  * unregister pre_2_10_vmstate_dummy_icp
  * register real vmstate_icp
Created vmstate_replace_hack_for_ppc() with warnings left and right
that it is a hack.
CC: Cedric Le Goater <clg@kaod.org>
CC: Daniel Henrique Barboza <danielhb413@gmail.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Greg Kurz <groug@kaod.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-8-quintela@redhat.com>
Each user network connection creates a new slirp instance. We register
more than one slirp instance with instance id 0.
qemu-system-x86_64: -netdev user,id=hs1: savevm_state_handler_insert: Detected duplicate SaveStateEntry: id=slirp, instance_id=0x0
Broken pipe
../../../../../mnt/code/qemu/full/tests/qtest/libqtest.c:195: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
Aborted (core dumped)
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-6-quintela@redhat.com>
These are the easiest cases, where we were already using
VMSTATE_INSTANCE_ID_ANY.
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-3-quintela@redhat.com>
We have lots of cases where we are using an instance_id==0 when we
should be using VMSTATE_INSTANCE_ID_ANY (-1). Basically, anything
that can have more than one instance needs either a proper instance_id or
-1, in which case the system picks one for it.
vmstate_register_any(): We register with -1.
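What "the system picks one" can mean is sketched below in Python. The
allocation policy used here (one past the highest id already in use for
that name, starting at 0) is an assumption made for this illustration,
and the Registry class is hypothetical.

```python
INSTANCE_ID_ANY = -1  # mirrors the -1 convention described above


class Registry:
    """Hypothetical registry that tracks instance ids per name."""

    def __init__(self) -> None:
        self._ids = {}  # name -> set of instance ids in use

    def register(self, name: str, instance_id: int = INSTANCE_ID_ANY) -> int:
        used = self._ids.setdefault(name, set())
        if instance_id == INSTANCE_ID_ANY:
            # Assumed policy: one past the highest id in use, or 0.
            instance_id = max(used) + 1 if used else 0
        elif instance_id in used:
            raise ValueError(f"{name}: instance_id {instance_id} already used")
        used.add(instance_id)
        return instance_id
```

Two registrations of "slirp" with INSTANCE_ID_ANY then get ids 0 and 1
instead of colliding on 0.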
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-2-quintela@redhat.com>
We must not call register_savevm_live() from an instance_init() function
(since this could be called multiple times during device introspection).
Move this to the realize() function instead.
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020150554.664422-4-thuth@redhat.com>
There's no need for dedicated handlers here if they don't do anything
special.
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Acked-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020150554.664422-3-thuth@redhat.com>
Since the instance_init() function immediately tries to set the
property to "true", the s390_skeys_set_migration_enabled() tries
to register a savevm handler during instance_init(). However,
instance_init() functions can be called multiple times, e.g. for
introspection of devices. That means multiple instances of devices
can be created during runtime (which is fine as long as they all
don't get realized, too), so the "Prevent double registration of
savevm handler" check in the s390_skeys_set_migration_enabled()
function does not work at all as expected (since there could be
more than one instance).
Thus we must not call register_savevm_live() from an instance_init()
function at all. Move this to the realize() function instead. This
way we can also get rid of the property getter and setter functions
completely, simplifying the code along the way quite a bit.
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Acked-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020150554.664422-2-thuth@redhat.com>
instance_init() can be called multiple times, e.g. during introspection
of the device. We should not install the vmstate handlers here. Do it
in the realize() function instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020145554.662751-1-thuth@redhat.com>
Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging
Block layer patches
- virtio-blk: use blk_io_plug_call() instead of notification BH
- mirror: allow switching from background to active mode
- qemu-img rebase: add compression support
- Fix locking in media change monitor commands
- Fix a few blockjob-related deadlocks when using iothread
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmVBTkERHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9ZiqRAAqvsWbblmEGJ7TBKYQK3f8QshJ66RxzbC
# 4eSjKHrciWNTeeIeU8r8OvFcPPoTcPXxpcmasD2gsAxG5W5N8vkPbBkW+YT4YdDJ
# pWJXrbJ15nILC4DmnR1ARVtvxKgv9zy5LSm5bjss1K+OSYJl/nx+ILjmfVZnYDF7
# z1dP/G0JxKKm4JzAIdBE3uZS+6Q5kx/wGYlJv8EQmlH3DYfsJfy6Lthe9jfw8ijg
# lSqLoQ+D0lEd6Bk4XbkUqqBxFcYBWTfU6qPZoyIO94zCTwTG9yIjmoivxmmfwQZq
# cJUTGGZjcxpJYnvcC6P13WgcWBtcD9L2kYFVH0JyjpwcSg9cCGHMF66n9pSlyEGq
# DUikwVzbTwOotwzYQyM88v4ET+2+Qdcwn8pRbv9PllEczh0kAsUAEuxSgtz4NEcN
# bZrap/16xHFybNOKkMZcmpqxspT5NXKbDODUP0IvbSYMOYpWS983nBTxwMRpyHog
# 2TFDZu4DjNiPkI2BcYM5VOKk6diNowZFShcEKvoaOLX/n9EBhP0tjoH9VUn1800F
# myHrhF2jpIf9GhErMWB7N2W3/0aK0pqdQgbpVnd1ARDdIdYkr7G/S+50D9K80b6n
# 0q2E7br4S5bcsY0HQzBL9YARSayY+lVOssLoolCWEsYzijdBQmAvs5THajFKcism
# /idI6nlp2Vs=
# =RdxS
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 01 Nov 2023 03:58:09 JST
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (27 commits)
iotests: add test for changing mirror's copy_mode
mirror: return mirror-specific information upon query
blockjob: query driver-specific info via a new 'query' driver method
qapi/block-core: turn BlockJobInfo into a union
qapi/block-core: use JobType for BlockJobInfo's type
mirror: implement mirror_change method
block/mirror: determine copy_to_target only once
block/mirror: move dirty bitmap to filter
block/mirror: set actively_synced even after the job is ready
blockjob: introduce block-job-change QMP command
virtio-blk: remove batch notification BH
virtio: use defer_call() in virtio_irqfd_notify()
util/defer-call: move defer_call() to util/
block: rename blk_io_plug_call() API to defer_call()
blockdev: mirror: avoid potential deadlock when using iothread
block: avoid potential deadlock during bdrv_graph_wrlock() in bdrv_close()
blockjob: drop AioContext lock before calling bdrv_graph_wrlock()
iotests: Test media change with iothreads
block: Fix locking in media change monitor commands
iotests: add tests for "qemu-img rebase" with compression
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Merge tag 'pull-halloween-omnibus-311023-2' of https://gitlab.com/stsquad/qemu into staging
Maintainer updates for testing, gitlab, gdbstub and plugins:
- add dtc package to openbsd VMs
- use -fno-stack-protector for non-stdlib tests
- split alpha and sh4 compilers into legacy image
- harmonise other compilers into debian-all-test-cross
- fix NULL check in gdb_regs
- fix memleak in semihosting
- remove unused parameter in plugin code
- fix fd leak in lockstep plugin
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmVBCvQACgkQ+9DbCVqe
# KkR8jAgAjFC3BE6fu80zYT0Dmeu8zh20QY/wgKQebaFfGEmPL4Bqkl2D/Rx7PhQA
# EH8fR/LAH/iXAO07+LYOB6QiyMb9PWiXS52iHyE3q11mOaM8iKkkj7a59NW8DfGC
# biSrj9o3wpz9gGkJjzTCcHC8DOMbrAuE12XnmhW7uTqqkrcTMC393dSEeyL+nrP9
# lKS5XzFyn3FOT4YIL8hAC02ObKH4LpWIO3gdWeDAo56yg24fLir9a2wYSXMaxQtN
# kDf6UtL97CIIhbNi6qrUPBB13MV8MlXno3wnb9+E4Cn5sGntGSnTyh7j6XrGqYj9
# p/Vio6ye8xP1IjlavKiBM0nnozcAhw==
# =ZOMS
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 31 Oct 2023 23:11:00 JST
# gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* tag 'pull-halloween-omnibus-311023-2' of https://gitlab.com/stsquad/qemu:
contrib/plugins: Close file descriptor on error return
plugins: Remove an extra parameter
semihosting: fix memleak at semihosting_arg_fallback
gdbstub: Check if gdb_regs is NULL
tests/docker: upgrade debian-all-test-cross to bookworm
tests/docker: use debian-all-test-cross for sparc64
tests/docker: use debian-all-test-cross for riscv64
tests/docker: use debian-all-test-cross for mips
tests/docker: use debian-all-test-cross for mips64
tests/docker: use debian-all-test-cross for m68k
tests/docker: use debian-all-test-cross for hppa
tests/docker: use debian-all-test-cross for power
tests/docker: move sh4 to use debian-legacy-test-cross
tests/docker: use debian-legacy-test-cross for alpha
gitlab: add build-loongarch to matrix
gitlab: clean-up build-soft-softmmu job
gitlab: split alpha testing into a legacy container
tests/tcg: Add -fno-stack-protector
tests/vm/openbsd: Use the system dtc package
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
One part of the test is using a throttled source to ensure that there
are no obvious issues when changing the copy_mode while there are
ongoing requests (source and target images are compared at the very
end).
The other part of the test is using a throttled target to ensure that
the change to active mode actually happened. This is done by hitting
the throttling limit, issuing a synchronous write and then immediately
verifying the target side. QSD is used, because otherwise, a
synchronous write would hang there.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231031135431.393137-11-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To start out, only actively-synced is returned.
For example, this is useful for jobs that started out in background
mode and switched to active mode. Once actively-synced is true, it's
clear that the mode switch has been completed. Note that completion of
the switch might happen much earlier, e.g. if the switch happens
before the job is ready, once all background operations have finished.
It's assumed that whether the disks are actively-synced or not is more
interesting than whether the mode switch completed. That information
can still be added if required in the future.
In the presence of an iothread, the actively_synced member is now
shared between the iothread and the main thread, so make accesses to it
atomic.
This requires adapting the expected output for iotest 109.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231031135431.393137-10-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231031135431.393137-9-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In preparation to additionally return job-type-specific information.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20231031135431.393137-8-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In preparation to turn BlockJobInfo into a union with @type as the
discriminator. That requires it to be an enum. Even without that
requirement, it's nicer to have an enum instead of a str here.
No functional change is intended.
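As a rough illustration of why an enum discriminator helps, here is a
hand-written C sketch of a tagged union. All Toy*/toy_* names are
hypothetical; this is not the code the QAPI generator emits, only the
general shape of a union selected by an enum tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a job type enum (not QEMU's definition). */
typedef enum {
    TOY_JOB_TYPE_COMMIT,
    TOY_JOB_TYPE_MIRROR,
    TOY_JOB_TYPE__MAX
} ToyJobType;

static const char *const toy_job_type_names[TOY_JOB_TYPE__MAX] = {
    "commit", "mirror"
};

/* Hypothetical mirror-specific payload. */
typedef struct {
    bool actively_synced;
} ToyMirrorInfo;

typedef struct {
    ToyJobType type;            /* discriminator: selects the union member */
    union {
        ToyMirrorInfo mirror;   /* valid only if type == TOY_JOB_TYPE_MIRROR */
    } u;
} ToyBlockJobInfo;

/* With a plain string, every consumer would need a lookup like this one. */
bool toy_job_type_parse(const char *name, ToyJobType *out)
{
    for (int i = 0; i < TOY_JOB_TYPE__MAX; i++) {
        if (strcmp(name, toy_job_type_names[i]) == 0) {
            *out = (ToyJobType)i;
            return true;
        }
    }
    return false;               /* unknown type name */
}

/* Returns 1 or 0 for mirror jobs, -1 if the type carries no such field. */
int toy_actively_synced(const ToyBlockJobInfo *info)
{
    switch (info->type) {
    case TOY_JOB_TYPE_MIRROR:
        return info->u.mirror.actively_synced ? 1 : 0;
    default:
        return -1;
    }
}
```

With the enum in place, a consumer can switch on the type directly and
unknown type names are rejected once, at parse time.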
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20231031135431.393137-7-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
which allows switching the @copy-mode from 'background' to
'write-blocking'.
This is useful for management applications, so they can start out in
background mode to avoid limiting guest write speed and switch to
active mode when certain criteria are fulfilled.
In the presence of an iothread, the copy_mode member is now shared
between the iothread and the main thread, so make accesses to it atomic.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231031135431.393137-6-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In preparation to allow changing the copy_mode via QMP. When running
in an iothread, it could be that copy_mode is changed from the main
thread in between reading copy_mode in bdrv_mirror_top_pwritev() and
reading copy_mode in bdrv_mirror_top_do_write(), so they might end up
disagreeing about whether copy_to_target is true or false. Avoid that
scenario by determining copy_to_target only once and passing it to
bdrv_mirror_top_do_write() as an argument.
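The pattern can be sketched in isolation as follows. This is a minimal
model using C11 atomics, not the mirror code itself: all toy_* names are
hypothetical, and the "between" hook merely stands in for a mode switch
arriving from another thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_copy_mode { TOY_BACKGROUND, TOY_WRITE_BLOCKING };

struct toy_job {
    _Atomic int copy_mode;      /* may be changed by another thread */
};

struct toy_write_result {
    bool prepared_for_copy;     /* decision seen by the outer function */
    bool copied;                /* decision acted on by the helper */
};

/* The helper trusts the caller's decision and never re-reads copy_mode. */
static void toy_do_write(struct toy_job *job, bool copy_to_target,
                         struct toy_write_result *res)
{
    (void)job;
    res->copied = copy_to_target;
}

struct toy_write_result toy_pwritev(struct toy_job *job,
                                    void (*between)(struct toy_job *))
{
    struct toy_write_result res = { false, false };

    /* Read the shared mode exactly once. */
    bool copy_to_target =
        atomic_load(&job->copy_mode) == TOY_WRITE_BLOCKING;

    res.prepared_for_copy = copy_to_target;
    if (between != NULL) {
        between(job);           /* simulated concurrent mode change */
    }
    toy_do_write(job, copy_to_target, &res);
    return res;
}

void toy_switch_to_active(struct toy_job *job)
{
    atomic_store(&job->copy_mode, TOY_WRITE_BLOCKING);
}

/* True if both functions agreed despite the mode changing in between. */
bool toy_demo_consistent(void)
{
    struct toy_job job;

    atomic_init(&job.copy_mode, TOY_BACKGROUND);
    struct toy_write_result r = toy_pwritev(&job, toy_switch_to_active);
    return r.prepared_for_copy == r.copied && !r.copied;
}
```

Had the helper loaded copy_mode again, it would have seen the new mode
and disagreed with its caller for this request.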
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20231031135431.393137-5-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In preparation to allow switching to active mode without draining.
Initialization of the bitmap in mirror_dirty_init() still happens with
the original/backing BlockDriverState, which should be fine, because
the mirror top has the same length.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231031135431.393137-4-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In preparation to allow switching from background to active mode. This
ensures that setting actively_synced will not be missed when the
switch happens after the job is ready.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20231031135431.393137-3-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
which will allow changing job-type-specific options after job
creation.
In the JobVerbTable, the same allow bits as for set-speed are used,
because set-speed can be considered an existing change command.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20231031135431.393137-2-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is a batching mechanism for virtio-blk Used Buffer Notifications
that is no longer needed because the previous commit added batching to
virtio_notify_irqfd().
Note that this mechanism was rarely used in practice because it is only
enabled when EVENT_IDX is not negotiated by the driver. Modern drivers
enable EVENT_IDX.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230913200045.1024233-5-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
virtio-blk and virtio-scsi invoke virtio_notify_irqfd() to send Used
Buffer Notifications from an IOThread. This involves an eventfd
write(2) syscall. Calling this repeatedly when completing multiple I/O
requests in a row is wasteful.
Use the defer_call() API to batch together virtio_notify_irqfd() calls
made during thread pool (aio=threads), Linux AIO (aio=native), and
io_uring (aio=io_uring) completion processing.
Behavior is unchanged for emulated devices that do not use
defer_call_begin()/defer_call_end() since defer_call() immediately
invokes the callback when called outside a
defer_call_begin()/defer_call_end() region.
With fio rw=randread bs=4k iodepth=64 numjobs=8, IOPS increases by ~9%
with a single IOThread and 8 vCPUs. With iodepth=1, IOPS decreases by
~1%, but this could be noise. Detailed performance data and
configuration specifics are available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd
This duplicates the BH that virtio-blk uses for batching. The next
commit will remove it.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230913200045.1024233-4-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The networking subsystem may wish to use defer_call(), so move the code
to util/ where it can be reused.
As a reminder of what defer_call() does:
This API defers a function call within a defer_call_begin()/defer_call_end()
section, allowing multiple calls to batch up. This is a performance
optimization that is used in the block layer to submit several I/O requests
at once instead of individually:
  defer_call_begin();          <-- start of section
  ...
  defer_call(my_func, my_obj); <-- deferred my_func(my_obj) call
  defer_call(my_func, my_obj); <-- another
  defer_call(my_func, my_obj); <-- another
  ...
  defer_call_end();            <-- end of section, my_func(my_obj) is called once
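For readers who want to see the batching semantics in runnable form, the
following is a toy, single-threaded model. It is only a sketch under
stated assumptions: the toy_* names, the fixed-size queue, and the global
state are all hypothetical and are not how util/ implements it. It shows
two properties described above: duplicate (function, object) pairs inside
a section collapse into one call, and calls outside a section run
immediately.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*toy_fn)(void *opaque);

#define TOY_MAX_DEFERRED 16

static struct {
    toy_fn fn;
    void *opaque;
} toy_pending[TOY_MAX_DEFERRED];
static size_t toy_npending;
static unsigned toy_depth;      /* nesting level of begin/end sections */

void toy_defer_call_begin(void)
{
    toy_depth++;
}

void toy_defer_call(toy_fn fn, void *opaque)
{
    if (toy_depth == 0) {
        fn(opaque);             /* outside a section: call immediately */
        return;
    }
    for (size_t i = 0; i < toy_npending; i++) {
        if (toy_pending[i].fn == fn && toy_pending[i].opaque == opaque) {
            return;             /* already queued: duplicates are batched */
        }
    }
    if (toy_npending == TOY_MAX_DEFERRED) {
        fn(opaque);             /* queue full: fall back to a direct call */
        return;
    }
    toy_pending[toy_npending].fn = fn;
    toy_pending[toy_npending].opaque = opaque;
    toy_npending++;
}

void toy_defer_call_end(void)
{
    assert(toy_depth > 0);
    if (--toy_depth > 0) {
        return;                 /* still inside an outer section */
    }
    size_t n = toy_npending;
    toy_npending = 0;
    for (size_t i = 0; i < n; i++) {
        toy_pending[i].fn(toy_pending[i].opaque);
    }
}

/* Example callback: counts how often it runs. */
static void count_call(void *opaque)
{
    (*(int *)opaque)++;
}

/* Three deferred calls inside a section; returns how often the callback ran. */
int toy_demo_batched(void)
{
    int calls = 0;

    toy_defer_call_begin();
    toy_defer_call(count_call, &calls);
    toy_defer_call(count_call, &calls);
    toy_defer_call(count_call, &calls);
    toy_defer_call_end();
    return calls;
}

/* The same three calls outside any section. */
int toy_demo_unbatched(void)
{
    int calls = 0;

    toy_defer_call(count_call, &calls);
    toy_defer_call(count_call, &calls);
    toy_defer_call(count_call, &calls);
    return calls;
}
```

In this model toy_demo_batched() runs the callback once, while
toy_demo_unbatched() runs it three times, which is why callers that never
open a section see no change in behavior.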
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230913200045.1024233-3-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>