Badly formatted migration streams can go undetected or produce
misleading errors due to a lock of checking at the end of sections.
In particular a section that adds an extra 0x00 at the end
causes what looks like a normal end of stream and thus doesn't produce
any errors, and something that ends in a 0x01..0x04 kind of look
like real section headers and then fail when the section parser tries
to figure out which section they are. This is made worse by the
choice of 0x00..0x04 being small numbers that are particularly common
in normal section data.
This patch adds a section footer consisting of a marker (0x7e - ~)
followed by the section-id that was also sent in the header. If
they mismatch then it throws an error explaining which section was
being loaded.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The next patch adds section footers; but we don't want to
break migration compatibility so disable them on older
machine types
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
In postcopy we need the loadvm_handlers to be used in a couple
of different instances of the loadvm loop/routine, and thus
it can't be local any more.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
There are currently lots of pieces of incoming migration state scattered
around, and postcopy is adding more, and it seems better to try and keep
it together.
allocate MIS in process_incoming_migration_co
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
For historic reasons, ram migration have been on arch_init.c. Just
split it into migration/ram.c, the same that happened with block.c.
There is only code movement, no changes altogether.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Put the three parameters related to multiple thread (de)compression
into an int array, and use an enum type to index the parameter.
Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Add the code to create and destroy the multiple threads those will be
used to do data decompression. Left some functions empty just to keep
clearness, and the code will be added later.
Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Reviewed-by: Dr.David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Add the code to create and destroy the multiple threads those will
be used to do data compression. Left some functions empty to keep
clearness, and the code will be added later.
Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Reviewed-by: Dr.David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
migrate_rdma_pin_all() and qsb_clone() are completely unused and thus
can be deleted.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It used to be an int, but then we can't pass directly the
bytes_transferred parameter, that would happen later in the series.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
One of the annoyances of the current migration format is the fact that
it's not self-describing. In fact, it's not properly describing at all.
Some code randomly scattered throughout QEMU elaborates roughly how to
read and write a stream of bytes.
We discussed an idea during KVM Forum 2013 to add a JSON description of
the migration protocol itself to the migration stream. This patch
adds a section after the VM_END migration end marker that contains
description data on what the device sections of the stream are composed of.
This approach is backwards compatible with any QEMU version reading the
stream, because QEMU just stops reading after the VM_END marker and ignores
any data following it.
With an additional external program this allows us to decipher the
contents of any migration stream and hopefully make migration bugs easier
to track down.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
expose the count that logs the times of updating the dirty bitmap to
end user.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Provide ram_mig_init (like blk_mig_init) for vl.c to initialise stuff
to do with ram migration (currently in arch_init.c).
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
When qemu do live migration with xbzrle, qemu malloc decoded_buf
at destination end but free it at source end. It will crash qemu
by double free error in some scenarios. Splitting the XBZRLE structure
for clear logic distinguishing src/dst side.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: GongLei <arei.gonglei@huawei.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The VMState code will be moved to vmstate.c and it uses some of the
QEMU_VM_* constants, so move it to a header.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Commit 29ae8a4133 ("rdma: introduce
MIG_STATE_NONE and change MIG_STATE_SETUP state transition") changed the
state transitions during migration setup.
Spice used to be notified with MIG_STATE_ACTIVE and it detected this
using migration_is_active(). Spice is now notified with
MIG_STATE_SETUP and migration_is_active() no longer works.
Replace migration_is_active() with migration_in_setup() to fix spice
migration.
Cc: Michael R. Hines <mrhines@us.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Using the previous patches, we're now able to timestamp the SETUP
state. Once we have this time, let the user know about it in the
schema.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Code that does need to be visible is kept
well contained inside this file and this is the only
new additional file to the entire patch.
This file includes the entire protocol and interfaces
required to perform RDMA migration.
Also, the configure and Makefile modifications to link
this file are included.
Full documentation is in docs/rdma.txt
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This gives RDMA shared access to madvise() on the destination side
when an entire chunk is found to be zero.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
this patch adds a efficient encoding for zero blocks by
adding a new flag indicating a block is completely zero.
additionally bdrv_write_zeros() is used at the destination
to efficiently write these zeroes. depending on the implementation
this avoids that the destination target gets fully provisioned.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The auto-converge migration capability allows the user to specify if they
choose live migration seqeunce to automatically detect and force convergence.
Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This capability allows you to disable dynamic chunk registration
for better throughput on high-performance links.
For example, using an 8GB RAM virtual machine with all 8GB of memory in
active use and the VM itself is completely idle using a 40 gbps infiniband link:
1. x-rdma-pin-all disabled total time: approximately 7.5 seconds @ 9.5 Gbps
2. x-rdma-pin-all enabled total time: approximately 4 seconds @ 26 Gbps
These numbers would of course scale up to whatever size virtual machine
you have to migrate using RDMA.
Enabling this feature does *not* have any measurable affect on
migration *downtime*. This is because, without this feature, all of the
memory will have already been registered already in advance during
the bulk round and does not need to be re-registered during the successive
iteration rounds.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
These are the prototypes and implementation of new hooks that
RDMA takes advantage of to perform dynamic page registration.
An optional hook is also introduced for a custom function
to be able to override the default save_page function.
Also included are the prototypes and accessor methods used by
arch_init.c which invoke funtions inside savevm.c to call out
to the hooks that may or may not have been overridden
inside of QEMUFileOps.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This exposes throughput (in megabits/sec) through QMP.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
RDMA writes happen asynchronously, and thus the performance accounting
also needs to be able to occur asynchronously. This allows anybody
to call into savevm.c to update both f->pos as well as into arch_init.c
to update the acct_info structure with up-to-date values when
the RDMA transfer actually completes.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
during bulk stage of ram migration if a page is a
zero page do not send it at all.
the memory at the destination reads as zero anyway.
even if there is an madvise with QEMU_MADV_DONTNEED
at the target upon receipt of a zero page I have observed
that the target starts swapping if the memory is overcommitted.
it seems that the pages are dropped asynchronously.
this patch also updates QMP to return the number of
skipped pages in MigrationStats.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The indirection is useless now. Backends can open s->file directly.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Second, drop the file descriptor indirection, and write directly to the
QEMUFile.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
As a start, use QEMUFile to store the destination and close it.
qemu_get_fd gets a file descriptor that will be used by the write
callbacks.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Buffering was needed because blocking writes could take a long time
and starve other threads seeking to grab the big QEMU mutex.
Now that all writes (except within _complete callbacks) are done
outside the big QEMU mutex, we do not need buffering at all.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Perform final cleanup in a bottom half, and add joining the thread to
the series of cleanup actions.
migrate_fd_error remains for connection error, but it doesn't need
to cleanup anything anymore.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Always use qemu_file_get_error to detect errors, since that is how
QEMUFile itself drops I/O after an error occurs. There is no need
to propagate and check return values all the time.
Also remove the "complete" member, since we know that it is set (via
migrate_fd_cleanup) only when the state changes.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We removed the calculation in commit e4ed1541ac
Now we add it back. We need to create dirty_bytes_rate because we
can't include cpu-all.h from migration.c, and there is no other way to
include TARGET_PAGE_SIZE.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Reviewed-by: Eric Blake <eblake@redhat.com>
Avoid splitting the state of outgoing migration, more or less arbitrarily,
between two data structures. QEMUFileBuffered anyway is used only during
migration.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This only moves the code (also from buffered_file.h to migration.h).
Fix whitespace until checkpatch is happy.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Code just now does (simplified for clarity)
if (qemu_savevm_state_iterate(s->file) == 1) {
vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
qemu_savevm_state_complete(s->file);
}
Problem here is that qemu_savevm_state_iterate() returns 1 when it
knows that remaining memory to sent takes less than max downtime.
But this means that we could end spending 2x max_downtime, one
downtime in qemu_savevm_iterate, and the other in
qemu_savevm_state_complete.
Changed code to:
pending_size = qemu_savevm_state_pending(s->file, max_size);
DPRINTF("pending size %lu max %lu\n", pending_size, max_size);
if (pending_size >= max_size) {
ret = qemu_savevm_state_iterate(s->file);
} else {
vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
qemu_savevm_state_complete(s->file);
}
So what we do is: at current network speed, we calculate the maximum
number of bytes we can sent: max_size.
Then we ask every save_live section how much they have pending. If
they are less than max_size, we move to complete phase, otherwise we
do an iterate one.
This makes things much simpler, because now individual sections don't
have to caluclate the bandwidth (it was implossible to do right from
there).
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Now that we have a thread, and blocking writes, we don't need it.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
We want the file assignment to happen before the thread is created to
avoid locking, so we just do it before creating the thread.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>