Previously a BlockDriverState had only one dirty bitmap, so only one
caller (e.g. a block job) could keep track of writes. This changes the
dirty bitmap to a list and creates a BdrvDirtyBitmap for each caller; the
lifecycle is managed with these new functions:
bdrv_create_dirty_bitmap
bdrv_release_dirty_bitmap
BdrvDirtyBitmap is a linked-list wrapper structure around HBitmap.
In place of bdrv_set_dirty_tracking, a BdrvDirtyBitmap pointer argument
is added to these functions, since each caller has its own dirty bitmap:
bdrv_get_dirty
bdrv_dirty_iter_init
bdrv_get_dirty_count
The bdrv_set_dirty and bdrv_reset_dirty prototypes are unchanged, but they
now internally walk the list of all dirty bitmaps and set them one by one.
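A minimal sketch of the wrapper and the new lifecycle functions (the exact
field layout here is illustrative, not a verbatim copy of the patch):

    typedef struct BdrvDirtyBitmap {
        HBitmap *bitmap;                    /* the actual dirty bits */
        QLIST_ENTRY(BdrvDirtyBitmap) list;  /* links all bitmaps of one BDS */
    } BdrvDirtyBitmap;

    BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
                                              int granularity);
    void bdrv_release_dirty_bitmap(BlockDriverState *bs,
                                   BdrvDirtyBitmap *bitmap);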
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block-migration.c does not actually use DriveInfo anywhere. Hence it's
safe to drop the drive ref code; we really only care about referencing the BDS.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch adds an efficient encoding for zero blocks by
adding a new flag indicating a block is completely zero.
Additionally, bdrv_write_zeroes() is used at the destination
to efficiently write these zeroes. Depending on the implementation,
this avoids fully provisioning the destination target.
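A rough sketch of the source-side idea, assuming QEMU's buffer_is_zero()
helper and the new BLK_MIG_FLAG_ZERO_BLOCK flag name (not verbatim from the
patch):

    int flags = BLK_MIG_FLAG_DEVICE_BLOCK;

    if (buffer_is_zero(blk->buf, BLOCK_SIZE)) {
        /* send only the header; the destination writes the zeroes itself
         * (via bdrv_write_zeroes()) instead of receiving 1 MB of payload */
        flags |= BLK_MIG_FLAG_ZERO_BLOCK;
    }
    qemu_put_be64(f, (blk->sector << BDRV_SECTOR_BITS) | flags);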
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Only the migration_bitmap_sync() call needs the iothread lock.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This makes it possible to do blocking writes directly to the socket,
with no buffer in the middle. For RAM, only the migration_bitmap_sync()
call needs the iothread lock. For block migration, it is needed by
the block layer (including bdrv_drain_all and dirty bitmap access),
but because some code is shared between iterate and complete, all of
mig_save_device_dirty is run with the lock taken.
In the savevm case, the iterate callback runs within the big lock.
This is annoying because it complicates the rules. Luckily we do not
need to do anything about it: the RAM iterate callback does not need
the iothread lock, and block migration never runs during savevm.
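A sketch of the rule described above, as it could look in the block
migration iterate path (simplified):

    /* mig_save_device_dirty touches the dirty bitmap and the block layer,
     * so take the iothread lock around the whole dirty pass */
    qemu_mutex_lock_iothread();
    ret = blk_mig_save_dirty_block(f, 1);
    qemu_mutex_unlock_iothread();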
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Some state is shared between the block migration code and its AIO
callbacks. Once block migration runs outside the iothread,
the block migration code and the AIO callbacks will be able to
run concurrently. Protect the critical sections with a separate
lock. Do the same for completed_sectors, which can be used from
the monitor.
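A sketch of the pattern, assuming blk_mig_lock()/blk_mig_unlock() helpers
around a mutex in block_mig_state (names follow the patch's intent):

    static void blk_mig_read_cb(void *opaque, int ret)
    {
        BlkMigBlock *blk = opaque;

        blk_mig_lock();
        blk->ret = ret;
        QSIMPLEQ_INSERT_TAIL(&block_mig_state.blk_list, blk, entry);
        block_mig_state.submitted--;
        block_mig_state.read_done++;
        assert(block_mig_state.submitted >= 0);
        blk_mig_unlock();
    }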
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Some small changes that will simplify the positioning of lock/unlock
primitives.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Now that the cancel callback is called consistently for all errors,
we can avoid doing its work in the other callbacks.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The return value of .save_live_pending() is the number of bytes
remaining. This is just an estimate because we do not know how many
blocks will be dirtied by the running guest.
Currently our return value for .save_live_pending() is wrong because it
includes dirty blocks but not in-flight bdrv_aio_readv() requests or
unsent blocks. Crucially, it also doesn't include the bulk phase where
the entire device is transferred - therefore we risk completing block
migration before all blocks have been transferred!
The return value of .save_live_iterate() is the number of bytes
transferred this iteration. Currently we return whether there are bytes
remaining, which is incorrect.
Move the bytes remaining calculation into .save_live_pending() and
really return the number of bytes transferred this iteration in
.save_live_iterate().
Also fix the %ld format specifier which was used for a uint64_t
argument. PRIu64 must be used to avoid warnings on 32-bit hosts.
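A sketch of the corrected accounting in the pending callback, assuming the
counters kept in block_mig_state (not a verbatim copy of the patch):

    /* estimate: dirty sectors still to send, plus in-flight and unsent blocks */
    uint64_t pending = get_remaining_dirty() +
                       block_mig_state.submitted * BLOCK_SIZE +
                       block_mig_state.read_done * BLOCK_SIZE;

    /* report at least one block while the bulk phase is still running */
    if (pending == 0 && !block_mig_state.bulk_completed) {
        pending = BLOCK_SIZE;
    }
    DPRINTF("Enter save live pending %" PRIu64 "\n", pending);
    return pending;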
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-id: 1360661835-28663-3-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The .save_live_iterate() function returns 0 to continue iterating or 1
to stop iterating.
Since 16310a3cca it only ever returns 0,
leading to an infinite loop.
Return 1 if we have finished sending dirty blocks.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1360534366-26723-4-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Commit 43be3a25c9 changed the
blk_mig_save_dirty_block() return code handling. The function's doc
comment says:
    /* return value:
     * 0: too much data for max_downtime
     * 1: few enough data for max_downtime
     */
Because of the 1 return value, callers must check for ret < 0 instead of
just:
if (ret) { ... }
We do not want to bail when 1 is returned, only on error.
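The resulting caller pattern (sketch):

    ret = blk_mig_save_dirty_block(f, 1);
    if (ret < 0) {
        return ret;     /* only negative values are errors */
    }
    /* ret == 0 or 1 just reports progress against max_downtime */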
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1360534366-26723-3-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Show the actual flags value and include "block migration" in the error
message so it's clear where the error is coming from.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1360534366-26723-2-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Code just now does (simplified for clarity):

    if (qemu_savevm_state_iterate(s->file) == 1) {
        vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
        qemu_savevm_state_complete(s->file);
    }
The problem here is that qemu_savevm_state_iterate() returns 1 when it
knows that the remaining memory to send takes less than the max downtime.
But this means that we could end up spending 2x max_downtime: one
downtime in qemu_savevm_state_iterate, and the other in
qemu_savevm_state_complete.
Changed code to:

    pending_size = qemu_savevm_state_pending(s->file, max_size);
    DPRINTF("pending size %lu max %lu\n", pending_size, max_size);
    if (pending_size >= max_size) {
        ret = qemu_savevm_state_iterate(s->file);
    } else {
        vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
        qemu_savevm_state_complete(s->file);
    }
So what we do is: at the current network speed, we calculate the maximum
number of bytes we can send: max_size.
Then we ask every save_live section how much they have pending. If
it is less than max_size, we move to the complete phase; otherwise we
do an iterate one.
This makes things much simpler, because now individual sections don't
have to calculate the bandwidth (it was impossible to do right from
there).
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Make the results of blk_mig_save_dirty_block() and
mig_save_device_dirty() consistent.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
This means we don't need to pass through qemu_file to get the errors.
Adjust all callers.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
When cancelling block migration, all in-flight requests of the block
migration must be completed before the data can be freed. This was
visible as failing assertions and segfaults.
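A sketch of the fix, assuming the cleanup helper used by the cancel path
(simplified):

    static void blk_mig_cleanup(void)
    {
        /* wait for all in-flight block-migration AIO before freeing state */
        bdrv_drain_all();

        /* ... free the BlkMigDevState and BlkMigBlock lists as before ... */
    }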
Reported-by: Peter Lieven <pl@dlhnet.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We split it into two functions, foo_live_iterate and foo_live_complete.
At this point we only remove the bits that are for the other stage;
functionally this is equivalent to the previous code.
Signed-off-by: Juan Quintela <quintela@redhat.com>
This patch splits stage 1 into its own function for both save_live
users, ram and block. It is just a copy of the function, with the
parts for the other stages removed. Optimizations will come later.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Enable the creation of a method to tell migration whether a section is
active and should be migrated. We use it for block migration, which is
normally not active. We don't create the method for RAM, as setups
without RAM are very strange O:-)
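A sketch of the new hook for block migration, assuming an .is_active
callback in SaveVMHandlers as described above:

    static bool block_is_active(void *opaque)
    {
        /* only active when the user asked for block migration */
        return blk_mig_active();
    }

    /* wired up in the handlers: .is_active = block_is_active */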
Signed-off-by: Juan Quintela <quintela@redhat.com>
Notice that the live migration users never unregister, so no problem
about freeing the ops structure.
Signed-off-by: Juan Quintela <quintela@redhat.com>
The Monitor object is passed back and forth within the migration/savevm
code so that it can print errors and progress to the user.
However, that approach assumes an HMP monitor and is completely invalid
in QMP.
This commit drops almost every single usage of the Monitor object; all
monitor_printf() calls have been converted into DPRINTF() ones.
There are a few remaining Monitor objects, those are going to be dropped
by the next commit.
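A typical conversion performed by this patch (the message text here is
illustrative):

    /* old: monitor_printf(mon, "Completed %d %%\r", progress); */
    DPRINTF("Completed %d %%\r", progress);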
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
All files under GPLv2 will get GPLv2+ changes starting tomorrow.
event_notifier.c and exec-obsolete.h were only ever touched by Red Hat
employees and can be relicensed now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Initially done with the following semantic patch:
@ rule1 @
expression E;
statement S;
@@
E =
(
bdrv_aio_readv
| bdrv_aio_writev
| bdrv_aio_flush
| bdrv_aio_discard
| bdrv_aio_ioctl
)
(...);
(
- if (E == NULL) { ... }
|
- if (E)
{ <... S ...> }
)
which however missed the occurrence in block/blkverify.c
(as it should have done), and left behind some unused
variables.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Many places in QEMU call qemu_aio_flush() to complete all pending
asynchronous I/O. Most of these places actually want to drain all block
requests but there is no block layer API to do so.
This patch introduces the bdrv_drain_all() API to wait for requests
across all BlockDriverStates to complete. As a bonus we perform checks
after qemu_aio_wait() to ensure that requests really have finished.
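The typical conversion this enables (sketch):

    /* before: completes all pending AIO, not just block requests */
    qemu_aio_flush();

    /* after: waits for in-flight requests on every BlockDriverState */
    bdrv_drain_all();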
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These errors were detected by codespell:
remaing -> remaining
soley -> solely
virutal -> virtual
seperate -> separate
libcacard.txt still needs some more patches.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Make *save_live() return negative values when there is an error, and
update all callers to check for the error.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Now that the function returns an errno value, the new name is a better fit.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
error_report() prepends location, and appends a newline. The message
constructed from the arguments should not contain a newline. Fix the
obvious offenders.
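For example (the message text is illustrative):

    /* wrong: embeds a newline that error_report() already appends */
    error_report("block migration failed\n");

    /* right */
    error_report("block migration failed");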
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
block_mig_state.total_time is currently the sum of the read request
latencies. This is not very accurate because block migration uses aio and
so several requests can be submitted at once. Bandwidth should be computed
with wall-clock time, not by adding the latencies. In this case,
"total_time" has a higher value than it should, and so the computed
bandwidth is lower than it is in reality. This means that migration can
take longer than it needs to.
However, we don't want to use pure wall-clock time here. We are computing
bandwidth in the asynchronous phase, where the migration repeatedly wakes
up and sends some aio requests. The computed bandwidth will be used for
synchronous transfer.
Signed-off-by: Avishay Traeger <avishay@il.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block_mig_state.reads is an int, and multiplying it by BLOCK_SIZE yielded a
negative number, resulting in a negative bandwidth (running on a 32-bit
machine). Change the order of operations to avoid the overflow.
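A sketch of the reordering in the bandwidth calculation (dividing before
multiplying keeps the intermediate value in range):

    /* old: (block_mig_state.reads * BLOCK_SIZE) / block_mig_state.total_time
     * overflows int once reads * BLOCK_SIZE exceeds INT_MAX */
    return (block_mig_state.reads / block_mig_state.total_time) * BLOCK_SIZE;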
Signed-off-by: Avishay Traeger <avishay@il.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Set the block device in use during block migration, and disallow drive_del
and bdrv_truncate for in-use devices.
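A sketch of the idea, assuming the bdrv_set_in_use() helper as it existed
at the time:

    bdrv_set_in_use(bs, 1);   /* at block migration start */
    /* ... migrate the device ... */
    bdrv_set_in_use(bs, 0);   /* on completion or cancel */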
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
So that ejection of an attached device by the guest does not free data
in use by the block migration instance.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit b02bea3a85 added a check on the return value of bdrv_write() and
aborts migration when it fails. However, if the size of the block device
to migrate is not a multiple of BLOCK_SIZE (currently 1 MB), the last
bdrv_write() will fail with -EIO.
Fixed by calling bdrv_write() with the correct size of the last block.
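A sketch of the fix on the destination side, assuming block_load()'s local
variable names:

    if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
        nr_sectors = total_sectors - addr;      /* last, partial block */
    } else {
        nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
    }
    ret = bdrv_write(bs, addr, buf, nr_sectors);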
Signed-off-by: Pierre Riteau <Pierre.Riteau@irisa.fr>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>