In qmp_transaction, assert that the BdrvActionOps to be used is actually
valid.
This assertion failing is very improbable, however, it might happen, if
a new TransactionActionKind is introduced "out of order" and the
actions[] array is not updated.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add an Error ** parameter to bdrv_open, bdrv_file_open and associated
functions to allow more specific error messages.
Signed-off-by: Max Reitz <mreitz@redhat.com>
This interface use id and name as optional parameters, to handle the
case that one image contain multiple snapshots with same name which
may be '', but with different id.
Adding parameter id is for historical compatiability reason, and
that case is not possible in qemu's new interface for internal
snapshot at block device level, but still possible in qemu-img.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Unlike savevm, the qmp_transaction interface will not generate
snapshot name automatically, saving trouble to return information
of the new created snapshot.
Although qcow2 support storing multiple snapshots with same name
but different ID, here it will fail when an snapshot with that name
already exist before the operation. Format such as rbd do not support
ID at all, and in most case, it means trouble to user when he faces
multiple snapshots with same name, so ban that case. Request with
empty name will be rejected.
Snapshot ID can't be specified in this interface.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block jobs used drive_get_ref(drive_get_by_blockdev(bs)) to avoid BDS
being deleted. Now we have BDS reference count, and block jobs don't
care about dinfo, so replace them to get cleaner code. It is also the
safe way when BDS has no drive info.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no
longer public and should be called by bdrv_unref() if refcnt is
decreased to 0.
This is an identical change because effectively, there's no multiple
reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets
bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This feature can be used in case where users are avoiding the iops limit by
doing jumbo I/Os hammering the storage backend.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The max parameter of the leaky bucket throttling algorithm can be used to
allow the guest to do bursts.
The max value is a pool of I/O that the guest can use without being throttled
at all. Throttling is triggered once this pool is empty.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is an autogenerated patch using scripts/switch-timer-api.
Switch the entire code base to using the new timer API.
Note this patch may introduce some line length issues.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When user tries to use read-only whitelist format in the command line
option, failure message was "'foo' invalid format". It might be invalid
only for writable, but valid for read-only, so it is confusing. Give the
user easier to understand information.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_flags is set by bdrv_parse_discard_flags(), but later it is reset
to zero.
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Message-id: 1376483201-13466-1-git-send-email-mohan@in.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When use -drive file='xxx',format=qcow2,snapshot=on the error
message "Can't use snapshot=on with driver-specific options"
can be show, and fail to start the qemu.
This should not be happened, and there is no file.driver option
in qemu command line.
It is because the commit 74fe54f2a1,
it puts 'driver' option if the command line use 'format' option.
This patch is to solve this bug.
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We don't want to commit to the API yet before everything is worked out.
Like already for 1.5, disable it again for the 1.6 release. This commit
is meant to be reverted after the 1.6 release.
The disabling of the driver-specific options is achieved by applying the
old checks while parsing the command line.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The macro g_assert_not_reached is a better self documenting replacement
for assert(0) or assert(false).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch adds sync-modes to the drive-backup interface and
implements the FULL, NONE and TOP modes of synchronization.
FULL performs as before copying the entire contents of the drive
while preserving the point-in-time using CoW.
NONE only copies new writes to the target drive.
TOP copies changes to the topmost drive image and preserves the
point-in-time using CoW.
For sync mode TOP are creating a new target image using the same backing
file as the original disk image. Then any new data that has been laid
on top of it since creation is copied in the main backup_run() loop.
There is an extra check in the 'TOP' case so that we don't bother to copy
all the data of the backing file as it already exists in the target.
This is where the bdrv_co_is_allocated() is used to determine if the
data exists in the topmost layer or below.
Also any new data being written is intercepted via the write_notifier
hook which ends up calling backup_do_cow() to copy old data out before
it gets overwritten.
For mode 'NONE' we create the new target image and only copy in the
original data from the disk image starting from the time the call was
made. This preserves the point in time data by only copying the parts
that are *going to change* to the target image. This way we can
reconstruct the final image by checking to see if the given block exists
in the new target image first, and if it does not, you can get it from
the original image. This is basically an optimization allowing you to
do point-in-time snapshots with low overhead vs the 'FULL' version.
Since there is no old data to copy out the loop in backup_run() for the
NONE case just calls qemu_coroutine_yield() which only wakes up after
an event (usually cancel in this case). The rest is handled by the
before_write notifier which again calls backup_do_cow() to write out
the old data so it can be preserved.
Signed-off-by: Ian Main <imain@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The old 'cache' option really encodes three different boolean flags into
a cache mode name, without providing all combinations. Make them three
separate options instead and translate the old option to the new ones
for drive_init().
The specific boolean options take precedence if the old cache option is
specified as well, so the following options are equivalent:
-drive file=x,cache=none,cache.no-flush=true
-drive file=x,cache.writeback=true,cache.direct=true,cache.no-flush=true
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In QMP, we want to use dashes instead of underscores in QMP argument
names, and use nested options for throttling.
The new option names affect the command line as well, but for
compatibility drive_init() will convert the old option names before
calling into the code that will be shared between -drive and
blockdev-add.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This is traditionally -drive format=..., which is now translated into
the new driver option. This gives us a more consistent way to select the
driver of BlockDriverStates that can be used in QMP context, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The drive-backup command is similar to the drive-mirror command, except
no guest data written after the command executes gets copied. Add a
sync mode argument which determines whether the entire disk is copied,
just allocated clusters, or only clusters being written to by the guest.
Currently only sync mode 'full' is supported - it copies the entire disk.
For read-only point-in-time snapshots we may only need sync mode 'none'
since the target can be a qcow2 file using the guest's disk as its
backing file (no need to copy the entire disk). Finally, sync mode
'top' is useful if we wish to preserve the backing chain.
Note that this patch just adds the sync mode argument to drive-backup.
It does not implement sync modes 'top' or 'none'. This patch is
necessary so we can add a drive-backup HMP command that behaves like the
existing drive-mirror HMP command and takes a sync mode.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The Abort action can be used to test QMP 'transaction' failure. Add it
as the last action to exercise the .abort() and .cleanup() code paths
for all previous actions.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds a transactional version of the drive-backup QMP command.
It allows atomic snapshots of multiple drives along with automatic
cleanup if there is a failure to start one of the backup jobs.
Note that QMP events are emitted for block job completion/cancellation
and the block job will be listed by query-block-jobs.
@device: the name of the device whose writes should be mirrored.
@target: the target of the new image. If the file exists, or if it
is a device, the existing file/device will be used as the new
destination. If it does not exist, a new file will be created.
@format: #optional the format of the new destination, default is to
probe if @mode is 'existing', else the format of the source
@mode: #optional whether and how QEMU should create a new image, default is
'absolute-paths'.
@speed: #optional the maximum speed, in bytes per second
@on-source-error: #optional the action to take on an error on the source,
default 'report'. 'stop' and 'enospc' can only be used
if the block device supports io-status (see BlockInfo).
@on-target-error: #optional the action to take on an error on the target,
default 'report' (no limitations, since this applies to
a different block device than @device).
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some QMP 'transaction' types don't need to do anything on .commit().
Make .commit() optional just like .abort().
The "drive-backup" action will take advantage of this, it only needs to
cancel the block job on .abort(). Other block job actions will probably
follow the same pattern, so allow .commit() to be NULL.
Suggested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The QMP 'transaction' command keeps a list of in-flight transactions.
The transaction state structure is called BlkTransactionStates even
though it only deals with a single transaction. The only plural thing
is the linked list of transaction states.
I find it confusing to call the single structure "States". This patch
renames it to "State", just like BlockDriverState is singular.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
@drive-backup
Start a point-in-time copy of a block device to a new destination. The
status of ongoing drive-backup operations can be checked with
query-block-jobs where the BlockJobInfo.type field has the value 'backup'.
The operation can be stopped before it has completed using the
block-job-cancel command.
@device: the name of the device which should be copied.
@target: the target of the new image. If the file exists, or if it
is a device, the existing file/device will be used as the new
destination. If it does not exist, a new file will be created.
@format: #optional the format of the new destination, default is to
probe if @mode is 'existing', else the format of the source
@mode: #optional whether and how QEMU should create a new image, default is
'absolute-paths'.
@speed: #optional the maximum speed, in bytes per second
@on-source-error: #optional the action to take on an error on the source,
default 'report'. 'stop' and 'enospc' can only be used
if the block device supports io-status (see BlockInfo).
@on-target-error: #optional the action to take on an error on the target,
default 'report' (no limitations, since this applies to
a different block device than @device).
Note that @on-source-error and @on-target-error only affect background I/O.
If an error occurs during a guest write request, the device's rerror/werror
actions will be used.
Returns: nothing on success
If @device is not a valid block device, DeviceNotFound
Since 1.6
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use bdrv_getlength() for its byte units and error return instead of
bdrv_get_geometry().
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is not necessary to check that we can find a protocol block driver
since we create or open the image file. This produces the error that we
need anyway.
Besides, the QERR_INVALID_BLOCK_FORMAT is inappropriate since the
protocol is incorrect rather than the format.
Also drop an empty line between bdrv_open() and checking its return
value. This may be due to copy-pasting from earlier code that performed
other operations before handling errors.
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini <pbonzini@redhat.com> suggested the following test case:
1. Launch a guest and wait at the GRUB boot menu:
qemu-system-x86_64 -enable-kvm -m 1024 \
-drive if=none,cache=none,file=test.img,id=foo,werror=stop,rerror=stop
-device virtio-blk-pci,drive=foo,id=virtio0,addr=4
2. Hot unplug the device:
(qemu) drive_del foo
3. Select the first boot menu entry
Without this patch the guest pauses due to ENOMEDIUM. The guest is
stuck in a continuous pause loop since the I/O request is retried and
fails immediately again when the guest is resumed.
With this patch the error is reported to the guest.
Note that this scenario actually happens sometimes during libvirt disk
hot unplug, where device_del is followed by drive_del. I/O may still be
submitted to the drive after drive_del if the guest does not process the
PCI hot unplug notification.
Reported-by: Dafna Ron <dron@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
We may want to include a driver in the whitelist for read only tasks
such as diagnosing or exporting guest data (with libguestfs as a good
example). This patch introduces a readonly whitelist option, and for
backward compatibility, the old configure option --block-drv-whitelist
is now an alias to rw whitelist.
Drivers in readonly list is only permitted to open file readonly, and
returns -ENOTSUP for RW opening.
E.g. To include vmdk readonly, and others read+write:
./configure --target-list=x86_64-softmmu \
--block-drv-rw-whitelist=qcow2,raw,file,qed \
--block-drv-ro-whitelist=vmdk
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There's no reason to restrict transactions to operations related to
block devices, so rename the type now before schema introspection stops
us from doing so.
Also change the schema documentation of 'transaction' to not refer to
block devices or snapshots any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make it easier to add other operations to qmp_transaction() by using
callbacks, with external snapshots serving as an example implementation
of the callbacks.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The code is simply moved into a separate function.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The code is moved into preparation function, and changed
a bit to tip more clearly what it is doing.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The code before really committing is moved into a function. Most
code is simply moved from qmp_transaction(), except that on fail it
just returns now. Other code such as input parsing is not touched,
to make it easier in review.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have an errno value that can be displayed, so we should just do that.
An easy way to reproduce this case is to resize a raw image to a size
that is too large for the host file system.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We don't want to commit to the API yet before everything is worked out.
Disable it for the 1.5 release. This commit is meant to be reverted
after the 1.5 release.
The disabling of the driver-specific options is achieved by applying the
old checks while parsing the command line.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Many of these should be cleaned up with proper qdev-/QOM-ification.
Right now there are many catch-all headers in include/hw/ARCH depending
on cpu.h, and this makes it necessary to compile these files per-target.
However, fixing this does not belong in these patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is not necessary to adjust the slice time at runtime. We already
extend the current slice in order to carry over accounting into the next
slice. Changing the actual slice time value introduces oscillations.
The guest may experience large changes in throughput or IOPS from one
moment to the next when slice times are adjusted.
Reported-by: Benoît Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After this patch, using -drive with an empty file name continues to open
the file if driver-specific options are used. If no driver-specific
options are specified, the semantics stay as it was: It defines a drive
without an inserted medium.
In order to achieve this, bdrv_open() must be made safe to work with a
NULL filename parameter. The assumption that is made is that only block
drivers which implement bdrv_parse_filename() support using driver
specific options and could therefore work without a filename. These
drivers must make sure to cope with NULL in their implementation of
.bdrv_open() (this is only NBD for now). For all other drivers, the
block layer code will make sure to error out before calling into their
code - they can't possibly work without a filename.
Now an NBD connection can be opened like this:
qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
this patch ensures that all pending IOs are completed
before a device is resized. this is especially important
if a device is shrinked as it the bdrv_check_request()
result is invalidated.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Screwed up in commit 666daa68. Thanks to Kevin Wolf for reminding me
to fix this.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Any non-default -drive options are now passed down to the block drivers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Pointing to a QemuOpts element is surprising and can lead to subtle
use-after-free errors when the QemuOpts is freed after all options are
parsed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It doesn't do anything yet except storing the options QDict in the
BlockDriverState.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add support for BDRV_O_UNMAP from the QEMU command-line.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There can be a need to turn output to stdout off. This patch adds a -q option
that enable "Quiet mode". In Quiet mode, only errors are printed out.
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Negative I/O throttling iops and bps values do not make sense so reject
them with an error message.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The do_check_io_limits() function returns false when I/O limits are
invalid but it doesn't set an Error to indicate why. The two
do_check_io_limits() callers duplicate error reporting. Solve this by
passing an Error pointer into do_check_io_limits().
Note that the two callers report slightly different errors: drive_init()
prints a custom error message while qmp_block_set_io_throttle() does
error_set(errp, QERR_INVALID_PARAMETER_COMBINATION).
QERR_INVALID_PARAMETER_COMBINATION is a generic error, see
include/qapi/qmp/qerror.h:
#define QERR_INVALID_PARAMETER_COMBINATION \
ERROR_CLASS_GENERIC_ERROR, "Invalid parameter combination"
Since it is generic we are not obliged to keep this error. Switch to
the custom error message which contains more information.
This patch prepares for adding additional checks with their own error
messages to do_check_io_limits(). The next patch adds a new check.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
# By Paolo Bonzini (14) and others
# Via Kevin Wolf
* kwolf/for-anthony: (24 commits)
ide: Add fall through annotations
block: Create proper size file for disk mirror
ahci: Add migration support
ahci: Change data types in preparation for migration
ahci: Remove unused AHCIDevice fields
hbitmap: add assertion on hbitmap_iter_init
mirror: do nothing on zero-sized disk
block/vdi: Check for bad signature
block/vdi: Improved return values from vdi_open
block/vdi: Improve debug output for signature
block: Use error code EMEDIUMTYPE for wrong format in some block drivers
block: Add special error code for wrong format
mirror: support arbitrarily-sized iterations
mirror: support more than one in-flight AIO operation
mirror: add buf-size argument to drive-mirror
mirror: switch mirror_iteration to AIO
mirror: allow customizing the granularity
block: allow customizing the granularity of the dirty bitmap
block: return count of dirty sectors, not chunks
mirror: perform COW if the cluster size is bigger than the granularity
...
The qmp monitor command to mirror a disk was passing -1 for size
along with the disk's backing file. This size of the resulting disk
is the size of the backing file, which is incorrect if the disk
has been resized. Therefore we should always pass in the size of
the current disk.
Signed-off-by: Vishvananda Ishaya <vishvananda@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The block drivers need a special error code for "wrong format".
From the available error codes EMEDIUMTYPE fits best.
It is not available on all platforms, so a definition in
qemu-common.h and a specific error report are needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This makes sense when the next commit starts using the extra buffer space
to perform many I/O operations asynchronously.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The desired granularity may be very different depending on the kind of
operation (e.g. continuous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.
Allow the user to customize it, while providing a sane default so that
in general there will be no extra allocated space in the target compared
to the source.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When mirroring runs, the backing files for the target may not yet be
ready. However, this means that a copy-on-write operation on the target
would fill the missing sectors with zeros. Copy-on-write only happens
if the granularity of the dirty bitmap is smaller than the cluster size
(and only for clusters that are allocated in the source after the job
has started copying). So far, the granularity was fixed to 1MB; to avoid
the problem we detected the situation and required the backing files to
be available in that case only.
However, we want to lower the granularity for efficiency, so we need
a better solution. The solution is to always copy a whole cluster the
first time it is touched. The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the chunk being copied is zero in the bitmap.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The non-live bdrv_commit() function may return one of the following
errors: -ENOTSUP, -EBUSY, -EACCES, -EIO. The only error that is
checked in the HMP handler is -EBUSY, so the monitor command 'commit'
silently fails for all error cases other than 'Device is in use'.
Report error using monitor_printf() and strerror(), and convert existing
qerror_report() calls in do_commit() to monitor_printf().
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
We will use qemu_opts_create_nofail function, it can make code
more readable.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This commit adds an Error ** argument to bdrv_img_create() and set it
appropriately on error.
Callers of bdrv_img_create() pass NULL for the new argument and still
rely on bdrv_img_create()'s return value. Next commits will change
callers to use the Error object instead.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are QEMUMachines that have neither IF_IDE nor IF_SCSI as a
default/standard interface to their block devices / drives. Therefore,
this patch introduces a new field default_block_type per QEMUMachine
struct. The prior use_scsi field becomes thereby obsolete and is
replaced through .default_block_type = IF_SCSI.
This patch also changes the default for s390x to IF_VIRTIO and
removes an early hack that converts IF_IDE drives.
Other parties have already claimed interest (e.g. IF_SD for exynos)
To create a sane default, for machines that dont specify a
default_block_type, this patch makes IF_IDE = 0 and IF_NONE = 1.
I checked all users of IF_NONE (blockdev.c and ww/device-hotplug.c)
as well as IF_IDE and it seems that it is ok to change the defines -
in other words, I found no obvious (to me) assumption in the code
regarding IF_NONE==0. IF_NONE is only set if there is an
explicit if=none. Without if=* the interface becomes IF_DEFAULT.
I would suggest to have some additional care, e.g. by letting
this patch sit some days in the block tree.
Based on an initial patch from Einar Lueck <elelueck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Igor Mitsyanko <i.mitsyanko@samsung.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Igor Mitsyanko <i.mitsyanko@samsung.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Releases of qemu-kvm will be interrupted at qemu 1.3.0.
Users should switch to plain qemu releases.
To avoid breaking scenarios which are setup with command line
options specific to qemu-kvm, port these switches from qemu-kvm
to qemu.git.
Port drive boot option. From the qemu-kvm original commit message:
We do not want to maintain this option forever. It will be removed after
a grace period of a few releases. So warn the user that this option has
no effect and will become invalid soon.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Error management is important for mirroring; otherwise, an error on the
target (even something as "innocent" as ENOSPC) requires to start again
with a full copy. Similar to on_read_error/on_write_error, two separate
knobs are provided for on_source_error (reads) and on_target_error (writes).
The default is 'report' for both.
The 'ignore' policy will leave the sector dirty, so that it will be
retried later. Thus, it will not cause corruption.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds the monitor commands that start the mirroring job.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Even for jobs that need to be manually completed, management may want
to take care itself of the completion, not requiring the user to issue
a command to terminate the job. In this case we want to avoid that
they poll us continuously, waiting for completion to become available.
Thus, add a new event that signals the phase switch and the availability
of the block-job-complete command.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
While streaming can be dropped as soon as it progressed through the whole
image, mirroring needs to be completed manually for two reasons: 1) so that
management knows exactly when the VM switches to the target; 2) because
for other use cases such as replication, we may leave the operation running
for the whole life of the virtual machine.
Add a new block job command that manually completes background operations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This simplifies some code and error checking, and also fixes a bug.
bdrv_find_backing_image() should only be passed absolute filenames,
or filenames relative to the chain. In the QMP message handler for
block commit, when looking up the base do so from the determined top
image, so we know it is reachable from top.
Some of the error messages put out by block-commit have changed
slightly, which causes 2 tests cases for block-commit to fail.
This patch updates the test cases to look for the correct error
output.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch cleans up return sentences in the end of void functions.
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
This patch adds support for error management to streaming.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Do this while we are touching this part of the code, before introducing
more uses of "int is_read".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This will let block-stream reuse the enum. Places that used the enums
are renamed accordingly.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add QMP commands matching the functionality.
Paused jobs cannot be canceled without first resuming them. This
ensures that I/O errors are never missed by management. However, an
optional force argument can be specified to allow that.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Job pausing reuses the existing support for cancellable sleeps. A pause
happens at the next sleeping point and lasts until the coroutine is
re-entered explicitly. Cancellation was already doing a forced resume,
so implement it explicitly in terms of resume.
Paused jobs cannot be canceled without first resuming them. This ensures
that I/O errors are never missed by management.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Extract it out of the implementation of info block-jobs.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The DeviceNotActive text is not a particularly good match, add
a separate text while keeping the same class.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The command for live block commit is added, which has the following
arguments:
device: the block device to perform the commit on (mandatory)
base: the base image to commit into; optional (if not specified,
it is the underlying original image)
top: the top image of the commit - all data from inside top down
to base will be committed into base (mandatory for now; see
note, below)
speed: maximum speed, in bytes/sec
Note: Eventually this command will support merging down the active layer,
but that code is not yet complete. If the active layer is passed
in as top, then an error will be returned. Once merging down the
active layer is supported, the 'top' argument may become optional,
and default to the active layer.
The is done as a block job, so upon completion a BLOCK_JOB_COMPLETED will
be emitted.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, after a live snapshot of a drive, the image that has
been 'demoted' to be below the new active layer remains r/w.
This patch reopens it read-only.
Note that we do not check for error on the reopen(), because we
will not abort the snapshots if the reopen fails.
This patch depends on the bdrv_reopen() series.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If readonly=on is given at device creation time, the ->readonly flag
needs to be set in the block driver state for this device so that
readonly-ness is preserved across media changes (qmp change command).
Similarly, to preserve the snapshot property requires ->open_flags to
be correct.
Signed-off-by: Kevin Shanahan <kmshanah@disenchant.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now all major device models (IDE, SCSI, virtio) can choose between
writethrough and writeback at run-time, and virtio will even revert
to writethrough if the guest is not capable of sending flushes. So
we can change the default to writeback at last.
Tested, for lack of a better idea, with a breakpoint on bdrv_open
and all cache choices one by one.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For command line options which permit '?' meaning 'please list the
permitted values', add support for 'help' as a synonym, by abstracting
the check out into a helper function.
This change means that in some cases where we were being lazy in
our string parsing, "?junk" will now be rejected as an invalid option
rather than being (undocumentedly) treated the same way as "?".
Update the documentation to use 'help' rather than '?', since '?'
is a shell metacharacter and thus prone to fail confusingly if there
is a single character filename in the current working directory and
the '?' has not been escaped. It's therefore better to steer users
towards 'help', though '?' is retained for backwards compatibility.
We do not, however, update the output of the system emulator's -help
(or any documentation autogenerated from the qemu-options.hx which
is the source of the -help text) because libvirt parses our -help
output and will break. At a later date when QEMU provides a better
interface so libvirt can avoid having to do this, we can update the
-help text too.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
All current users (IDE, SCSI and virtio-blk) happen to share this 20
characters limit. Still, it should be left to device models. They
already enforce their limits. They have to, as the DriveInfo limit
only affects legacy -drive serial=..., not the qdev properties.
usb-storage, which doesn't limit serial number length, also uses
DriveInfo for -usbdevice. But that doesn't provide access to
DriveInfo serial.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are two producers of these hints: drive_init() on behalf of
-drive, and hd_geometry_guess().
The only consumer of the hint is hd_geometry_guess().
The callers of hd_geometry_guess() call it only when drive_init()
didn't set the hints. Therefore, drive_init()'s hints are never used.
Thus, hd_geometry_guess() only ever sees hints it produced itself in a
prior call. Only the first call computes something, subsequent calls
just repeat the first call's results. However, hd_geometry_guess() is
never called more than once: the device models don't, and the block
device is destroyed on unplug. Thus, dropping the repeat feature
doesn't break anything now.
If a block device wasn't destroyed on unplug and could be reused with
a new device, then repeating old results would be wrong. Thus,
dropping the repeat feature prevents future breakage.
This renders the hints unused. Purge them from the block layer.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In preparation of purging it from the block layer, which will happen
later in this series.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If the image is read-only then it's not possible to copy read data into
it. Therefore copy-on-read is automatically disabled for read-only
images.
Up until now this behavior was silent, add a warning so the user knows
why copy-on-read is not working.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This commit converts qemu_opts_create() from qerror_report() to
error_set().
Currently, most calls to qemu_opts_create() can't fail, so most
callers don't need any changes.
The two cases where code checks for qemu_opts_create() erros are:
1. Initialization code in vl.c. All of them print their own
error messages directly to stderr, no need to pass the Error
object
2. The functions opts_parse(), qemu_opts_from_qdict() and
qemu_chr_parse_compat() make use of the error information and
they can be called from HMP or QMP. In this case, to allow for
incremental conversion, we propagate the error up using
qerror_report_err(), which keeps the QError semantics
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-By: Laszlo Ersek <lersek@redhat.com>