We don't want to commit to the API yet before everything is worked out.
Like already for 1.5, disable it again for the 1.6 release. This commit
is meant to be reverted after the 1.6 release.
The disabling of the driver-specific options is achieved by applying the
old checks while parsing the command line.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The macro g_assert_not_reached is a better self documenting replacement
for assert(0) or assert(false).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch adds sync-modes to the drive-backup interface and
implements the FULL, NONE and TOP modes of synchronization.
FULL performs as before copying the entire contents of the drive
while preserving the point-in-time using CoW.
NONE only copies new writes to the target drive.
TOP copies changes to the topmost drive image and preserves the
point-in-time using CoW.
For sync mode TOP are creating a new target image using the same backing
file as the original disk image. Then any new data that has been laid
on top of it since creation is copied in the main backup_run() loop.
There is an extra check in the 'TOP' case so that we don't bother to copy
all the data of the backing file as it already exists in the target.
This is where the bdrv_co_is_allocated() is used to determine if the
data exists in the topmost layer or below.
Also any new data being written is intercepted via the write_notifier
hook which ends up calling backup_do_cow() to copy old data out before
it gets overwritten.
For mode 'NONE' we create the new target image and only copy in the
original data from the disk image starting from the time the call was
made. This preserves the point in time data by only copying the parts
that are *going to change* to the target image. This way we can
reconstruct the final image by checking to see if the given block exists
in the new target image first, and if it does not, you can get it from
the original image. This is basically an optimization allowing you to
do point-in-time snapshots with low overhead vs the 'FULL' version.
Since there is no old data to copy out the loop in backup_run() for the
NONE case just calls qemu_coroutine_yield() which only wakes up after
an event (usually cancel in this case). The rest is handled by the
before_write notifier which again calls backup_do_cow() to write out
the old data so it can be preserved.
Signed-off-by: Ian Main <imain@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The old 'cache' option really encodes three different boolean flags into
a cache mode name, without providing all combinations. Make them three
separate options instead and translate the old option to the new ones
for drive_init().
The specific boolean options take precedence if the old cache option is
specified as well, so the following options are equivalent:
-drive file=x,cache=none,cache.no-flush=true
-drive file=x,cache.writeback=true,cache.direct=true,cache.no-flush=true
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In QMP, we want to use dashes instead of underscores in QMP argument
names, and use nested options for throttling.
The new option names affect the command line as well, but for
compatibility drive_init() will convert the old option names before
calling into the code that will be shared between -drive and
blockdev-add.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This is traditionally -drive format=..., which is now translated into
the new driver option. This gives us a more consistent way to select the
driver of BlockDriverStates that can be used in QMP context, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The drive-backup command is similar to the drive-mirror command, except
no guest data written after the command executes gets copied. Add a
sync mode argument which determines whether the entire disk is copied,
just allocated clusters, or only clusters being written to by the guest.
Currently only sync mode 'full' is supported - it copies the entire disk.
For read-only point-in-time snapshots we may only need sync mode 'none'
since the target can be a qcow2 file using the guest's disk as its
backing file (no need to copy the entire disk). Finally, sync mode
'top' is useful if we wish to preserve the backing chain.
Note that this patch just adds the sync mode argument to drive-backup.
It does not implement sync modes 'top' or 'none'. This patch is
necessary so we can add a drive-backup HMP command that behaves like the
existing drive-mirror HMP command and takes a sync mode.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The Abort action can be used to test QMP 'transaction' failure. Add it
as the last action to exercise the .abort() and .cleanup() code paths
for all previous actions.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds a transactional version of the drive-backup QMP command.
It allows atomic snapshots of multiple drives along with automatic
cleanup if there is a failure to start one of the backup jobs.
Note that QMP events are emitted for block job completion/cancellation
and the block job will be listed by query-block-jobs.
@device: the name of the device whose writes should be mirrored.
@target: the target of the new image. If the file exists, or if it
is a device, the existing file/device will be used as the new
destination. If it does not exist, a new file will be created.
@format: #optional the format of the new destination, default is to
probe if @mode is 'existing', else the format of the source
@mode: #optional whether and how QEMU should create a new image, default is
'absolute-paths'.
@speed: #optional the maximum speed, in bytes per second
@on-source-error: #optional the action to take on an error on the source,
default 'report'. 'stop' and 'enospc' can only be used
if the block device supports io-status (see BlockInfo).
@on-target-error: #optional the action to take on an error on the target,
default 'report' (no limitations, since this applies to
a different block device than @device).
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some QMP 'transaction' types don't need to do anything on .commit().
Make .commit() optional just like .abort().
The "drive-backup" action will take advantage of this, it only needs to
cancel the block job on .abort(). Other block job actions will probably
follow the same pattern, so allow .commit() to be NULL.
Suggested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The QMP 'transaction' command keeps a list of in-flight transactions.
The transaction state structure is called BlkTransactionStates even
though it only deals with a single transaction. The only plural thing
is the linked list of transaction states.
I find it confusing to call the single structure "States". This patch
renames it to "State", just like BlockDriverState is singular.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
@drive-backup
Start a point-in-time copy of a block device to a new destination. The
status of ongoing drive-backup operations can be checked with
query-block-jobs where the BlockJobInfo.type field has the value 'backup'.
The operation can be stopped before it has completed using the
block-job-cancel command.
@device: the name of the device which should be copied.
@target: the target of the new image. If the file exists, or if it
is a device, the existing file/device will be used as the new
destination. If it does not exist, a new file will be created.
@format: #optional the format of the new destination, default is to
probe if @mode is 'existing', else the format of the source
@mode: #optional whether and how QEMU should create a new image, default is
'absolute-paths'.
@speed: #optional the maximum speed, in bytes per second
@on-source-error: #optional the action to take on an error on the source,
default 'report'. 'stop' and 'enospc' can only be used
if the block device supports io-status (see BlockInfo).
@on-target-error: #optional the action to take on an error on the target,
default 'report' (no limitations, since this applies to
a different block device than @device).
Note that @on-source-error and @on-target-error only affect background I/O.
If an error occurs during a guest write request, the device's rerror/werror
actions will be used.
Returns: nothing on success
If @device is not a valid block device, DeviceNotFound
Since 1.6
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use bdrv_getlength() for its byte units and error return instead of
bdrv_get_geometry().
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is not necessary to check that we can find a protocol block driver
since we create or open the image file. This produces the error that we
need anyway.
Besides, the QERR_INVALID_BLOCK_FORMAT is inappropriate since the
protocol is incorrect rather than the format.
Also drop an empty line between bdrv_open() and checking its return
value. This may be due to copy-pasting from earlier code that performed
other operations before handling errors.
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini <pbonzini@redhat.com> suggested the following test case:
1. Launch a guest and wait at the GRUB boot menu:
qemu-system-x86_64 -enable-kvm -m 1024 \
-drive if=none,cache=none,file=test.img,id=foo,werror=stop,rerror=stop
-device virtio-blk-pci,drive=foo,id=virtio0,addr=4
2. Hot unplug the device:
(qemu) drive_del foo
3. Select the first boot menu entry
Without this patch the guest pauses due to ENOMEDIUM. The guest is
stuck in a continuous pause loop since the I/O request is retried and
fails immediately again when the guest is resumed.
With this patch the error is reported to the guest.
Note that this scenario actually happens sometimes during libvirt disk
hot unplug, where device_del is followed by drive_del. I/O may still be
submitted to the drive after drive_del if the guest does not process the
PCI hot unplug notification.
Reported-by: Dafna Ron <dron@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
We may want to include a driver in the whitelist for read only tasks
such as diagnosing or exporting guest data (with libguestfs as a good
example). This patch introduces a readonly whitelist option, and for
backward compatibility, the old configure option --block-drv-whitelist
is now an alias to rw whitelist.
Drivers in readonly list is only permitted to open file readonly, and
returns -ENOTSUP for RW opening.
E.g. To include vmdk readonly, and others read+write:
./configure --target-list=x86_64-softmmu \
--block-drv-rw-whitelist=qcow2,raw,file,qed \
--block-drv-ro-whitelist=vmdk
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There's no reason to restrict transactions to operations related to
block devices, so rename the type now before schema introspection stops
us from doing so.
Also change the schema documentation of 'transaction' to not refer to
block devices or snapshots any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make it easier to add other operations to qmp_transaction() by using
callbacks, with external snapshots serving as an example implementation
of the callbacks.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The code is simply moved into a separate function.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The code is moved into preparation function, and changed
a bit to tip more clearly what it is doing.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The code before really committing is moved into a function. Most
code is simply moved from qmp_transaction(), except that on fail it
just returns now. Other code such as input parsing is not touched,
to make it easier in review.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have an errno value that can be displayed, so we should just do that.
An easy way to reproduce this case is to resize a raw image to a size
that is too large for the host file system.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We don't want to commit to the API yet before everything is worked out.
Disable it for the 1.5 release. This commit is meant to be reverted
after the 1.5 release.
The disabling of the driver-specific options is achieved by applying the
old checks while parsing the command line.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Many of these should be cleaned up with proper qdev-/QOM-ification.
Right now there are many catch-all headers in include/hw/ARCH depending
on cpu.h, and this makes it necessary to compile these files per-target.
However, fixing this does not belong in these patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is not necessary to adjust the slice time at runtime. We already
extend the current slice in order to carry over accounting into the next
slice. Changing the actual slice time value introduces oscillations.
The guest may experience large changes in throughput or IOPS from one
moment to the next when slice times are adjusted.
Reported-by: Benoît Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After this patch, using -drive with an empty file name continues to open
the file if driver-specific options are used. If no driver-specific
options are specified, the semantics stay as it was: It defines a drive
without an inserted medium.
In order to achieve this, bdrv_open() must be made safe to work with a
NULL filename parameter. The assumption that is made is that only block
drivers which implement bdrv_parse_filename() support using driver
specific options and could therefore work without a filename. These
drivers must make sure to cope with NULL in their implementation of
.bdrv_open() (this is only NBD for now). For all other drivers, the
block layer code will make sure to error out before calling into their
code - they can't possibly work without a filename.
Now an NBD connection can be opened like this:
qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
this patch ensures that all pending IOs are completed
before a device is resized. this is especially important
if a device is shrinked as it the bdrv_check_request()
result is invalidated.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Screwed up in commit 666daa68. Thanks to Kevin Wolf for reminding me
to fix this.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Any non-default -drive options are now passed down to the block drivers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Pointing to a QemuOpts element is surprising and can lead to subtle
use-after-free errors when the QemuOpts is freed after all options are
parsed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It doesn't do anything yet except storing the options QDict in the
BlockDriverState.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add support for BDRV_O_UNMAP from the QEMU command-line.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There can be a need to turn output to stdout off. This patch adds a -q option
that enable "Quiet mode". In Quiet mode, only errors are printed out.
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Negative I/O throttling iops and bps values do not make sense so reject
them with an error message.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The do_check_io_limits() function returns false when I/O limits are
invalid but it doesn't set an Error to indicate why. The two
do_check_io_limits() callers duplicate error reporting. Solve this by
passing an Error pointer into do_check_io_limits().
Note that the two callers report slightly different errors: drive_init()
prints a custom error message while qmp_block_set_io_throttle() does
error_set(errp, QERR_INVALID_PARAMETER_COMBINATION).
QERR_INVALID_PARAMETER_COMBINATION is a generic error, see
include/qapi/qmp/qerror.h:
#define QERR_INVALID_PARAMETER_COMBINATION \
ERROR_CLASS_GENERIC_ERROR, "Invalid parameter combination"
Since it is generic we are not obliged to keep this error. Switch to
the custom error message which contains more information.
This patch prepares for adding additional checks with their own error
messages to do_check_io_limits(). The next patch adds a new check.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
# By Paolo Bonzini (14) and others
# Via Kevin Wolf
* kwolf/for-anthony: (24 commits)
ide: Add fall through annotations
block: Create proper size file for disk mirror
ahci: Add migration support
ahci: Change data types in preparation for migration
ahci: Remove unused AHCIDevice fields
hbitmap: add assertion on hbitmap_iter_init
mirror: do nothing on zero-sized disk
block/vdi: Check for bad signature
block/vdi: Improved return values from vdi_open
block/vdi: Improve debug output for signature
block: Use error code EMEDIUMTYPE for wrong format in some block drivers
block: Add special error code for wrong format
mirror: support arbitrarily-sized iterations
mirror: support more than one in-flight AIO operation
mirror: add buf-size argument to drive-mirror
mirror: switch mirror_iteration to AIO
mirror: allow customizing the granularity
block: allow customizing the granularity of the dirty bitmap
block: return count of dirty sectors, not chunks
mirror: perform COW if the cluster size is bigger than the granularity
...
The qmp monitor command to mirror a disk was passing -1 for size
along with the disk's backing file. This size of the resulting disk
is the size of the backing file, which is incorrect if the disk
has been resized. Therefore we should always pass in the size of
the current disk.
Signed-off-by: Vishvananda Ishaya <vishvananda@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The block drivers need a special error code for "wrong format".
From the available error codes EMEDIUMTYPE fits best.
It is not available on all platforms, so a definition in
qemu-common.h and a specific error report are needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This makes sense when the next commit starts using the extra buffer space
to perform many I/O operations asynchronously.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The desired granularity may be very different depending on the kind of
operation (e.g. continuous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.
Allow the user to customize it, while providing a sane default so that
in general there will be no extra allocated space in the target compared
to the source.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When mirroring runs, the backing files for the target may not yet be
ready. However, this means that a copy-on-write operation on the target
would fill the missing sectors with zeros. Copy-on-write only happens
if the granularity of the dirty bitmap is smaller than the cluster size
(and only for clusters that are allocated in the source after the job
has started copying). So far, the granularity was fixed to 1MB; to avoid
the problem we detected the situation and required the backing files to
be available in that case only.
However, we want to lower the granularity for efficiency, so we need
a better solution. The solution is to always copy a whole cluster the
first time it is touched. The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the chunk being copied is zero in the bitmap.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The non-live bdrv_commit() function may return one of the following
errors: -ENOTSUP, -EBUSY, -EACCES, -EIO. The only error that is
checked in the HMP handler is -EBUSY, so the monitor command 'commit'
silently fails for all error cases other than 'Device is in use'.
Report error using monitor_printf() and strerror(), and convert existing
qerror_report() calls in do_commit() to monitor_printf().
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
We will use qemu_opts_create_nofail function, it can make code
more readable.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This commit adds an Error ** argument to bdrv_img_create() and set it
appropriately on error.
Callers of bdrv_img_create() pass NULL for the new argument and still
rely on bdrv_img_create()'s return value. Next commits will change
callers to use the Error object instead.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are QEMUMachines that have neither IF_IDE nor IF_SCSI as a
default/standard interface to their block devices / drives. Therefore,
this patch introduces a new field default_block_type per QEMUMachine
struct. The prior use_scsi field becomes thereby obsolete and is
replaced through .default_block_type = IF_SCSI.
This patch also changes the default for s390x to IF_VIRTIO and
removes an early hack that converts IF_IDE drives.
Other parties have already claimed interest (e.g. IF_SD for exynos)
To create a sane default, for machines that dont specify a
default_block_type, this patch makes IF_IDE = 0 and IF_NONE = 1.
I checked all users of IF_NONE (blockdev.c and ww/device-hotplug.c)
as well as IF_IDE and it seems that it is ok to change the defines -
in other words, I found no obvious (to me) assumption in the code
regarding IF_NONE==0. IF_NONE is only set if there is an
explicit if=none. Without if=* the interface becomes IF_DEFAULT.
I would suggest to have some additional care, e.g. by letting
this patch sit some days in the block tree.
Based on an initial patch from Einar Lueck <elelueck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Igor Mitsyanko <i.mitsyanko@samsung.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Igor Mitsyanko <i.mitsyanko@samsung.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Releases of qemu-kvm will be interrupted at qemu 1.3.0.
Users should switch to plain qemu releases.
To avoid breaking scenarios which are setup with command line
options specific to qemu-kvm, port these switches from qemu-kvm
to qemu.git.
Port drive boot option. From the qemu-kvm original commit message:
We do not want to maintain this option forever. It will be removed after
a grace period of a few releases. So warn the user that this option has
no effect and will become invalid soon.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Error management is important for mirroring; otherwise, an error on the
target (even something as "innocent" as ENOSPC) requires to start again
with a full copy. Similar to on_read_error/on_write_error, two separate
knobs are provided for on_source_error (reads) and on_target_error (writes).
The default is 'report' for both.
The 'ignore' policy will leave the sector dirty, so that it will be
retried later. Thus, it will not cause corruption.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds the monitor commands that start the mirroring job.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Even for jobs that need to be manually completed, management may want
to take care itself of the completion, not requiring the user to issue
a command to terminate the job. In this case we want to avoid that
they poll us continuously, waiting for completion to become available.
Thus, add a new event that signals the phase switch and the availability
of the block-job-complete command.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
While streaming can be dropped as soon as it progressed through the whole
image, mirroring needs to be completed manually for two reasons: 1) so that
management knows exactly when the VM switches to the target; 2) because
for other use cases such as replication, we may leave the operation running
for the whole life of the virtual machine.
Add a new block job command that manually completes background operations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This simplifies some code and error checking, and also fixes a bug.
bdrv_find_backing_image() should only be passed absolute filenames,
or filenames relative to the chain. In the QMP message handler for
block commit, when looking up the base do so from the determined top
image, so we know it is reachable from top.
Some of the error messages put out by block-commit have changed
slightly, which causes 2 tests cases for block-commit to fail.
This patch updates the test cases to look for the correct error
output.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch cleans up return sentences in the end of void functions.
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
This patch adds support for error management to streaming.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Do this while we are touching this part of the code, before introducing
more uses of "int is_read".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This will let block-stream reuse the enum. Places that used the enums
are renamed accordingly.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add QMP commands matching the functionality.
Paused jobs cannot be canceled without first resuming them. This
ensures that I/O errors are never missed by management. However, an
optional force argument can be specified to allow that.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Job pausing reuses the existing support for cancellable sleeps. A pause
happens at the next sleeping point and lasts until the coroutine is
re-entered explicitly. Cancellation was already doing a forced resume,
so implement it explicitly in terms of resume.
Paused jobs cannot be canceled without first resuming them. This ensures
that I/O errors are never missed by management.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Extract it out of the implementation of info block-jobs.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The DeviceNotActive text is not a particularly good match, add
a separate text while keeping the same class.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The command for live block commit is added, which has the following
arguments:
device: the block device to perform the commit on (mandatory)
base: the base image to commit into; optional (if not specified,
it is the underlying original image)
top: the top image of the commit - all data from inside top down
to base will be committed into base (mandatory for now; see
note, below)
speed: maximum speed, in bytes/sec
Note: Eventually this command will support merging down the active layer,
but that code is not yet complete. If the active layer is passed
in as top, then an error will be returned. Once merging down the
active layer is supported, the 'top' argument may become optional,
and default to the active layer.
The is done as a block job, so upon completion a BLOCK_JOB_COMPLETED will
be emitted.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, after a live snapshot of a drive, the image that has
been 'demoted' to be below the new active layer remains r/w.
This patch reopens it read-only.
Note that we do not check for error on the reopen(), because we
will not abort the snapshots if the reopen fails.
This patch depends on the bdrv_reopen() series.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If readonly=on is given at device creation time, the ->readonly flag
needs to be set in the block driver state for this device so that
readonly-ness is preserved across media changes (qmp change command).
Similarly, to preserve the snapshot property requires ->open_flags to
be correct.
Signed-off-by: Kevin Shanahan <kmshanah@disenchant.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now all major device models (IDE, SCSI, virtio) can choose between
writethrough and writeback at run-time, and virtio will even revert
to writethrough if the guest is not capable of sending flushes. So
we can change the default to writeback at last.
Tested, for lack of a better idea, with a breakpoint on bdrv_open
and all cache choices one by one.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For command line options which permit '?' meaning 'please list the
permitted values', add support for 'help' as a synonym, by abstracting
the check out into a helper function.
This change means that in some cases where we were being lazy in
our string parsing, "?junk" will now be rejected as an invalid option
rather than being (undocumentedly) treated the same way as "?".
Update the documentation to use 'help' rather than '?', since '?'
is a shell metacharacter and thus prone to fail confusingly if there
is a single character filename in the current working directory and
the '?' has not been escaped. It's therefore better to steer users
towards 'help', though '?' is retained for backwards compatibility.
We do not, however, update the output of the system emulator's -help
(or any documentation autogenerated from the qemu-options.hx which
is the source of the -help text) because libvirt parses our -help
output and will break. At a later date when QEMU provides a better
interface so libvirt can avoid having to do this, we can update the
-help text too.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
All current users (IDE, SCSI and virtio-blk) happen to share this 20
characters limit. Still, it should be left to device models. They
already enforce their limits. They have to, as the DriveInfo limit
only affects legacy -drive serial=..., not the qdev properties.
usb-storage, which doesn't limit serial number length, also uses
DriveInfo for -usbdevice. But that doesn't provide access to
DriveInfo serial.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are two producers of these hints: drive_init() on behalf of
-drive, and hd_geometry_guess().
The only consumer of the hint is hd_geometry_guess().
The callers of hd_geometry_guess() call it only when drive_init()
didn't set the hints. Therefore, drive_init()'s hints are never used.
Thus, hd_geometry_guess() only ever sees hints it produced itself in a
prior call. Only the first call computes something, subsequent calls
just repeat the first call's results. However, hd_geometry_guess() is
never called more than once: the device models don't, and the block
device is destroyed on unplug. Thus, dropping the repeat feature
doesn't break anything now.
If a block device wasn't destroyed on unplug and could be reused with
a new device, then repeating old results would be wrong. Thus,
dropping the repeat feature prevents future breakage.
This renders the hints unused. Purge them from the block layer.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In preparation of purging it from the block layer, which will happen
later in this series.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If the image is read-only then it's not possible to copy read data into
it. Therefore copy-on-read is automatically disabled for read-only
images.
Up until now this behavior was silent, add a warning so the user knows
why copy-on-read is not working.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This commit converts qemu_opts_create() from qerror_report() to
error_set().
Currently, most calls to qemu_opts_create() can't fail, so most
callers don't need any changes.
The two cases where code checks for qemu_opts_create() erros are:
1. Initialization code in vl.c. All of them print their own
error messages directly to stderr, no need to pass the Error
object
2. The functions opts_parse(), qemu_opts_from_qdict() and
qemu_chr_parse_compat() make use of the error information and
they can be called from HMP or QMP. In this case, to allow for
incremental conversion, we propagate the error up using
qerror_report_err(), which keeps the QError semantics
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-By: Laszlo Ersek <lersek@redhat.com>
Allow streaming operations to be started with an initial speed limit.
This eliminates the window of time between starting streaming and
issuing block-job-set-speed. Users should use the new optional 'speed'
parameter instead so that speed limits are in effect immediately when
the job starts.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
There are at least two different errors that can occur in
block_job_set_speed(): the job might not support setting speeds or the
value might be invalid.
Use the Error mechanism to report the error where it occurs.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
The block job API uses -errno return values internally and we convert
these to Error in the QMP functions. This is ugly because the Error
should be created at the point where we still have all the relevant
information. More importantly, it is hard to add new error cases to
this case since we quickly run out of -errno values without losing
information.
Go ahead and use Error directly and don't convert later.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Open images with BDRV_O_INCOMING in order to inform block drivers
that an incoming live migration is coming.
Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We do not want jobs to keep a device busy for a possibly very long
time, and management could become confused because they thought a
device was not even there anymore. So, cancel long-running jobs
as soon as their device is going to disappear.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
strncpy may not null-terminate the destination string.
Cc: kwolf@redhat.com
Signed-off-by: Floris Bos <dev@noc-ps.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Simplify the blockdev-snapshot-sync code and gain failsafe operation
by turning it into a wrapper around the new transaction command. A new
option is also added matching "mode".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The mode field lets a management application create the snapshot
destination outside QEMU.
Right now, the only modes are "existing" and "absolute-paths". Mirroring
introduces "no-backing-file". In the future "relative-paths" could be
implemented too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We will add other kinds of operation. Prepare for this by adjusting
the schema.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Monitor operations that manipulate image files must not execute while a
background job (like image streaming) is in progress. This prevents
corruptions from happening when two pieces of code are manipulating the
image file without knowledge of each other.
The monitor "commit" command raises QERR_DEVICE_IN_USE when
bdrv_commit() returns -EBUSY but "commit all" has no error handling.
This is easy to fix, although note that we do not deliver a detailed
error about which device was busy in the "commit all" case.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a QAPI/QMP only command to take a snapshot of a group of
devices. This is similar to the blockdev-snapshot-sync command, except
blockdev-group-snapshot-sync accepts a list devices, filenames, and
formats.
It is attempted to keep the snapshot of the group atomic; if the
creation or open of any of the new snapshots fails, then all of
the new snapshots are abandoned, and the name of the snapshot image
that failed is returned. The failure case should not interrupt
any operations.
Rather than use bdrv_close() along with a subsequent bdrv_open() to
perform the pivot, the original image is never closed and the new
image is placed 'in front' of the original image via manipulation
of the BlockDriverState fields. Thus, once the new snapshot image
has been successfully created, there are no more failure points
before pivoting to the new snapshot.
This allows the group of disks to remain consistent with each other,
even across snapshot failures.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add support for streaming data from an intermediate section of the
image chain (see patch and documentation for details).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Unplugging a storage interface like virtio-blk causes the host block
device to be deleted too. Long-running operations like block migration
must take a DriveInfo reference to prevent the BlockDriverState from
being freed. For image streaming we can do the same thing.
Note that it is not possible to acquire/release the drive reference in
block.c where the block job functions live because
drive_get_ref()/drive_put_ref() are blockdev.c functions. Calling them
from block.c would be a layering violation - tools like qemu-img don't
even link against blockdev.c.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add query-block-jobs, which shows the progress of ongoing block device
operations.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add block_job_cancel, which stops an active block streaming operation.
When the operation has been cancelled the new BLOCK_JOB_CANCELLED event
is emitted.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add block_job_set_speed, which sets the maximum speed for a background
block operation.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add the block_stream command, which starts copy backing file contents
into the image file. Also add the BLOCK_JOB_COMPLETED QMP event which
is emitted when image streaming completes. Later patches add control
over the background copy speed, cancelation, and querying running
streaming operations.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Long-running block operations like block migration and image streaming
must have continual access to their block device. It is not safe to
perform operations like hotplug, eject, change, resize, commit, or
external snapshot while a long-running operation is in progress.
This patch adds the missing bdrv_in_use() checks so that block migration
and image streaming never have the rug pulled out from underneath them.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Let's report specific errors so that management tools and users can
identify the problem.
Two new qerrors are needed:
* QERR_DEVICE_HAS_NO_MEDIUM for ENOMEDIUM
* QERR_DEVICE_IS_READ_ONLY for EACCES
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Also drops the leftover 'mon' argument.
This is a preparation for the next commits which will port the
eject and change commands to the QAPI.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Unfortunately, this conversion required an additional change.
In the old QMP command, the 'snapshot-file' argument is specified as
optional. The idea is to take the snapshot internally if 'snapshot-file'
is not passed. However, internal snapshots are not supported yet so
the command returns a MissingParamater error if 'snapshot-file' is not
passed. Which makes the argument actually required and will cause
compatibility breakage if we change that in the future.
To fix this the QAPI converted blockdev_snapshot_sync command makes the
'snapshot-file' argument required. Again, in practice it's actually required,
so this is not incompatible.
If we do implement internal snapshots someday, we'll need a new argument
for it.
Note that this discussion doesn't affect HMP.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Many places in QEMU call qemu_aio_flush() to complete all pending
asynchronous I/O. Most of these places actually want to drain all block
requests but there is no block layer API to do so.
This patch introduces the bdrv_drain_all() API to wait for requests
across all BlockDriverStates to complete. As a bonus we perform checks
after qemu_aio_wait() to ensure that requests really have finished.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds the -drive copy-on-read=on|off command-line option:
copy-on-read=on|off
copy-on-read is "on" or "off" and enables whether to copy read backing
file sectors into the image file. Copy-on-read avoids accessing the
same backing file sectors repeatedly and is useful when the backing
file is over a slow network. By default copy-on-read is off.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Recent versions of udev always keep the tray locked so that the kernel
can observe "eject request" events (aka tray button presses) even on
discs that aren't mounted. Add support for these events in the ATAPI
and SCSI cd drive device models.
To let management cope with the behavior of udev, an event should also
be added for "tray opened/closed". This way, after issuing an "eject"
command, management can poll until the guests actually reacts to the
command. They can then issue the "change" command after the tray has been
opened, or try with "eject -f" after a (configurable?) timeout. However,
with this patch and the corresponding support in the device models,
at least it is possible to do a manual two-step eject+change sequence.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
change fails while the tray is locked by the guest. eject -f forces
it open and removes any media. Unfortunately, the tray closes again
instantly. Since the lock remains as it is, there is no way to insert
another medium unless the guest voluntarily unlocks.
Fix by leaving the tray open after monitor eject.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's a confused mess (see previous commit). No users remain.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BlockDriverState member removable is a confused mess. It is true when
an ide-cd, scsi-cd or floppy qdev is attached, or when the
BlockDriverState was created with -drive if={floppy,sd} or -drive
if={ide,scsi,xen,none},media=cdrom ("created removable"), except when
an ide-hd, scsi-hd, scsi-generic or virtio-blk qdev is attached.
Three users remain:
1. eject_device(), via bdrv_is_removable() uses it to determine
whether a block device can eject media.
2. bdrv_info() is monitor command "info block". QMP documentation
says "true if the device is removable, false otherwise". From the
monitor user's point of view, the only sensible interpretation of
"is removable" is "can eject media with monitor commands eject and
change".
A block device can eject media unless a device is attached that
doesn't support it. Switch the two users over to new
bdrv_dev_has_removable_media() that returns exactly that.
3. bdrv_getlength() uses to suppress its length cache when media can
change (see commit 46a4e4e6). Media change is either monitor
command change (updates the length cache), monitor command eject
(doesn't update the length cache, easily fixable), or physical
media change (invalidates length cache, not so easily fixable).
I'm refraining from improving anything here, because this series is
long enough already. Instead, I simply switch it over to
bdrv_dev_has_removable_media() as well.
This changes the behavior of the length cache and of monitor commands
eject and change in two cases:
a. drive not created removable, no device attached
The commit makes the drive removable, and defeats the length cache.
Example: -drive if=none
b. drive created removable, but the attached drive is non-removable,
and doesn't call bdrv_set_removable(..., 0) (most devices don't)
The commit makes the drive non-removable, and enables the length
cache.
Example: -drive if=xen,media=cdrom -M xenpv
The other non-removable devices that don't call
bdrv_set_removable() can't currently use a drive created removable,
either because they aren't qdevified, or because they lack a drive
property. Won't stay that way.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Requires new BlockDevOps member is_medium_locked(). Implement for IDE
and SCSI CD-ROMs.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, this just protects against programming errors like having the
same drive back multiple non-qdev devices, or untimely bdrv_delete().
Later commits will add other interesting uses.
While there, rename BlockDriverState member peer to dev, bdrv_attach()
to bdrv_attach_dev(), bdrv_detach() to bdrv_detach_dev(), and
bdrv_get_attached() to bdrv_get_attached_dev().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch introduces bdrv_parse_cache_flags() which sets open flags
given a cache mode. Previously this was duplicated in blockdev.c and
qemu-img.c.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Ejecting hard disk platters can only end in tears.
If you need to revoke access to an image, use drive_del, not eject -f.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add QMP bits for snapshot_blkdev command. This is the same as
snapshot_blkdev in the human monitor. The command is synchronous.
In the future async commands and or a break down of the functionality
into multiple commands might be added.
Also change the 'snapshot_file' argument to 'snapshot-file' in
the human monitor, so that it matches QMP.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@gmail.com>
The current message doesn't clearly communicate the error cause.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Change BDRV_O_NOCACHE to only imply bypassing the host OS file cache,
but no writeback semantics. All existing callers are changed to also
specify BDRV_O_CACHE_WB to give them writeback semantics.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
No users of bdrv_get_type_hint() left. bdrv_set_type_hint() can make
the media removable by side effect. Make that explicit.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
DriveInfo is closely tied to -drive, and like -drive, it mixes
information about host and guest part of the block device. Unlike
DriveInfo, BlockDriverState should be about the host part only.
One of the remaining guest bits there is the "type hint". -drive
option media sets it, and qdevs "ide-drive", "scsi-disk" and non-qdev
IF_XEN devices check it to pick HD vs. CD.
Communicate -drive option media via new DriveInfo member media_cd
instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When removing a drive from the host-side via drive_del we currently have
the following path:
drive_del
qemu_aio_flush()
bdrv_close() // zaps bs->drv, which makes any subsequent I/O get
// dropped. Works as designed
drive_uninit()
bdrv_delete() // frees the bs. Since the device is still connected to
// bs, any subsequent I/O is a use-after-free.
The value of bs->drv becomes unpredictable on free. As long as it
remains null, I/O still gets dropped, however it could become non-null
at any point after the free resulting SEGVs or other QEMU state
corruption.
To resolve this issue as simply as possible, we can chose to not
actually delete the BlockDriverState pointer. Since bdrv_close()
handles setting the drv pointer to NULL, we just need to remove the
BlockDriverState from the QLIST that is used to enumerate the block
devices. This is currently handled within bdrv_delete, so move this
into its own function, bdrv_make_anon().
The result is that we can now invoke drive_del, this closes the file
descriptors and sets BlockDriverState->drv to NULL which prevents futher
IO to the device, and since we do not free BlockDriverState, we don't
have to worry about the copy retained in the block devices.
We also don't attempt to remove the qdev property since we are no longer
deleting the BlockDriverState on drives with associated drives. This
also allows for removing Drives with no devices associated either.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have two different virtio buses: pci and s390. The abstraction path
taken in qemu is to have generic aliases for each device type in the
architecture specific qdev devices.
So let's make use of these aliases whenever we can and define them
whenever we can.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
In case we cannot open the newly created snapshot image, try to fall
back to the original image file and continue running on that, which
should prevent the guest from aborting.
This is a corner case which can happen if the admin by mistake
specifies the snapshot file on a virtual file system which does not
support O_DIRECT. bdrv_create() does not use O_DIRECT, but the
following open in bdrv_open() does and will then fail.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Set block device in use during block migration, disallow drive_del and
bdrv_truncate for in use devices.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The host part of a block device can be deleted with in progress
block migration.
To fix this, add a reference count to DriveInfo, freeing resources
on last reference.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Watch this:
(qemu) drive_add 0 if=none
(qemu) info block
none0: type=hd removable=0 [not inserted]
(qemu) drive_del none0
Segmentation fault (core dumped)
add_init_drive() is confused about drive_init()'s failure modes, and
cleans up when it shouldn't. This leaves the DriveInfo with member
opts dangling. drive_del attempts to free it, and dies.
drive_init() behaves as follows:
* If it created a drive with media, it returns its DriveInfo.
* If it created a drive without media, it clears *fatal_error and
returns NULL.
* If it couldn't create a drive, it sets *fatal_error and returns
NULL.
Of its three callers:
* drive_init_func() is correct.
* usb_msd_init() assumes drive_init() failed when it returns NULL.
This is correct only because it always passes option "file", and
"drive without media" can't happen then.
* add_init_drive() assumes drive_init() failed when it returns NULL.
This is incorrect.
Clean up drive_init() to return NULL on failure and only on failure.
Drop its parameter fatal_error.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Let the callers build the optstr. Only one wants to. All the others
become simpler, because they don't have to worry about escaping '%'.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We silently ignore multiple definitions for the same drive:
$ qemu-system-x86_64 -nodefaults -vnc :1 -S -monitor stdio -drive if=ide,index=1,file=tmp.qcow2 -drive if=ide,index=1,file=nonexistant
QEMU 0.13.50 monitor - type 'help' for more information
(qemu) info block
ide0-hd1: type=hd removable=0 file=tmp.qcow2 backing_file=tmp.img ro=0 drv=qcow2 encrypted=0
With if=none, this can become quite confusing:
$ qemu-system-x86_64 -nodefaults -vnc :1 -S -monitor stdio -drive if=none,index=1,file=tmp.qcow2,id=eins -drive if=none,index=1,file=nonexistant,id=zwei -device ide-drive,drive=eins -device ide-drive,drive=zwei
qemu-system-x86_64: -device ide-drive,drive=zwei: Property 'ide-drive.drive' can't find value 'zwei'
The second -device fails, because it refers to drive zwei, which got
silently ignored.
Make multiple drive definitions fail cleanly.
Unfortunately, there's code that relies on multiple drive definitions
being silently ignored: main() merrily adds default drives even when
the user already defined these drives. Fix that up.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before, type & index were hidden in printf-like fmt, ... parameters,
which get expanded into an option string. Rather inconvenient for
uses later in this series.
New IF_DEFAULT to ask for the machine's default interface. Before,
that was done by having no option "if" in the option string.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before commit 622b520f, index=12 meant bus=1,unit=5.
Since the commit, it means bus=0,unit=12. The drive is created, but
not the guest device. That's because the controllers we use with
if=scsi drives (lsi53c895a and esp) support only 7 units, and
scsi_bus_legacy_handle_cmdline() ignores drives with unit numbers
exceeding that limit.
Changing the mapping of index to bus, unit is a regression. Breaking
-drive invocations that used to work just makes it worse.
Revert the part of commit 622b520f that causes this, and clean up
some.
Note that the fix only affects if=scsi. You can still put more than 7
units on a SCSI bus with -device & friends.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Turns drive_init()'s lengthy conditional into a concise loop, and
makes the data available elsewhere.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qdev_init_bdrv() doesn't belong into qdev.c; it's about drives, not
qdevs. Rename to drive_get_next, move to blockdev.c, drop the bogus
DeviceState argument, and return DriveInfo instead of
BlockDriverState.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a monitor command that allows resizing of block devices while
qemu is running. It uses the existing bdrv_truncate method already
used by qemu-img to do it's work. Compared to qemu-img the size
parsing is very simplicistic, but I think having a properly numering
object is more useful for non-humand monitor users than having
the units and relative resize parsing.
For SCSI devices the new size can be updated in Linux guests by
doing the following shell command:
echo > /sys/class/scsi_device/0:0:0:0/device/rescan
For ATA devices I don't know of a way to update the block device
size in Linux system, and for virtio-blk the next two patches
will provide an automatic update of the size when this command
is issued on the host.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Watch this:
(qemu) drive_add 0 if=none,file=tmp.img
OK
(qemu) info block
none0: type=hd removable=0 file=tmp.img ro=0 drv=raw encrypted=0
(qemu) drive_del none0
Segmentation fault (core dumped)
do_drive_del()'s code to clean up the pointer from a qdev using the
drive back to the drive needs to check whether such a device exists.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This makes the errors point to the error location, and fixes drive_add
to report errors in the monitor instead of stderr.
While there, tweak a few error messages for consistency.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When cyls, heads or secs are out of range, the error message prints
buf, which points to the value of option "if". Bogus, may even be
null. Drop that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Current code does not support snapshot internally to the running
image. Error in case no snapshot_file is specified.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The monitor command is:
snapshot_blkdev <device> [snapshot-file] [format]
Default format is qcow2. For now snapshots without a snapshot-file, eg
internal snapshots, are not supported.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If a user decides to punish a guest by revoking its block device via
drive_del, and subsequently also attempts to remove the pci device
backing it, and the device is using blockdev_auto_del() then we get a
segfault when we attempt to access dinfo->auto_del.[1]
The fix is to check if drive_get_by_blockdev() actually returns a valid
dinfo pointer or not.
1. (qemu) pci_add auto storage file=images/test01.raw,if=virtio,id=block1,snapshot=on
(qemu) drive_del block1
(qemu) pci_del 5
*segfault*
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently device hotplug removal code is tied to device removal via
ACPI. All pci devices that are removable via device_del() require the
guest to respond to the request. In some cases the guest may not
respond leaving the device still accessible to the guest. The management
layer doesn't currently have a reliable way to revoke access to host
resource in the presence of an uncooperative guest.
This patch implements a new monitor command, drive_del, which
provides an explicit command to revoke access to a host block device.
drive_del first quiesces the block device (qemu_aio_flush;
bdrv_flush() and bdrv_close()). This prevents further IO from being
submitted against the host device. Finally, drive_del cleans up
pointers between the drive object (host resource) and the device
object (guest resource).
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This implements the rerror option for SCSI disks.
It also includes minor changes to the write path where the same code is used
that was criticized in the review for the changes to the read path required for
rerror support.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Switch tree to lookup-by-name using qemu_find_opts().
Also hook up virtfs options so qemu_find_opts works for them too.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Block device change command did not copy BDRV_O_SNAPSHOT flag. Thus
the new image did not have this flag and the file got deleted during
opening.
Fix by copying BDRV_O_SNAPSHOT flag.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since commit cb4e5f8e, monitor command change makes the new media
readonly iff the type hint is BDRV_TYPE_CDROM, i.e. the drive was
created with media=cdrom. The intention is to avoid changing a block
device's read-only-ness. However, BDRV_TYPE_CDROM is only a hint. It
is currently sufficent for read-only. But it's not necessary, and it
may not remain sufficient.
Use bdrv_is_read_only() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We automatically delete blockdev host parts on unplug of the guest
device. Too much magic, but we can't change that now.
The delete happens early in the guest device teardown, before the
connection to the host part is severed. Thus, the guest part's
pointer to the host part dangles for a brief time. No actual harm
comes from this, but we'll catch such dangling pointers a few commits
down the road. Clean up the dangling pointers by delaying the
automatic deletion until the guest part's pointer is gone.
Device usb-storage deliberately makes two qdev properties refer to the
same drive, because it automatically creates a second device. Again,
too much magic we can't change now. Multiple references worked okay
before, but now free_drive() dies for the second one. Zap the extra
reference.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Historically, user monitor arguments beginning with '-' (eg. '-f')
were passed as integers down to handlers.
I've maintained this behavior in the new monitor because we didn't
have a boolean type at the very beginning of QMP. Today we have it
and this behavior is causing trouble to QMP's argument checker.
This commit fixes the problem by doing the following changes:
1. User Monitor
Before: the optional arg was represented as a QInt, we'd pass 1
down to handlers if the user specified the argument or
0 otherwise
This commit: the optional arg is represented as a QBool, we pass
true down to handlers if the user specified the
argument, otherwise _nothing_ is passed
2. QMP
Before: the client was required to pass the arg as QBool, but we'd
convert it to QInt internally. If the argument wasn't passed,
we'd pass 0 down
This commit: still require a QBool, but doesn't do any conversion and
doesn't pass any default value
3. Convert existing handlers (do_eject()/do_migrate()) to the new way
Before: Both handlers would expect a QInt value, either 0 or 1
This commit: Change the handlers to accept a QBool, they handle the
following cases:
A) true is passed: the option is enabled
B) false is passed: the option is disabled
C) nothing is passed: option not specified, use
default behavior
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
This changes the monitor eject_device() function to not check for
bdrv_is_inserted().
Example run where the bug manifests itself:
(output of 'info block' is stripped to include only the CD-ROM device)
(qemu) info block
ide1-cd0: type=cdrom removable=1 locked=0 [not inserted]
(qemu) change ide1-cd0 /dev/cdrom host_cdrom
(qemu) info block
ide1-cd0: type=cdrom removable=1 locked=0 file=/dev/cdrom ro=1 drv=host_cdrom encrypted=0
(qemu) eject ide1-cd0
(qemu) info block
ide1-cd0: type=cdrom removable=1 locked=0 file=/dev/cdrom ro=1 drv=host_cdrom encrypted=0
# at this point, a disk was inserted on the host CD-ROM drive
(qemu) info block
ide1-cd0: type=cdrom removable=1 locked=0 file=/dev/cdrom ro=1 drv=host_cdrom encrypted=0
(qemu) eject ide1-cd0
(qemu) info block
ide1-cd0: type=cdrom removable=1 locked=0 [not inserted]
(qemu)
The first eject command didn't work because the is_inserted() check
failed.
I have no clue why the code had the is_inserted() check, as it doesn't matter
if there is a disk present at the host drive, when the user wants the virtual
device to be disconnected from the host device.
The is_inserted() check has another side effect: a memory leak if the "change"
command is used multiple times, as do_change() calls eject_device() before
re-opening the block device, but bdrv_close() is never called.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is the list of drives defined with drive_init(). Hide it, so it
doesn't get abused.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
do_commit() and mux_proc_byte() iterate over the list of drives
defined with drive_init(). This misses host block devices defined by
other means. Such means don't exist now, but will be introduced later
in this series.
Change them to use new bdrv_commit_all(), which iterates over all host
block devices.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
That's where they belong semantically (block device host part), even
though the actions are actually executed by guest device code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Empty file used to create an empty drive (no media). Since commit
9dfd7c7a, it's an error: "qemu: could not open disk image : No such
file or directory". Older versions of libvirt can choke on this.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We should use 'dinfo->serial' length, 'serial' is a pointer, so
the serial number length is currently limited to the pointer size.
This fixes https://bugs.launchpad.net/qemu/+bug/584143 and is also
valid for stable.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Anything that moves hundreds of lines out of vl.c can't be all bad.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>