This adds 2 wrappers to read the unallocated_blocks_are_zero and
can_write_zeroes_with_unmap info from the BDI. The wrappers are
required to check for the existence of a backing_hd and
if the devices are opened with the correct flags.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This helper function behaves similarly to co_sleep_ns(), but the
sleeping coroutine will be resumed when using qemu_aio_wait().
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Tested-by: Liu Yuan <namei.unix@gmail.com>
Reviewed-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The block layer generally keeps the size of an image cached in
bs->total_sectors so that it doesn't have to perform expensive
operations to get the size whenever it needs it.
This doesn't work however when using a backend that can change its size
without qemu being aware of it, i.e. passthrough of removable media like
CD-ROMs or floppy disks. For this reason, the caching is disabled when a
removable device is used.
It is obvious that checking whether the _guest_ device has removable
media isn't the right thing to do when we want to know whether the size
of the host backend can change. To make things worse, non-top-level
BlockDriverStates never have any device attached, which makes qemu
assume they are removable, so drv->bdrv_getlength() is always called on
the protocol layer. In the case of raw-posix, this causes unnecessary
lseek() system calls, which turned out to be rather expensive.
This patch completely changes the logic and disables bs->total_sectors
caching only for certain block driver types, for which a size change is
expected: host_cdrom and host_floppy on POSIX, host_device on win32; also
the raw format in case it sits on top of one of these protocols, but in
the common case the nested bdrv_getlength() call on the protocol driver
will use the cache again and avoid an expensive drv->bdrv_getlength()
call.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
This field is used by blkverify to disable external snapshots creation.
It will also be used by block filters like quorum to disable external
snapshot creation.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
if a raw device like an iscsi target or host device is used
the current implementation makes a second call out to get
the block status of bs->file.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a function for generically dumping the ImageInfoSpecific information
in a human-readable format to block/qapi.c.
Use this function in bdrv_image_info_dump and qemu-io-cmds.c:info_f to
allow qemu-img info resp. qemu-io -c info to print that format specific
information.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a function for retrieving an ImageInfoSpecific object from a block
driver.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Switch the string to enum type BlockJobType in BlockJobDriver.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We will use BlockJobType as the enum type name of block jobs in QAPI,
rename current BlockJobType to BlockJobDriver, which will eventually
become a set of operations, similar to block drivers.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some drivers will have driver specifics options but no filename.
This new bool allow the block layer to treat them correctly.
The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and
not having .bdrv_open.
The first exception to this rule will be the quorum driver.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add an Error ** parameter to bdrv_create and its associated functions to
allow more specific error messages.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add an Error ** parameter to bdrv_open, bdrv_file_open and associated
functions to allow more specific error messages.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add an Error ** parameter to BlockDriver.bdrv_open and
BlockDriver.bdrv_file_open to allow more specific error messages.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Snapshot creation actually already distinguish id and name since it take
a structured parameter *sn, but delete can't. Later an accurate delete
is needed in qmp_transaction abort and blockdev-snapshot-delete-sync,
so change its prototype. Also *errp is added to tip error, but return
value is kepted to let caller check what kind of error happens. Existing
caller for it are savevm, delvm and qemu-img, they are not impacted by
introducing a new function bdrv_snapshot_delete_by_id_or_name(), which
check the return value and do the operation again.
Before this patch:
For qcow2, it search id first then name to find the one to delete.
For rbd, it search name.
For sheepdog, it does nothing.
After this patch:
For qcow2, logic is the same by call it twice in caller.
For rbd, it always fails in delete with id, but still search for name
in second try, no change to user.
Some code for *errp is based on Pavel's patch.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To make it clear about id and name in searching, add this API
to distinguish them. Caller can choose to search by id or name,
*errp will be set only for exception.
Some code are modified based on Pavel's patch.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds the "amend" option to qemu-img which allows changing
image options on existing image files. It also adds the generic bdrv
implementation which is basically just a wrapper for the image format
specific function.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Define the return value of get_block_status. Bits 0, 1, 2 and 9-62
are valid; bit 63 (the sign bit) is reserved for errors. Bits 3-8
are left for future extensions.
The return code is compatible with the old is_allocated API: if a driver
only returns 0 or 1 (aka BDRV_BLOCK_DATA) like is_allocated used to,
clients of is_allocated will not have any change in behavior. Still,
we will return more precise information in the next patches and the
new definition of bdrv_is_allocated is already prepared for this.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
For now, bdrv_get_block_status is just another name for bdrv_is_allocated.
The next patches will add more flags.
This also touches all block drivers with a mostly mechanical rename. The
sole exception is cow; because it calls cow_co_is_allocated from the read
code, we keep that function and make cow_co_get_block_status a wrapper.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that bdrv_is_allocated detects coroutine context, the two can
use the same code.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_is_allocated can detect coroutine context and go through a fast
path, similar to other block layer functions.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no
longer public and should be called by bdrv_unref() if refcnt is
decreased to 0.
This is an identical change because effectively, there's no multiple
reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets
bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Introduce bdrv_ref/bdrv_unref to manage the lifecycle of
BlockDriverState. They are unused for now but will used to replace
bdrv_delete() later.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If the refcount of a refcount block is greater than one, we can at least
try to repair that problem by duplicating the affected block.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Convert block_job_sleep_ns and co_sleep_ns to use the new timer
API.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add aio_timer_init and aio_timer_new wrapper functions.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add a QEMUTimerListGroup each AioContext (meaning a QEMUTimerList
associated with each clock is added) and delete it when the
AioContext is freed.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
include/qemu/timer.h has no need to include main-loop.h and
doing so causes an issue for the next patch. Unfortunately
various files assume including timers.h will pull in main-loop.h.
Untangle this mess.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when
protocols reading beyond end of file), we break qemu-iotests ./check
-qcow2 022. This happens because qcow2 temporarily sets ->growable = 1
for vmstate accesses (which are stored beyond the end of regular image
data).
We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to
disable ->zero_beyond_eof temporarily in addition to enable ->growable.
[Since the broken patch "block: Produce zeros when protocols reading
beyond end of file" has not been merged yet, I have applied this fix
*first* and will then apply the next patch to keep the tree bisectable.
-- Stefan]
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The .io_flush() handler no longer exists and has no users. Drop the
io_flush argument to aio_set_fd_handler() and related functions.
The AioFlushEventNotifierHandler and AioFlushHandler typedefs are no
longer used and are dropped too.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The throttling code was segfaulting since commit
02ffb50448 because some qemu_co_queue_next caller
does not run in a coroutine.
qemu_co_queue_do_restart assume that the caller is a coroutinne.
As suggested by Stefan fix this by entering the coroutine directly.
Also make sure like suggested that qemu_co_queue_next() and
qemu_co_queue_restart_all() can be called only in coroutines.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch adds sync-modes to the drive-backup interface and
implements the FULL, NONE and TOP modes of synchronization.
FULL performs as before copying the entire contents of the drive
while preserving the point-in-time using CoW.
NONE only copies new writes to the target drive.
TOP copies changes to the topmost drive image and preserves the
point-in-time using CoW.
For sync mode TOP are creating a new target image using the same backing
file as the original disk image. Then any new data that has been laid
on top of it since creation is copied in the main backup_run() loop.
There is an extra check in the 'TOP' case so that we don't bother to copy
all the data of the backing file as it already exists in the target.
This is where the bdrv_co_is_allocated() is used to determine if the
data exists in the topmost layer or below.
Also any new data being written is intercepted via the write_notifier
hook which ends up calling backup_do_cow() to copy old data out before
it gets overwritten.
For mode 'NONE' we create the new target image and only copy in the
original data from the disk image starting from the time the call was
made. This preserves the point in time data by only copying the parts
that are *going to change* to the target image. This way we can
reconstruct the final image by checking to see if the given block exists
in the new target image first, and if it does not, you can get it from
the original image. This is basically an optimization allowing you to
do point-in-time snapshots with low overhead vs the 'FULL' version.
Since there is no old data to copy out the loop in backup_run() for the
NONE case just calls qemu_coroutine_yield() which only wakes up after
an event (usually cancel in this case). The rest is handled by the
before_write notifier which again calls backup_do_cow() to write out
the old data so it can be preserved.
Signed-off-by: Ian Main <imain@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BH will be used outside big lock, so introduce lock to protect
between the writers, ie, bh's adders and deleter. The lock only
affects the writers and bh's callback does not take this extra lock.
Note that for the same AioContext, aio_bh_poll() can not run in
parallel yet.
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_flush() can fail, and bdrv_flush_all() should return an error as
well if this happens for a block device. It returns the first error
return now, but still at least tries to flush the remaining devices even
in error cases.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
One of the major reasons for doing something new for -blockdev and
blockdev-add was that the old block layer code parses filenames instead
of just taking them literally. So we should really leave it untouched
when it's passing using the new interfaces (like -drive
file.filename=...).
This allows opening relative file names that contain a colon.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
.has_zero_init defaults to 1 for all formats and protocols.
this is a dangerous default since this means that all
new added drivers need to manually overwrite it to 0 if
they do not ensure that a device is zero initialized
after bdrv_create().
if a driver needs to explicitly set this value to
1 its easier to verify the correctness in the review process.
during review of the existing drivers it turned out
that ssh and gluster had a wrong default of 1.
both protocols support host_devices as backend
which are not by default zero initialized. this
wrong assumption will lead to possible corruption
if qemu-img convert is used to write to such a backend.
vpc and vmdk also defaulted to 1 altough they support
fixed respectively flat extends. this has to be addresses
in separate patches. both formats as well as the mentioned
ssh and gluster are turned to the default of 0 with this
patch for safety.
a similar problem with the wrong default existed for
iscsi most likely because the driver developer did
oversee the default value of 1.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
backup_start() creates a block job that copies a point-in-time snapshot
of a block device to a target block device.
We call backup_do_cow() for each write during backup. That function
reads the original data from the block device before it gets
overwritten. The data is then written to the target device.
Currently backup cluster size is hardcoded to 65536 bytes.
[I made a number of changes to Dietmar's original patch and folded them
in to make code review easy. Here is the full list:
* Drop BackupDumpFunc interface in favor of a target block device
* Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes()
* Use 0 delay instead of 1us, like other block jobs
* Unify creation/start functions into backup_start()
* Simplify cleanup, free bitmap in backup_run() instead of cb
* function
* Use HBitmap to avoid duplicating bitmap code
* Use bdrv_getlength() instead of accessing ->total_sectors
* directly
* Delete the backup.h header file, it is no longer necessary
* Move ./backup.c to block/backup.c
* Remove #ifdefed out code
* Coding style and whitespace cleanups
* Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks
* Keep our own in-flight CowRequest list instead of using block.c
tracked requests. This means a little code duplication but is much
simpler than trying to share the tracked requests list and use the
backup block size.
* Add on_source_error and on_target_error error handling.
* Use trace events instead of DPRINTF()
-- stefanha]
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_add_before_write_notifier() function installs a callback that
is invoked before a write request is processed. This will be used to
implement copy-on-write point-in-time snapshots where we need to copy
out old data before overwriting it.
Note that BdrvTrackedRequest is moved to block_int.h since it is passed
to .notify() functions.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The RDMA event channel can be made non-blocking just like a TCP
socket. Exporting this function allows us to yield so that the
QEMU monitor remains available.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Now image info will be retrieved as an embbed json object inside
BlockDeviceInfo, backing chain info and all related internal snapshot
info can be got in the enhanced recursive structure of ImageInfo. New
recursive member *backing-image is added to reflect the backing chain
status.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch adds function bdrv_query_image_info(), which will
retrieve image info in qmp object format. The implementation is
based on the code moved from qemu-img.c, but uses block layer
function to get snapshot info.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch adds function bdrv_query_snapshot_info_list(), which will
retrieve snapshot info of an image in qmp object format. The implementation
is based on the code moved from qemu-img.c with modification to fit more
for qmp based block layer API.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_snapshot_dump() and bdrv_image_info_dump() do not dump to a buffer now,
some internal buffers are still used for format control, which have no
chance to be truncated. As a result, these two functions have no more issue
of truncation, and they can be used by both qemu and qemu-img with correct
parameter specified.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>