Error management is important for mirroring; otherwise, an error on the
target (even something as "innocent" as ENOSPC) requires to start again
with a full copy. Similar to on_read_error/on_write_error, two separate
knobs are provided for on_source_error (reads) and on_target_error (writes).
The default is 'report' for both.
The 'ignore' policy will leave the sector dirty, so that it will be
retried later. Thus, it will not cause corruption.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Switching to the target of the migration is done mostly asynchronously,
and reported to management via the BLOCK_JOB_COMPLETED event; the only
synchronous phase is opening the backing files. bdrv_open_backing_file
can always be done, even for migration of the full image (aka sync:
'full'). In this case, qmp_drive_mirror will create the target disk
with no backing file at all, and bdrv_open_backing_file will be a no-op.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds the implementation of a new job that mirrors a disk to
a new image while letting the guest continue using the old image.
The target is treated as a "black box" and data is copied from the
source to the target in the background. This can be used for several
purposes, including storage migration, continuous replication, and
observation of the guest I/O in an external program. It is also a
first step in replacing the inefficient block migration code that is
part of QEMU.
The job is possibly never-ending, but it is logically structured into
two phases: 1) copy all data as fast as possible until the target
first gets in sync with the source; 2) keep target in sync and
ensure that reopening to the target gets a correct (full) copy
of the source data.
The second phase is indicated by the progress in "info block-jobs"
reporting the current offset to be equal to the length of the file.
When the job is cancelled in the second phase, QEMU will run the
job until the source is clean and quiescent, then it will report
successful completion of the job.
In other words, the BLOCK_JOB_CANCELLED event means that the target
may _not_ be consistent with a past state of the source; the
BLOCK_JOB_COMPLETED event means that the target is consistent with
a past state of the source. (Note that it could already happen
that management lost the race against QEMU and got a completion
event instead of cancellation).
It is not yet possible to complete the job and switch over to the target
disk. The next patches will fix this and add many refinements to the
basic idea introduced here. These include improved error management,
some tunable knobs and performance optimizations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This simplifies some code and error checking, and also fixes a bug.
bdrv_find_backing_image() should only be passed absolute filenames,
or filenames relative to the chain. In the QMP message handler for
block commit, when looking up the base do so from the determined top
image, so we know it is reachable from top.
Some of the error messages put out by block-commit have changed
slightly, which causes 2 tests cases for block-commit to fail.
This patch updates the test cases to look for the correct error
output.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* 'trivial-patches' of git://github.com/stefanha/qemu:
versatilepb: Use symbolic indices for ARM PIC
qdev: kill bogus comment
qemu-barrier: Fix compiler version check for future gcc versions
hw: Add missing 'static' attribute for QEMUMachine
cleanup useless return sentence
qemu-sockets: Fix compiler warning (regression for MinGW)
vnc: Fix spelling (hellmen -> hellman) in comment
slirp: Fix spelling in comment (enought -> enough, insure -> ensure)
tcg/arm: Use tcg_out_mov_reg rather than inline equivalent code
cpu: Add missing 'static' attribute to qemu_global_mutex
configure: Support empty target list (--target-list=)
hw: Fix return value check for bdrv_read, bdrv_write
This patch cleans up return sentences in the end of void functions.
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Avoid strncpy+manual-NUL-terminate. Use pstrcpy instead.
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* parse_vdiname: Use pstrcpy, not strncpy, when the destination
buffer must be NUL-terminated.
* sd_open: Likewise, avoid buffer overrun.
* do_sd_create: Likewise. Leave the preceding memset, since
pstrcpy does not NUL-fill, and filename needs that.
* sd_snapshot_create: Add a comment/question.
* find_vdi_name: Remove a useless memset.
* sd_snapshot_goto: Remove a useless memset.
Use pstrcpy to NUL-terminate, because find_vdi_name requires
that its vdi arg (filename parameter) be NUL-terminated.
It seems ok not to NUL-fill the buffer.
Do the same for snapid: remove useless memset-0 (instead,
zero tag[0]). Use pstrcpy, not strncpy.
* sd_snapshot_list: Use pstrcpy, not strncpy to write
into the ->name member. Each must be NUL-terminated.
Acked-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently it is impossible to write a blkdebug script that ping-pongs
between two states, because the second set-state rule will use the
state that is set in the first. If you have
[set-state]
event = "..."
state = "1"
new_state = "2"
[set-state]
event = "..."
state = "2"
new_state = "1"
for example the state will remain locked at 1. This can be fixed
by first processing all rules, and then setting the state.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds support for error management to streaming.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This will let block-stream reuse the enum. Places that used the enums
are renamed accordingly.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds the live commit coroutine. This iteration focuses on the
commit only below the active layer, and not the active layer itself.
The behaviour is similar to block streaming; the sectors are walked
through, and anything that exists above 'base' is committed back down
into base. At the end, intermediate images are deleted, and the
chain stitched together. Images are restored to their original open
flags upon completion.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds gluster as the new block backend in QEMU. This gives
QEMU the ability to boot VM images from gluster volumes. Its already
possible to boot from VM images on gluster volumes using FUSE mount, but
this patchset provides the ability to boot VM images from gluster volumes
by by-passing the FUSE layer in gluster. This is made possible by
using libgfapi routines to perform IO on gluster volumes directly.
VM Image on gluster volume is specified like this:
file=gluster[+transport]://[server[:port]]/volname/image[?socket=...]
'gluster' is the protocol.
'transport' specifies the transport type used to connect to gluster
management daemon (glusterd). Valid transport types are
tcp, unix and rdma. If a transport type isn't specified, then tcp
type is assumed.
'server' specifies the server where the volume file specification for
the given volume resides. This can be either hostname, ipv4 address
or ipv6 address. ipv6 address needs to be within square brackets [ ].
If transport type is 'unix', then 'server' field should not be specifed.
The 'socket' field needs to be populated with the path to unix domain
socket.
'port' is the port number on which glusterd is listening. This is optional
and if not specified, QEMU will send 0 which will make gluster to use the
default port. If the transport type is unix, then 'port' should not be
specified.
'volname' is the name of the gluster volume which contains the VM image.
'image' is the path to the actual VM image that resides on gluster volume.
Examples:
file=gluster://1.2.3.4/testvol/a.img
file=gluster+tcp://1.2.3.4/testvol/a.img
file=gluster+tcp://1.2.3.4:24007/testvol/dir/a.img
file=gluster+tcp://[1:2:3:4:5:6:7:8]/testvol/dir/a.img
file=gluster+tcp://[1:2:3:4:5:6:7:8]:24007/testvol/dir/a.img
file=gluster+tcp://server.domain.com:24007/testvol/dir/a.img
file=gluster+unix:///testvol/dir/a.img?socket=/tmp/glusterd.socket
file=gluster+rdma://1.2.3.4:24007/testvol/a.img
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* kwolf/for-anthony:
block: remove keep_read_only flag from BlockDriverState struct
block: convert bdrv_commit() to use bdrv_reopen()
block: vpc image file reopen
block: vdi image file reopen
block: vmdk image file reopen
block: qcow image file reopen
block: qcow2 image file reopen
block: qed image file reopen
block: raw image file reopen
block: raw-posix image file reopen
block: purge s->aligned_buf and s->aligned_buf_size from raw-posix.c
block: use BDRV_O_NOCACHE instead of s->aligned_buf in raw-posix.c
block: do not parse BDRV_O_CACHE_WB in block drivers
block: move open flag parsing in raw block drivers to helper functions
block: move aio initialization into a helper function
block: Framework for reopening files safely
block: make bdrv_set_enable_write_cache() modify open_flags
block: correctly set the keep_read_only flag
blockdev: preserve readonly and snapshot states across media changes
There is currently nothing that needs to be done for VPC image
file reopen.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is currently nothing that needs to be done for VDI reopen.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch supports reopen for VMDK image files. VMDK extents are added
to the existing reopen queue, so that the transactional model of reopen
is maintained with multiple image files.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These are the stubs for the file reopen drivers for the qcow format.
There is currently nothing that needs to be done by the qcow driver
in reopen.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These are the stubs for the file reopen drivers for the qcow2 format.
There is currently nothing that needs to be done by the qcow2 driver
in reopen.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These are the stubs for the file reopen drivers for the qed format.
There is currently nothing that needs to be done by the qed driver
in reopen.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These are the stubs for the file reopen drivers for the raw format.
There is currently nothing that needs to be done by the raw driver
in reopen.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is derived from the Supriya Kannery's reopen patches.
This contains the raw-posix driver changes for the bdrv_reopen_*
functions. All changes are staged into a temporary scratch buffer
during the prepare() stage, and copied over to the live structure
during commit(). Upon abort(), all changes are abandoned, and the
live structures are unmodified.
The _prepare() will create an extra fd - either by means of a dup,
if possible, or opening a new fd if not (for instance, access
control changes). Upon _commit(), the original fd is closed and
the new fd is used. Upon _abort(), the duplicate/new fd is closed.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The aligned_buf pointer and aligned_buf size are no longer used in
raw_posix.c, so remove all references to them.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Rather than check for a non-NULL aligned_buf to determine if
raw_aio_submit needs to check for alignment, check for the presence
of BDRV_O_NOCACHE in the bs->open_flags.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block drivers should ignore BDRV_O_CACHE_WB in .bdrv_open flags,
and in the bs->open_flags.
This patch removes the code, leaving the behaviour behind as if
BDRV_O_CACHE_WB was set.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Code motion, to move parsing of open flags into a helper function.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move AIO initialization for raw-posix block driver into a helper function.
In addition to just code motion, the aio_ctx pointer is checked for NULL,
prior to calling laio_init(), to make sure laio_init() is only run once.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We no longer need to explicitely call qemu_notify_event() any more
since this is now done automatically any time the filehandles we listen
to change.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We need to support SG_IO from the synchronous iscsi_ioctl() since
scsi-block uses this to do an INQ to the device to discover its properties
This patch makes scsi-block work with iscsi.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ccc-analyzer reports these warnings:
block/vdi.c:704:13: warning: Dereference of null pointer
bmap[i] = VDI_UNALLOCATED;
^
block/vdi.c:702:13: warning: Dereference of null pointer
bmap[i] = i;
^
Moving some code into the if block fixes this.
It also avoids calling function write with 0 bytes of data.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Report from smatch:
block/curl.c:546 curl_close(21) info: redundant null check on s->url calling free()
The check was redundant, and free was also wrong because the memory
was allocated using g_strdup.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch sets data to be sent to Sheepdog correctly and fixes savevm
and loadvm operations on a Sheepdog image.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* kwolf/for-anthony:
qemu-iotests: add backing file smaller than image test case
stream: complete early if end of backing file is reached
qed: refuse unaligned zero writes with a backing file
It is possible to create an image that is larger than its backing file.
Reading beyond the end of the backing file produces zeroes if no writes
have been made to those sectors in the image file.
This patch finishes streaming early when the end of the backing file is
reached. Without this patch the block job hangs and continually tries
to stream the first sectors beyond the end of the backing file.
To reproduce the hung block job bug:
$ qemu-img create -f qcow2 backing.qcow2 128M
$ qemu-img create -f qcow2 -o backing_file=backing.qcow2 image.qcow2 6G
$ qemu -drive if=virtio,cache=none,file=image.qcow2
(qemu) block_stream virtio0
(qemu) info block-jobs
The qemu-iotests 030 streaming test still passes.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Zero writes have cluster granularity in QED. Therefore they can only be
used to zero entire clusters.
If the zero write request leaves sectors untouched, zeroing the entire
cluster would obscure the backing file. Instead return -ENOTSUP, which
is handled by block.c:bdrv_co_do_write_zeroes() and falls back to a
regular write.
The qemu-iotests 034 test cases covers this scenario.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The number of blocks of the device is used to compute the device size
in bdrv_getlength()/iscsi_getlength().
For MMC devices, the ReturnedLogicalBlockAddress in the READCAPACITY10
has a special meaning when it is 0.
In this case it does not mean that LBA 0 is the last accessible LBA,
and thus the device has 1 readable block, but instead it means that the
disc is blank and there are no readable blocks.
This change ensures that when the iSCSI LUN is loaded with a blank
DVD-R disk or similar that bdrv_getlength() will return the correct
size of the device as 0 bytes.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
* kwolf/for-anthony:
virtio-blk: hide VIRTIO_BLK_F_CONFIG_WCE from old machine types
Documentation: Warn against qemu-img on active image
vmdk: Read footer for streamOptimized images
vmdk: Fix header structure
Conflicts:
hw/virtio-blk.c
This patch fixes two main issues with block/iscsi.c:
1) iscsi_task_mgmt_abort_task_async calls iscsi_scsi_task_cancel which
was also directly called in iscsi_aio_cancel
2) a race between task completion and task abortion could happen cause
the scsi_free_scsi_task were done before iscsi_schedule_bh has finished.
To fix this, all the freeing of IscsiTasks and releasing of the AIOCBs
is centralized in iscsi_bh_cb, independent of whether the SCSI command
has completed or was cancelled.
3) iscsi_aio_cancel was not synchronously waiting for the end of the
command.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is always used with the same callback, remove the argument. And
its return value is never used, assume allocation succeeds.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit 64e69e8092. The commit
returned immediately from iscsi_aio_cancel, risking corruption in case the
following happens:
guest qemu target
=========================================================================
send write 1 -------->
send write 1 -------->
cancel write 1 ------>
cancel write 1 ------>
<------------------ cancellation processed
send write 2 -------->
send write 2 -------->
<---------------- completed write 2
<------------------ completed write 2
<---------------- completed write 1
<---------------- cancellation not done
Here, the guest would see write 2 superseding write 1, when in fact the
outcome could have been the opposite. The right behavior is to return
only after the target says whether the cancellation was done or not, and
it will be implemented by the next three patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The footer takes precedence over the header when it exists. It contains
the real grain directory offset that is missing in the header. Without
this patch, streamOptimized images with a footer cannot be read.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>