An entry in the VDI block map will hold an offset to the actual block if
the block is allocated, or one of two specially-interpreted values if
not allocated. Using VirtualBox terminology, value VDI_IMAGE_BLOCK_FREE
(0xffffffff) represents a never-allocated block (semantically arbitrary
content). VDI_IMAGE_BLOCK_ZERO (0xfffffffe) represents a "discarded"
block (semantically zero-filled). block/vdi knows only about
VDI_IMAGE_BLOCK_FREE. Teach it about VDI_IMAGE_BLOCK_ZERO.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This provides built-in support for iSCSI to QEMU.
This has the advantage that the iSCSI devices need not be made visible to the host, which is useful if you have very many virtual machines and very many iscsi devices.
It also has the benefit that non-root users of QEMU can access iSCSI devices across the network without requiring root privilege on the host.
This driver interfaces with the multiplatform posix library for iscsi initiator/client access to iscsi devices hosted at
git://github.com/sahlberg/libiscsi.git
The patch adds the driver to interface with the iscsi library.
It also updated the configure script to
* by default, probe is libiscsi is available and if so, build
qemu against libiscsi.
* --enable-libiscsi
Force a build against libiscsi. If libiscsi is not available
the build will fail.
* --disable-libiscsi
Do not link against libiscsi, even if it is available.
When linked with libiscsi, qemu gains support to access iscsi resources such as disks and cdrom directly, without having to make the devices visible to the host.
You can specify devices using a iscsi url of the form :
iscsi://[<username>[:<password>@]]<host>[:<port]/<target-iqn-name>/<lun>
When using authentication, the password can optionally be set with
LIBISCSI_CHAP_PASSWORD="password" to avoid it showing up in the process list
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since coroutine operation is now mandatory, convert both bdrv_discard
implementations to coroutines. For qcow2, this means taking the lock
around the operation. raw-posix remains synchronous.
The bdrv_discard callback is then unused and can be eliminated.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since coroutine operation is now mandatory, convert all bdrv_flush
implementations to coroutines. For qcow2, this means taking the lock.
Other implementations are simpler and just forward bdrv_flush to the
underlying protocol, so they can avoid the lock.
The bdrv_flush callback is then unused and can be eliminated.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This does the first part of the conversion to coroutines, by
wrapping bdrv_write implementations to take the mutex.
Drivers that implement bdrv_write rather than bdrv_co_writev can
then benefit from asynchronous operation (at least if the underlying
protocol supports it, which is not the case for raw-win32), even
though they still operate with a bounce buffer.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This does the first part of the conversion to coroutines, by
wrapping bdrv_read implementations to take the mutex.
Drivers that implement bdrv_read rather than bdrv_co_readv can
then benefit from asynchronous operation (at least if the underlying
protocol supports it, which is not the case for raw-win32), even
though they still operate with a bounce buffer.
raw-win32 does not need the lock, because it cannot yield.
nbd also doesn't probably, but better be safe.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The big conversion of bdrv_read/write to coroutines caused the two
homonymous callbacks in BlockDriver to become reentrant. It goes
like this:
1) bdrv_read is now called in a coroutine, and calls bdrv_read or
bdrv_pread.
2) the nested bdrv_read goes through the fast path in bdrv_rw_co_entry;
3) in the common case when the protocol is file, bdrv_co_do_readv calls
bdrv_co_readv_em (and from here goes to bdrv_co_io_em), which yields
until the AIO operation is complete;
4) if bdrv_read had been called from a bottom half, the main loop
is free to iterate again: a device model or another bottom half
can then come and call bdrv_read again.
This applies to all four of read/write/flush/discard. It would also
apply to is_allocated, but it is not used from within coroutines:
besides qemu-img.c and qemu-io.c, which operate synchronously, the
only user is the monitor. Copy-on-read will introduce a use in the
block layer, and will require converting it.
The solution is "simply" to convert all drivers to coroutines! We
just need to add a CoMutex that is taken around affected operations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move vmdk_parent_open to vmdk_open. There's another path how
vmdk_parent_open can be reached:
vmdk_parse_extents() -> vmdk_open_sparse() -> vmdk_open_vmdk4() ->
vmdk_open_desc_file().
If that can happen, however, the code is bogus. vmdk_parent_open
reads from bs->file:
if (bdrv_pread(bs->file, s->desc_offset, desc, DESC_SIZE) != DESC_SIZE) {
but it is always called with s->desc_offset == 0 and with the same
bs->file. So the data that vmdk_parent_open reads comes always from the
same place, and anyway there is only one place where it can write it,
namely bs->backing_file.
So, if it cannot happen, the patched code is okay.
It is also possible that the recursive call can happen, but only once. In
that case there would still be a bug in vmdk_open_desc_file setting
s->desc_offset = 0, but the patched code is okay.
Finally, in the case where multiple recursive calls can happen the code
would need to be rewritten anyway. It is likely that this would anyway
involve adding several parameters to vmdk_parent_open, and calling it from
vmdk_open_vmdk4.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
While vmdk_open_desc_file (touched by the patch) correctly changed -1
to -EINVAL, vmdk_open did not. Fix it directly in vmdk_parent_open.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If during allocation of compressed clusters the cluster was already allocated
uncompressed, fail and properly release the l2_table (the latter avoids a
failed assertion).
While at it, make it return some real error numbers instead of -1.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
This similarly adds support for coroutine and asynchronous discard.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block drivers now only need to provide either of .bdrv_co_flush,
.bdrv_aio_flush() or for legacy drivers .bdrv_flush(). Remove
the redundant .bdrv_flush() implementations.
[Paolo Bonzini: change raw driver to bdrv_co_flush]
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This makes the following patch easier to review.
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The raw format delegates all operations to bs->file (the protocol).
Previously this block driver exposed both sync and aio interfaces.
Since the block layer now works in terms of coroutines, expose the
coroutine interfaces and drop the others. This avoids unnecessary
emulation of sync and aio interfaces.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block drivers only need to provide one of sync, aio, or coroutine
interfaces. Since raw-posix.c provides aio interfaces, simply drop the
synchronous interfaces since they can be emulated using aio and
coroutines.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
cppcheck reported this error:
qemu/block/qcow.c:599: error: Mismatching allocation and deallocation: cluster_data
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
The unused code was detected using cppcheck.
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
cppcheck reported memory leaks and mismatched g_malloc() with free()
instead of g_free().
Fix these errors.
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow to resize images that reside on host devices up to the available
space. This allows to grow images after resizing the device manually or
vice versa.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
QED's metadata caching strategy allows two parallel requests to race for
metadata lookup. The first one to complete will populate the metadata
cache and the second one will drop the data it just read in favor of the
cached data.
There is a use-after-free in qed_read_l2_table_cb() and
qed_commit_l2_update() where l2_table->offset was used after the
l2_table may have been freed due to a metadata lookup race. Fix this by
keeping the l2_offset in a local variable and not reaching into the
possibly freed l2_table.
Reported-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The previous behaviour was to finish AIOCBs inside curl_aio_readv()
if the data was cached. This caused the following failed assertion
at hw/ide/pci.c:314: bmdma_cmd_writeb
"Assertion `bm->bus->dma->aiocb == ((void *)0)' failed."
By scheduling a QEMUBH and performing the completion inside the
callback, we avoid this problem.
Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The config string is variously delimited by =, @, and /, depending on the
field. Allow these characters to be escaped by preceeding them with \.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
librbd recently added async writeback and flush support. If the new
rbd_flush() call is available, call it.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Properly document the configuration string syntax and semantics. Remove
(out of date) details about the librbd implementation.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If we are reading from the default config location, ignore any failures.
It is perfectly legal for the user to specify exactly the options they need
and to not rely on any config file.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Release extent_file on error in vmdk_parse_extents. Added closing files
in freeing extents.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
nbd supports writing flags in bytes 24...27 of the header,
and uses that for the read-only flag. Add support for it
in qemu-nbd.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Those blanks violate the coding conventions, see
scripts/checkpatch.pl.
Blanks missing after colons in the changed lines were added.
This patch does not try to fix tabs, long lines and other
problems in the changed lines, therefore checkpatch.pl reports
many violations.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
QCowL2Meta::offset is not cluster aligned but only sector aligned
however nb_clusters count cluster from cluster start.
This fix range check. Note that old code have no corruption issues
related to this check cause it only cause intersection to occur
when shouldn't.
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
QCow2Meta structure was inserted into list before many fields are
initialized. Currently is not a problem cause all occur in a lock
but if qcow2_alloc_clusters would in a future unlock this lock
some issues could arise.
Initializing fields before inserting fix the problem.
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fix leak of s->snap in failure path. Simplify error paths for the whole
function.
Reported-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
No assignment in condition. Remove duplicate ret > 0 check.
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow the client id to be specified in the config string via 'id=' so that
users can control who they authenticate as. Currently they are stuck with
the default ('admin'). This is necessary for anyone using authentication
in their environment.
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The vSphere 4 exported image is streamOptimized extent, which is not
quite correctly handled. Ignore rdgOffset when RGD flag bit not set.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Haiku provides a specially formed vmdk image, which let qemu abort. It a
combination of sparse header and flat data (i.e. with not l1/l2 table at
all). The fix is turn to descriptor when sparse header is zero in field
'capacity'.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Creating streamOptimized subformat. Added subformat option
'streamOptimized', to create a image with compression enabled and each
cluster with a GrainMarker.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add support for reading/writing compressed extent.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Added flags field for compressed/streamOptimized extents, open and save
image configuration.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Factor out read/write extent code, since there will be more things to
take care of once reading/writing compressed clusters is introduced.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>