The .bdrv_truncate() operation resizes images and growing is easy to
implement in QED. Simply check that the new size is valid and then
update the image_size header field to reflect the new size.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
One strategy to limit the startup delay of consistency check when
opening image files is to ensure that the file is marked dirty for as
little time as possible.
QED currently marks the image dirty when the first allocating write
request is issued and clears the dirty bit again when the image is
cleanly closed. In practice that means the image is marked dirty for
most of a guest's lifetime and prone to being in a dirty state upon
crash or power failure.
It is safe to clear the dirty bit after all allocating write requests
have completed and a flush has been performed. This patch adds a timer
after the last allocating write request completes. When the timer fires
it will flush and then clear the dirty bit. The timer is set to 5
seconds and is cancelled upon arrival of a new allocating write request.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The code changed here is an unused data type name (evt_flush_occurred).
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
The qed_bytes_to_clusters() function is normally used with size_t
lengths. Consistency check used it with file size length and therefore
failed on 32-bit hosts when the image file is 4 GB or more.
Make qed_bytes_to_clusters() explicitly 64-bit and update consistency
check to keep 64-bit cluster counts.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use get_option_parameter() to instead of duplicating the loop, and
use BDRV_SECTOR_SIZE to instead of 512
Signed-off-by: Mitnick Lyu <mitnick.lyu@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Zero clusters are similar to unallocated clusters except instead of reading
their value from a backing file when one is available, the cluster is always
read as zero.
This implements read support only. At this stage, QED will never write a
zero cluster.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We also change the way the file parameter is parsed so IPv6 IP
addresses can be used, e.g.: "drive=nbd:[::1]:5000"
Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu now has generic bitmap functions,
so don't redefine them in sheepdog.c,
use common header instead. A small cleanup.
Here's only one function which is actually
used in sheepdog and gets replaced with
a generic one (simplified):
- static inline int test_bit(int nr, const volatile unsigned long *addr)
+ static inline int test_bit(int nr, const unsigned long *addr)
{
- return ((1UL << (nr % BITS_PER_LONG))
& ((unsigned long*)addr)[nr / BITS_PER_LONG])) != 0;
+ return 1UL & (addr[nr / BITS_PER_LONG] >> (nr & (BITS_PER_LONG-1)));
}
The body is equivalent, but the argument is not: there's
"volatile" in there. Why it is used for - I'm not sure.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This patch is similar to 171e3d6b99
which fixed qcow2:
Returning -EIO is far from optimal, but at least it's an error code.
In addition to read/write failures, -EIO is also returned when
decompress_cluster failed.
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch is similar to 171e3d6b99
which fixed qcow2:
Returning -EIO is far from optimal, but at least it's an error code.
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When copying L2 tables (this happens only with internal snapshots), the order
wasn't completely safe, so that after a crash you could end up with a L2 table
that has too low refcount, possibly leading to corruption in the long run.
This patch puts the operations in the right order: First allocate the new
L2 table and replace the reference, and only then decrease the refcount of the
old table.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of just returning -ENOTSUP, generate a more detailed error.
Unfortunately we don't have a helpful text for features that we don't know yet,
so just print the feature mask. It might be useful at least if someone asks for
help.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
The qcow2 driver is now declared responsible for any QCOW image that has
version 2 or greater (before this, version 3 would be detected as raw).
For everything newer than version 2, an error is reported.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
When reading a compressed cluster failed, qcow2 falsely returned success.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Requests could return success even though they failed when bdrv_aio_readv
returned NULL for a backing file read.
Reported-by: Chunqiang Tang <ctang@us.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch fixes the following bug in QCOW2. For a QCOW2 image that is larger
than its base image, when handling a read request straddling over the end of the
base image, the QCOW2 driver attempts to read beyond the end of the base image
and the request would fail.
This bug was found by Fast Virtual Disk (FVD)'s fully automated testing tool.
The following test triggered the bug.
dd if=/dev/zero of=/var/ramdisk/truth.raw count=0 bs=1 seek=1098561536
dd if=/dev/zero of=/var/ramdisk/zero-500M.raw count=0 bs=1 seek=593099264
./qemu-img create -f qcow2 -ocluster_size=65536,backing_fmt=blksim -b /var/ramdisk/zero-500M.raw /var/ramdisk/test.qcow2 1098561536
./qemu-io --auto --seed=30477694 --truth=/var/ramdisk/truth.raw --format=qcow2 --test=blksim:/var/ramdisk/test.qcow2 --verify_write=true --compare_before=false --compare_after=true --round=100000 --parallel=100 --io_size=10485760 --fail_prob=0 --cancel_prob=0 --instant_qemubh=true
Signed-off-by: Chunqiang Tang <ctang@us.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Error report from cppcheck:
block/vdi.c:122: error: Using sizeof for array given as function argument returns the size of pointer.
block/vdi.c:128: error: Using sizeof for array given as function argument returns the size of pointer.
Fix both by setting the correct size.
The buggy code is only used when QEMU is build without uuid support.
The bug is not critical, so there is no urgent need to apply it to
old versions of QEMU.
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For cache=unsafe we also need to set BDRV_O_CACHE_WB, otherwise we have some
strange unsafe writethrough mode.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Variables l2_modified and l2_size are not really used, remove them.
Spotted by GCC 4.6.0:
CC block/qcow2-refcount.o
/src/qemu/block/qcow2-refcount.c: In function 'qcow2_update_snapshot_refcount':
/src/qemu/block/qcow2-refcount.c:708:37: error: variable 'l2_modified' set but not used [-Werror=unused-but-set-variable]
/src/qemu/block/qcow2-refcount.c:708:9: error: variable 'l2_size' set but not used [-Werror=unused-but-set-variable]
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The consistency check on open is necessary in order to fix inconsistent
table offsets left as a result of a crash mid-operation. Images with a
backing file actually flush before updating table offsets and are
therefore guaranteed to be consistent. Do not mark these images dirty.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds a bdrv_discard function to qcow2 that frees the discarded clusters.
It does not yet pass the discard on to the underlying file system driver, but
the space can be reused by future writes to the image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
This patch parses the input filename in sd_create(), and enables us
specifying a target server to create sheepdog images.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move size after the two pointers in struct Qcow2Cache to get better
packing of struct elements on 64 bit architectures.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
QED relies on the underlying filesystem to extend the file and maintain
its size. Check that images are not created on a block device.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qcow2 calls bdrv_flush() after performing COW in order to ensure that the
L2 table change is never written before the copy is safe on disk. Now that the
L2 table is cached, we can wait with flushing until we write out the next L2
table.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds some new cache functions to qcow2 which can be used for caching
refcount blocks and L2 tables. When used with cache=writethrough they work
like the old caching code which is spread all over qcow2, so for this case we
have merely a cleanup.
The interesting case is with writeback caching (this includes cache=none) where
data isn't written to disk immediately but only kept in cache initially. This
leads to some form of metadata write batching which avoids the current "write
to refcount block, flush, write to L2 table" pattern for each single request
when a lot of cluster allocations happen. Instead, cache entries are only
written out if its required to maintain the right order. In the pure cluster
allocation case this means that all metadata updates for requests are done in
memory initially and on sync, first the refcount blocks are written to disk,
then fsync, then L2 tables.
This improves performance of scenarios with lots of cluster allocations
noticably (e.g. installation or after taking a snapshot).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
cpu_to_be64w() is called with an obviously non-aligned pointer. Use
cpu_to_be64wu() instead. It fixes unaligned accesses errors on IA64
hosts.
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fix a file descriptor leak, reported by cppcheck:
[/src/qemu/block/vvfat.c:759]: (error) Resource leak: dir
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
In addition this adds missing braces to the function to be consistent
with the coding style.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It doesn't really make sense for functions in qcow2.c to be named
qcow_ so convert the names to match correctly.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds support for the qemu-img check command. It also
introduces a dirty bit in the qed header to mark modified images as
needing a check. This bit is cleared when the image file is closed
cleanly.
If an image file is opened and it has the dirty bit set, a consistency
check will run and try to fix corrupted table offsets. These
corruptions may occur if there is power loss while an allocating write
is performed. Once the image is fixed it opens as normal again.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch implements the read/write state machine. Operations are
fully asynchronous and multiple operations may be active at any time.
Allocating writes lock tables to ensure metadata updates do not
interfere with each other. If two allocating writes need to update the
same L2 table they will run sequentially. If two allocating writes need
to update different L2 tables they will run in parallel.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds code to look up data cluster offsets in the image via
the L1/L2 tables. The L2 tables are writethrough cached in memory for
performance (each read/write requires a lookup so it is essential to
cache the tables).
With cluster lookup code in place it is possible to implement
bdrv_is_allocated() to query the number of contiguous
allocated/unallocated clusters.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch introduces the qed on-disk layout and implements image
creation. Later patches add read/write and other functionality.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add support to discard blocks in a raw image residing on an XFS filesystem
by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes. Support for other
hole punching mechanisms can be added when they become available.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discard. If no discard
granularity support is set discard support is disabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
RBD is an block driver for the distributed file system Ceph
(http://ceph.newdream.net/). This driver uses librados (which is part
of the Ceph server) for direct access to the Ceph object store and is
running entirely in userspace (Yehuda also wrote a driver for the
linux kernel, that can be used to access rbd volumes as a block
device).
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Christian Brunner <chb@muc.de>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All drivers use bs->file instead of s->hd for quite a while now, so it's time
to remove s->hd.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
When building on a 64 bit host which uses 'long' for int64_t,
GCC emits a warning:
CC block/blkverify.o
/src/qemu/block/blkverify.c: In function `blkverify_verify_readv':
/src/qemu/block/blkverify.c:304: warning: long long int format, long
unsigned int arg (arg 3)
Rework a77cffe7e9 to avoid the warning.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The cache content may be destroyed after a failed read, better not use it any
more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
This changes bdrv_flush to return 0 on success and -errno in case of failure.
It's a requirement for implementing proper error handle in users of bdrv_flush.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>