qcow2's get_cluster_offset() scans forward in the l2 table to find other
clusters that have the same allocation status as the first cluster.
This is used by (among others) qcow_is_allocated().
Unfortunately, it was not checking to be sure that it didn't fall off
the end of the l2 table. This patch adds that check.
The symptom that motivated me to look into this was that
bdrv_is_allocated() was returning false when there was in fact data
there. This is one of many ways this bug could lead to data corruption.
I checked the other place that scans for consecutive unallocated blocks
(alloc_cluster_offset()) and it appears to be OK:
nb_clusters = MIN(nb_clusters, s->l2_size - l2_index);
appears to prevent the same problem from occurring.
Signed-off-by: Nolan Leake <nolan <at> sigbus.net>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6977 c046a42c-6fe2-441c-8c8c-71466251a162
Recent patches added two compiler warnings about the format string
usage in qcow_read_extensions. One is printing a uint64_t using
%lu which is incorrect on many platforms as it can be a unsigned
long long, the second one is printing the result of sizeof as
%lu, but it is a size_t so it needs to be printed using %zu.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6944 c046a42c-6fe2-441c-8c8c-71466251a162
Use a qcow2 extension to keep the backing file format.
By keeping the backing file format, we can:
1. Provide a way to know the backing file format without probing
it (setting the format at creation time).
2. Enable using qcow2 format over host block devices.
(only if the user specifically asks for it, by providing the format
at creation time).
Also fixes a security flaw found by Daniel P. Berrange on [1]
which summarizes: "Autoprobing: just say no."
[1] http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg01083.html
Signed-off-by: Uri Lublin <uril@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6909 c046a42c-6fe2-441c-8c8c-71466251a162
Qcow2 extensions are build of magic (id) len (in bytes) and data.
They reside right after the qcow2 header.
If a backing filename exists it follows the qcow2 extension (if exist)
Qcow2 extensions are read upon image open.
Qcow2 extensions are identified by their magic.
Unknown qcow2 extensions (unknown magic) are skipped.
A Special magic of 0 means end-of-qcow2-extensions.
In this patchset, to be used to keep backing file format.
Based on a work done by Shahar Frank <sfrank@redhat.com>.
Signed-off-by: Uri Lublin <uril@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6907 c046a42c-6fe2-441c-8c8c-71466251a162
This series is broken by design as it requires expensive IO operations at
open time causing very long delays when starting a virtual machine for the
first time.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6816 c046a42c-6fe2-441c-8c8c-71466251a162
This series is broken by design as it requires expensive IO operations at
open time causing very long delays when starting a virtual machine for the
first time.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6815 c046a42c-6fe2-441c-8c8c-71466251a162
This series is broken by design as it requires expensive IO operations at
open time causing very long delays when starting a virtual machine for the
first time.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6813 c046a42c-6fe2-441c-8c8c-71466251a162
Consistently use the C99 named initializer format for the BlockDriver
methods to make the method table more readable and more easily
extensible.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6768 c046a42c-6fe2-441c-8c8c-71466251a162
'num_free_bytes' is the number of non-allocated bytes below highest-allocation.
It's useful, together with the highest-allocation, to figure out how
fragmented the image is, and how likely it will run out-of-space soon.
For example when the highest allocation is high (almost end-of-disk), but
many bytes (clusters) are free, and can be re-allocated when neeeded, than
we know it's probably not going to reach end-of-disk-space soon.
Added bookkeeping to block-qcow2.c
Export it using BlockDeviceInfo
Show it upon 'info blockstats' if BlockDeviceInfo exists
Signed-off-by: Uri Lublin <uril@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6407 c046a42c-6fe2-441c-8c8c-71466251a162
We want to know the highest written offset for qcow2 images.
This gives a pretty good (and easy to calculate) estimation to how
much more allocation can be done for the block device.
It can be usefull for allocating more diskspace for that image
(if possible, e.g. lvm) before we run out-of-disk-space
Signed-off-by: Uri Lublin <uril@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6404 c046a42c-6fe2-441c-8c8c-71466251a162
Currently qemu_mallocz calls malloc and handling of zero by malloc is
implementation defined behaviour:
http://www.opengroup.org/onlinepubs/7990989775/xsh/malloc.html
malloc(0) on AIX returns NULL[1] and qcow2 images without snapshots
are thus unusable
[1] Unless special Linux compatibility define is used when compiling
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6359 c046a42c-6fe2-441c-8c8c-71466251a162
When allocating multiple clusters at once, the qcow2 implementation
tries to find as many physically contiguous clusters as possible to
allow larger writes. This search includes allocated clusters which are
in the right place and still free clusters. If the range to allocate
spans clusters in patterns like "10 allocated, then 10 free, then again
10 allocated" it is only checked that the chunks of allocated clusters
are contiguous for themselves.
However, what is actually needed is to have _all_ allocated clusters
contiguous, starting at the first cluster of the allocation and spanning
multiple such chunks. This patch changes the check so that each offset
is not compared to the offset of the first cluster in its own chunk but
to the first cluster in the whole allocation.
I haven't seen it happen, but without this fix data corruption on qcow2
images is possible.
Signed-off-by: Kevin Wolf <kwolf@suse.de>
Acked-by: Gleb Natapov <gleb@redhat.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6213 c046a42c-6fe2-441c-8c8c-71466251a162
Correctly calculate number of contiguous clusters.
Acked-by: Kevin Wolf <kwolf@suse.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6212 c046a42c-6fe2-441c-8c8c-71466251a162
qcow2 writes a cluster reference count on every cluster update. This causes
performance to crater when using anything but cache=writeback. This is most
noticeable when using savevm. Right now, qcow2 isn't a reliable format
regardless of the type of cache your using because metadata is not updated in
the correct order. Considering this, I think it's somewhat reasonable to use
writeback caching by default with qcow2 files.
It at least avoids the massive performance regression for users until we sort
out the issues in qcow2.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5879 c046a42c-6fe2-441c-8c8c-71466251a162
Currently the order is this (during cow since it's the interesting case):
1. Decrement refcount of old clusters
2. Increment refcount for newly allocated clusters
3. Copy content of old sectors that will not be rewritten
4. Update L2 table with pointers to new clusters
5. Write guest data into new clusters (asynchronously)
There are several problems with this order. The first one is that if qemu
crashes (or killed or host reboots) after new clusters are linked into L2
table but before user data is written there, then on the next reboot guest
will find neither old data nor new one in those sectors and this is not
what gust expects even when journaling file system is in use. The other
problem is that if qemu is killed between steps 1 and 4 then refcount
of old cluster will be incorrect and may cause snapshot corruption.
The patch change the order to be like this:
1. Increment refcount for newly allocated clusters
2. Write guest data into new clusters (asynchronously)
3. Copy content of old sectors that were not rewritten
4. Update L2 table with pointers to new clusters
5. Decrement refcount of old clusters
Unexpected crash may cause cluster leakage, but guest data should be safe.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5861 c046a42c-6fe2-441c-8c8c-71466251a162
Otherwise if VM is killed between two writes data may be lost.
But if offset and size fields are at the same disk block one
write should update them both simultaneously.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5859 c046a42c-6fe2-441c-8c8c-71466251a162
Use it to remove code duplications from qcow_aio_read_cb().
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5858 c046a42c-6fe2-441c-8c8c-71466251a162
I noticed the qemu_aio_flush was doing nothing at all. And a flood of
cmd_writeb commands leading to a noop-invocation of qemu_aio_flush
were executed.
In short all 'memset;goto redo' places must be fixed to use the bh and
not to call the callback in the context of bdrv_aio_read or the
bdrv_aio_read model falls apart. Reading from qcow2 holes is possible
with phyisical readahead (kind of breada in linux buffer cache).
This is needed at least for scsi, ide is lucky (or it has been
band-aided against this API breakage by fixing the symptom and not the
real bug).
Same bug exists in qcow of course, can be fixed later as it's less
urgent.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5574 c046a42c-6fe2-441c-8c8c-71466251a162
During the debugging of the new revision of the zero dedup patch I
stepped on the following bug in block-qcow2.c:alloc_cluster_offset(). I
am not sure what the exact damage this bug can do, but it may be very
nasty because you way not notice it effects until you will do some
snapshot operations or similar actions that rely on the reference
counting.
The bug is easy to spot using the new "check" verb I added to the
qemu-img in one of the previous patches. I will resend the qemu-img
patch again with the new version of the zero dedup.
Signed-off-by: Shahar Frank <shaharf@qumranet.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5313 c046a42c-6fe2-441c-8c8c-71466251a162
With this container_of can actually be used without causing build errors.
Reformat container_of.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5234 c046a42c-6fe2-441c-8c8c-71466251a162
This was suggested by Kevin Wolf since this is, in fact, an error condition.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5007 c046a42c-6fe2-441c-8c8c-71466251a162
Modify get_cluster_offset(), alloc_cluster_offset() to specify how many clusters
we want.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5006 c046a42c-6fe2-441c-8c8c-71466251a162
Divide alloc_cluster_offset() into alloc_cluster_offset() and
alloc_compressed_cluster_offset().
Common parts are moved to free_any_clusters() and get_cluster_table();
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5005 c046a42c-6fe2-441c-8c8c-71466251a162
Extract code from get_cluster_offset() into new functions:
- seek_l2_table()
Search an l2 offset in the l2_cache table.
- l2_load()
Read the l2 entry from disk
- l2_allocate()
Allocate a new l2 entry.
Some comment fixups from Kevin Wolf
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Kevin Wolf <kwolf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5003 c046a42c-6fe2-441c-8c8c-71466251a162
Qemu 0.9.1 and earlier does not perform range checks for block device
read or write requests, which allows guest host users with root
privileges to access arbitrary memory and escape the virtual machine.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4037 c046a42c-6fe2-441c-8c8c-71466251a162
Remove QEMU_TOOL. Replace with QEMU_IMG and NEED_CPU_H.
Avoid linking qemu-img against whole system emulatior.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3578 c046a42c-6fe2-441c-8c8c-71466251a162