qemu-e2k

Author	SHA1	Message	Date
Max Reitz	772d1f973f	block/qcow2: falloc/full preallocating growth Implement the preallocation modes falloc and full for growing qcow2 images. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170613202107.10125-15-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:02 +02:00
Max Reitz	12cc30a8cb	block/qcow2: Add qcow2_refcount_area() This function creates a collection of self-describing refcount structures (including a new refcount table) at the end of a qcow2 image file. Optionally, these structures can also describe a number of additional clusters beyond themselves; this will be important for preallocated truncation, which will place the data clusters and L2 tables there. For now, we can use this function to replace the part of alloc_refcount_block() that grows the refcount table (from which it is actually derived). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-13-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:02 +02:00
Vladimir Sementsov-Ogievskiy	469c71edc7	qcow2: add .bdrv_remove_persistent_dirty_bitmap Realize .bdrv_remove_persistent_dirty_bitmap interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-29-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:59 +02:00
Vladimir Sementsov-Ogievskiy	da0eb242ad	qcow2: add .bdrv_can_store_new_dirty_bitmap Realize .bdrv_can_store_new_dirty_bitmap interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-23-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:59 +02:00
Vladimir Sementsov-Ogievskiy	169b879359	qcow2: store bitmaps on reopening image as read-only Store bitmaps and mark them read-only on reopening image as read-only. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-21-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	5f72826e7f	qcow2: add persistent dirty bitmaps support Store persistent dirty bitmaps in qcow2 image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-20-vsementsov@virtuozzo.com [mreitz: Always assign ret in store_bitmap() in case of an error] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	1b6b0562db	qcow2: support .bdrv_reopen_bitmaps_rw Realize bdrv_reopen_bitmaps_rw interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-15-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	d1258dd0c8	qcow2: autoloading dirty bitmaps Auto loading bitmaps are bitmaps in Qcow2, with the AUTO flag set. They are loaded when the image is opened and become BdrvDirtyBitmaps for the corresponding drive. Extra data in bitmaps is not supported for now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-12-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	88ddffae8f	qcow2: add bitmaps extension Add bitmap extension as specified in docs/specs/qcow2.txt. For now, just mirror extension header into Qcow2 state and check constraints. Also, calculate refcounts for qcow2 bitmaps, to not break qemu-img check. For now, disable image resize if it has bitmaps. It will be fixed later. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-9-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:57 +02:00
Vladimir Sementsov-Ogievskiy	8a5bb1f114	qcow2-refcount: rename inc_refcounts() and make it public This is needed for the following patch, which will introduce refcounts checking for qcow2 bitmaps. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-8-vsementsov@virtuozzo.com [mreitz: s/inc_refcounts/qcow2_inc_refcounts_imrt/ in one more (new) place] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:57 +02:00
Daniel P. Berrange	4652b8f3e1	qcow2: add support for LUKS encryption format This adds support for using LUKS as an encryption format with the qcow2 file, using the new encrypt.format parameter to request "luks" format. e.g. # qemu-img create --object secret,data=123456,id=sec0 \ -f qcow2 -o encrypt.format=luks,encrypt.key-secret=sec0 \ test.qcow2 10G The legacy "encryption=on" parameter still results in creation of the old qcow2 AES format (and is equivalent to the new 'encryption-format=aes'). e.g. the following are equivalent: # qemu-img create --object secret,data=123456,id=sec0 \ -f qcow2 -o encryption=on,encrypt.key-secret=sec0 \ test.qcow2 10G # qemu-img create --object secret,data=123456,id=sec0 \ -f qcow2 -o encryption-format=aes,encrypt.key-secret=sec0 \ test.qcow2 10G With the LUKS format it is necessary to store the LUKS partition header and key material in the QCow2 file. This data can be many MB in size, so cannot go into the QCow2 header region directly. Thus the spec defines a FDE (Full Disk Encryption) header extension that specifies the offset of a set of clusters to hold the FDE headers, as well as the length of that region. The LUKS header is thus stored in these extra allocated clusters before the main image payload. Aside from all the cryptographic differences implied by use of the LUKS format, there is one further key difference between the use of legacy AES and LUKS encryption in qcow2. For LUKS, the initialiazation vectors are generated using the host physical sector as the input, rather than the guest virtual sector. This guarantees unique initialization vectors for all sectors when qcow2 internal snapshots are used, thus giving stronger protection against watermarking attacks. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-14-berrange@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00
Daniel P. Berrange	b25b387fa5	qcow2: convert QCow2 to use QCryptoBlock for encryption This converts the qcow2 driver to make use of the QCryptoBlock APIs for encrypting image content, using the legacy QCow2 AES scheme. With this change it is now required to use the QCryptoSecret object for providing passwords, instead of the current block password APIs / interactive prompting. $QEMU \ -object secret,id=sec0,file=/home/berrange/encrypted.pw \ -drive file=/home/berrange/encrypted.qcow2,encrypt.key-secret=sec0 The test 087 could be simplified since there is no longer a difference in behaviour when using blockdev_add with encrypted images for the running vs stopped CPU state. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-12-berrange@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00
Daniel P. Berrange	446d306d23	qcow2: make qcow2_encrypt_sectors encrypt in place Instead of requiring separate input/output buffers for encrypting data, change qcow2_encrypt_sectors() to assume use of a single buffer, encrypting in place. The current callers all used the same buffer for input/output already. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-11-berrange@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00
Alberto Garcia	ee22a9d869	qcow2: Merge the writing of the COW regions with the guest data If the guest tries to write data that results on the allocation of a new cluster, instead of writing the guest data first and then the data from the COW regions, write everything together using one single I/O operation. This can improve the write performance by 25% or more, depending on several factors such as the media type, the cluster size and the I/O request size. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Alberto Garcia	e034f5bcbc	qcow2: Use unsigned int for both members of Qcow2COWRegion Qcow2COWRegion has two attributes: - The offset of the COW region from the start of the first cluster touched by the I/O request. Since it's always going to be positive and the maximum request size is at most INT_MAX, we can use a regular unsigned int to store this offset. - The size of the COW region in bytes. This is guaranteed to be >= 0, so we should use an unsigned type instead. In x86_64 this reduces the size of Qcow2COWRegion from 16 to 8 bytes. It will also help keep some assertions simpler now that we know that there are no negative numbers. The prototype of do_perform_cow() is also updated to reflect these changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Eric Blake	d2cb36af2b	qcow2: Discard/zero clusters by byte count Passing a byte offset, but sector count, when we ultimately want to operate on cluster granularity, is madness. Clean up the external interfaces to take both offset and count as bytes, while still keeping the assertion added previously that the caller must align the values to a cluster. Then rename things to make sure backports don't get confused by changed units: instead of qcow2_discard_clusters() and qcow2_zero_clusters(), we now have qcow2_cluster_discard() and qcow2_cluster_zeroize(). The internal functions still operate on clusters at a time, and return an int for number of cleared clusters; but on an image with 2M clusters, a single L2 table holds 256k entries that each represent a 2M cluster, totalling well over INT_MAX bytes if we ever had a request for that many bytes at once. All our callers currently limit themselves to 32-bit bytes (and therefore fewer clusters), but by making this function 64-bit clean, we have one less place to clean up if we later improve the block layer to support 64-bit bytes through all operations (with the block layer auto-fragmenting on behalf of more-limited drivers), rather than the current state where some interfaces are artificially limited to INT_MAX at a time. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-13-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	fdfab37dfe	qcow2: Make distinction between zero cluster types obvious Treat plain zero clusters differently from allocated ones, so that we can simplify the logic of checking whether an offset is present. Do this by splitting QCOW2_CLUSTER_ZERO into two new enums, QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC. I tried to arrange the enum so that we could use 'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and 'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although I didn't actually end up taking advantage of the layout. In many cases, this leads to simpler code, by properly combining cases (sometimes, both zero types pair together, other times, plain zero is more like unallocated while allocated zero is more like normal). Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-7-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	3ef9521893	qcow2: Name typedef for cluster type Although it doesn't add all that much type safety (this is C, after all), it does add a bit of legibility to use the name QCow2ClusterType instead of a plain int. In particular, qcow2_get_cluster_offset() has an overloaded return type; a QCow2ClusterType on success, and -errno on failure; keeping the cluster type in a separate variable makes it slightly easier for the next patch to make further computations based on the type. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-6-eblake@redhat.com [mreitz: Use the new type in two more places (one of them pulled from the next patch)] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Max Reitz	564a6b6938	qcow2: Reuse preallocated zero clusters Instead of just freeing preallocated zero clusters and completely allocating them from scratch, reuse them. We cannot do this in handle_copied(), however, since this is a COW operation. Therefore, we have to add the new logic to handle_alloc() and simply return the existing offset if it exists. The only catch is that we have to convince qcow2_alloc_cluster_link_l2() not to free the old clusters (because we have reused them). Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Alberto Garcia	7061a07898	qcow2: Optimize the refcount-block overlap check The metadata overlap checks introduced in `a40f1c2add` help detect corruption in the qcow2 image by verifying that data writes don't overlap with existing metadata sections. The 'refcount-block' check in particular iterates over the refcount table in order to get the addresses of all refcount blocks and check that none of them overlap with the region where we want to write. The problem with the refcount table is that since it always occupies complete clusters its size is usually very big. With the default values of cluster_size=64KB and refcount_bits=16 this table holds 8192 entries, each one of them enough to map 2GB worth of host clusters. So unless we're using images with several TB of allocated data this table is going to be mostly empty, and iterating over it is a waste of CPU. If the storage backend is fast enough this can have an effect on I/O performance. This patch keeps the index of the last used (i.e. non-zero) entry in the refcount table and updates it every time the table changes. The refcount-block overlap check then uses that index instead of reading the whole table. In my tests with a 4GB qcow2 file stored in RAM this doubles the amount of write IOPS. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20170201123828.4815-1-berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:43 +01:00
Alberto Garcia	9dd76f82d9	qcow2: Remove stale FIXME comment It was from the time when none of the global functions had a qcow2_ prefix. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-11 15:54:55 +01:00
Fam Zheng	170f4b2e5c	qcow2: Support BDRV_REQ_MAY_UNMAP Handling this is similar to what is done to the L2 entry in the case of compressed clusters. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-10-24 17:54:03 +02:00
Ladi Prosek	d4b84d564e	Remove unused function declarations Unused function declarations were found using a simple gcc plugin and manually verified by grepping the sources. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-09-15 15:32:22 +03:00
Kevin Wolf	d46a0bb24d	qcow2: Implement .bdrv_co_pwritev() This changes qcow2 to implement the byte-based .bdrv_co_pwritev interface rather than the sector-based old one. As preallocation uses the same allocation function as normal writes, and the interface of that function needs to be changed, it is converted in the same patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	8556739355	qcow2: Use bytes instead of sectors for QCowL2Meta In preparation for implementing .bdrv_co_pwritev in qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	ecfe186380	qcow2: Implement .bdrv_co_preadv() Reading from qcow2 images is now byte granularity. Most of the affected code in qcow2 actually gets simpler with this change. The only exception is encryption, which is fixed on 512 bytes blocks; in order to keep this working, bs->request_alignment is set for encrypted images. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Denis V. Lunev	f3c3b87dae	qcow2: avoid extra flushes in qcow2 The problem with excessive flushing was found by a couple of performance tests: - parallel directory tree creation (from 2 processes) - 32 cached writes + fsync at the end in a loop For the first one results improved from 2.6 loops/sec to 3.5 loops/sec. Each loop creates 10^3 directories with 10 files in each. For the second one results improved from ~600 fsync/sec to ~1100 fsync/sec. Though, it was run on SSD so it probably won't show such performance gain on rotational media. qcow2_cache_flush() calls bdrv_flush() unconditionally after writing cache entries of a particular cache. This can lead to as many as 2 additional fdatasyncs inside bdrv_flush. We can simply skip all fdatasync calls inside qcow2_co_flush_to_os as bdrv_flush for sure will do the job. These flushes are necessary to keep the right order of writes to the different caches. Though this is not necessary in the current code base as this ordering is ensured through the flush in qcow2_cache_flush_dependency(). Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Pavel Borzenkov <pborzenkov@virtuozzo.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:09 +02:00
Max Reitz	791c9a004e	qcow2: Add function for refcount order amendment Add a function qcow2_change_refcount_order() which allows changing the refcount order of a qcow2 image. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Max Reitz	8b13976d3f	block: Add opaque value to the amend CB Add an opaque value which is to be passed to the bdrv_amend_options() status callback. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Daniel P. Berrange	10817bf09d	coroutine: move into libqemuutil.a library The coroutine files are currently referenced by the block-obj-y variable. The coroutine functionality though is already used by more than just the block code. eg migration code uses coroutine yield. In the future the I/O channel code will also use the coroutine yield functionality. Since the coroutine code is nicely self-contained it can be easily built as part of the libqemuutil.a library, making it widely available. The headers are also moved into include/qemu, instead of the include/block directory, since they are now part of the util codebase, and the impl was never in the block/ directory either. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2015-10-20 14:59:04 +01:00
Kevin Wolf	e394621fbd	qcow2: Remove forward declaration of QCowAIOCB This struct doesn't exist any more since commit `3fc48d09` in August 2011, it's about time to remove its forward declaration. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-10-16 15:34:30 +02:00
Max Reitz	b6d36def6d	qcow2: Make size_to_clusters() return uint64_t Sadly, some images may have more clusters than what can be represented using a plain int. We should be prepared for that case (in qcow2_check_refcounts() we actually were trying to catch that case, but since size_to_clusters() truncated the returned value, that check never did anything useful). Cc: qemu-stable <qemu-stable@nongnu.org> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:37 +02:00
Kevin Wolf	ff99129ab8	qcow2: Rename BDRVQcowState to BDRVQcow2State BDRVQcowState is already used by qcow1, and gdb is always confused which one to use. Rename the qcow2 one so they can be distinguished. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-09-14 16:51:36 +02:00
Alberto Garcia	279621c046	qcow2: add option to clean unused cache entries after some time This adds a new 'cache-clean-interval' option that cleans all qcow2 cache entries that haven't been used in a certain interval, given in seconds. This allows setting a large L2 cache size so it can handle scenarios with lots of I/O and at the same time use little memory during periods of inactivity. This feature currently relies on MADV_DONTNEED to free that memory, so it is not useful in systems that don't follow that behavior. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: a70d12da60433df9360ada648b3f34b8f6f354ce.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-09-04 21:00:32 +02:00
Daniel P. Berrange	f6fa64f6d2	block: convert qcow/qcow2 to use generic cipher API Switch the qcow/qcow2 block driver over to use the generic cipher API, this allows it to use the pluggable AES implementations, instead of being hardcoded to use QEMU's built-in impl. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1435770638-25715-10-git-send-email-berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-08 13:11:01 +02:00
Daniel P. Berrange	6f2945cde6	crypto: move built-in AES implementation into crypto/ To prepare for a generic internal cipher API, move the built-in AES implementation into the crypto/ directory Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1435770638-25715-3-git-send-email-berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-07 12:04:13 +02:00
Max Reitz	bc85ef265a	qcow2: Add DEFAULT_L2_CACHE_CLUSTERS If a relatively large cluster size is chosen, the default of 1 MB L2 cache is not really appropriate. In this case, unless overridden by the user, the default cache size should not be determined by its size in bytes but by the number of L2 tables (clusters) it is supposed to contain. Note that without this patch, MIN_L2_CACHE_SIZE will effectively take over the same role. However, providing space for just two L2 tables is not enough to be the default. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-06-12 15:54:01 +02:00
Max Reitz	57e2166959	qcow2: Set MIN_L2_CACHE_SIZE to 2 The L2 cache must cover at least two L2 tables, because during COW two L2 tables are accessed simultaneously. Reported-by: Alexander Graf <agraf@suse.de> Cc: qemu-stable <qemu-stable@nongnu.org> Signed-off-by: Max Reitz <mreitz@redhat.com> Tested-by: Alexander Graf <agraf@suse.de> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-06-12 15:54:00 +02:00
Alberto Garcia	a3f1afb43a	qcow2: make qcow2_cache_put() a void function This function never receives an invalid table pointer, so we can make it void and remove all the error checking code. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	72e80b8901	qcow2: use one single memory block for the L2/refcount cache tables The qcow2 L2/refcount cache contains one separate table for each cache entry. Doing one allocation per table adds unnecessary overhead and it also requires us to store the address of each table separately. Since the size of the cache is constant during its lifetime, it's better to have an array that contains all the tables using one single allocation. In my tests measuring freshly created caches with sizes 128MB (L2) and 32MB (refcount) this uses around 10MB of RAM less. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Kevin Wolf	e4603fe139	qcow2: Fix header update with overridden backing file In recent qemu versions, it is possible to override the backing file name and format that is stored in the image file with values given at runtime. In such cases, the temporary override could end up in the image header if the qcow2 header was updated, while obviously correct behaviour would be to leave the on-disk backing file path/format unchanged. Fix this and add a test case for it. Reported-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1428411796-2852-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-04-08 10:29:20 +01:00
Max Reitz	7453c96b78	qcow2: Helper function for refcount modification Since refcounts do not always have to be a uint16_t, all refcount blocks and arrays in memory should not have a specific type (thus they become pointers to void) and for accessing them, two helper functions are used (a getter and a setter). Those functions are called indirectly through function pointers in the BDRVQcowState so they may later be exchanged for different refcount orders. With the check and repair functions using this function, the refcount array they are creating will be in big endian byte order; additionally, using realloc_refcount_array() makes the size of this refcount array always cluster-aligned. Both combined allow rebuild_refcount_structure() to drop the bounce buffer which was used to convert parts of the refcount array to big endian byte order and store them on disk. Instead, those parts can now be written directly. [ kwolf: Fixed a build failure on 32 bit and another with old glib ] Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Max Reitz	0e06528e98	qcow2: Use 64 bits for refcount values Refcounts may have a width of up to 64 bits, so qemu should use the same width to represent refcount values internally. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Max Reitz	2aabe7c7a1	qcow2: Use unsigned addend for update_refcount() update_refcount() and qcow2_update_cluster_refcount() currently take a signed addend. At least one caller passes a value directly derived from an absolute refcount that should be reached ("l2_refcount - 1" in expand_zero_clusters_in_l1()). Therefore, the addend should be unsigned as well; this will be especially important for 64 bit refcounts. Because update_refcount() then no longer knows whether the refcount should be increased or decreased, it now requires an additional flag which specified exactly that. The same applies to qcow2_update_cluster_refcount(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Max Reitz	7324c10f96	qcow2: Only return status from qcow2_get_refcount Refcounts can theoretically be of type uint64_t; in order to be able to represent the full range, qcow2_get_refcount() cannot use a single variable to represent both all refcount values and also keep some values reserved for errors. One solution would be to add an Error pointer parameter to qcow2_get_refcount(); however, no caller could (currently) pass that error message, so it would have to be emitted immediately and be passed to the next caller by returning -EIO or something similar. Therefore, an Error parameter does not offer any advantages here. The solution applied by this patch is simpler to use. Because no caller would be able to pass the error message, they would have to print it and free it, whereas with this patch the caller only needs to pass the returned integer (which is often a no-op from the code perspective, because that integer will be stored in a variable "ret" which will be returned by the fail path of many callers). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Max Reitz	346a53df38	qcow2: Add two new fields to BDRVQcowState Add two new fields regarding refcount information (the bit width of every entry and the maximum refcount value) to the BDRVQcowState. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:20 +01:00
Kevin Wolf	20a1f9d071	qcow2: Remove unused struct QCowCreateState The only user went away five years ago with commit `a9420734` ('qcow2: Simplify image creation'). It's about time to remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-03-09 11:12:00 +01:00
Max Reitz	44751917db	block/qcow2: Make get_refcount() global Reading the refcount of a cluster is an operation which can be useful in all of the qcow2 code, so make that function globally available. While touching this function, amend the comment describing the "addend" parameter: It is (no longer, if it ever was) necessary to have it set to -1 or 1; any value is fine. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Message-id: 1414404776-4919-6-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:49 +00:00
Max Reitz	4057a2b24a	block/qcow2: Implement status CB for amend The only really time-consuming operation potentially performed by qcow2_amend_options() is zero cluster expansion when downgrading qcow2 images from compat=1.1 to compat=0.10, so report status of that operation and that operation only through the status CB. For this, approximate the progress as the number of L1 entries visited during the operation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Message-id: 1414404776-4919-5-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:49 +00:00
Max Reitz	808c4b6f30	qcow2: Allow "full" discard Normally, discarded sectors should read back as zero. However, there are cases in which a sector (or rather cluster) should be discarded as if they were never written in the first place, that is, reading them should fall through to the backing file again. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414159063-25977-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:47 +00:00
Max Reitz	17bd5f4727	qcow2: Drop REFCOUNT_SHIFT With BDRVQcowState.refcount_block_bits, we don't need REFCOUNT_SHIFT anymore. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:02 +02:00
Max Reitz	1d13d65466	qcow2: Calculate refcount block entry count The size of a refblock entry is (in theory) variable; calculate therefore the number of entries per refblock and the according bit shift (1 << x == entry count) when opening an image. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	ee42b5ce0b	qcow2: Add overlap-check.template option Being able to set the overlap-check option to a string and then refine it via the overlap-check.* options is a nice idea for the command line but does not work so well for non-flattened dicts. In that case, one can only specify either but not both, so add a field to overlap-check.* which does the same as directly specifying overlap-check but can be used in conjunction with the other fields in non-flattened dicts. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1408557576-14574-4-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:34 +01:00
Max Reitz	85186ebdac	qcow2: Add qcow2_signal_corruption() Add a helper function for easily marking an image corrupt (on fatal corruptions) while outputting an informative message to stderr and via QAPI. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Message-id: 1409926039-29044-3-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:27 +01:00
Max Reitz	6c1c8d5d3e	qcow2: Add runtime options for cache sizes Add options for specifying the size of the metadata caches. This can either be done directly for each cache (if only one is given, the other will be derived according to a default ratio) or combined for both. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 11:51:28 +02:00
Max Reitz	440ba08aea	qcow2: Constant cache size in bytes Specifying the metadata cache sizes in clusters results in less clusters (and much less bytes) covered for small cluster sizes and vice versa. Using a constant byte size reduces this difference, and makes it possible to manually specify the cache size in an easily comprehensible unit. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 11:51:28 +02:00
Kevin Wolf	5dae6e30c5	qcow2: Limit snapshot table size Even with a limit of 64k snapshots, each snapshot could have a filename and an ID with up to 64k, which would still lead to pretty large allocations, which could potentially lead to qemu aborting. Limit the total size of the snapshot table to an average of 1k per entry when the limit of 64k snapshots is fully used. This should be plenty for any reasonable user. This also fixes potential integer overflows of s->snapshot_size. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-04-01 15:22:35 +02:00
Kevin Wolf	6a83f8b5be	qcow2: Check maximum L1 size in qcow2_snapshot_load_tmp() (CVE-2014-0143) This avoids an unbounded allocation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-04-01 15:22:35 +02:00
Kevin Wolf	bb572aefbd	qcow2: Fix types in qcow2_alloc_clusters and alloc_clusters_noref In order to avoid integer overflows. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-04-01 15:22:34 +02:00
Kevin Wolf	2b5d5953ee	qcow2: Check new refcount table size on growth If the size becomes larger than what qcow2_open() would accept, fail the growing operation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-04-01 15:22:34 +02:00
Kevin Wolf	ce48f2f441	qcow2: Validate snapshot table offset/size (CVE-2014-0144) This avoid unbounded memory allocation and fixes a potential buffer overflow on 32 bit hosts. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-04-01 14:19:09 +02:00
Hu Tao	16f0587e0a	qcow2: remove n_start and n_end of qcow2_alloc_cluster_offset() n_start can be actually calculated from offset. The number of sectors to be allocated(n_end - n_start) can be passed in in num. By removing n_start and n_end, we can save two parameters. The side effect is there is a bug in qcow2.c:preallocate() that passes incorrect n_start to qcow2_alloc_cluster_offset() is fixed. The bug can be triggerred by a larger cluster size than the default value(65536), for example: ./qemu-img create -f qcow2 \ -o 'cluster_size=131072,preallocation=metadata' file.img 4G Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-09 09:12:39 +01:00
Hu Tao	46bae92713	qcow2: fix wrong value of L1E_OFFSET_MASK, L2E_OFFSET_MASK and REFT_OFFSET_MASK Accoring to qcow spec, the offset fields in l1e, l2e and ref table entry start at bit 9. The offset is cluster offset, and the smallest possible cluster size is 512 bytes. Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-01-24 14:33:00 +01:00
Wenchao Xia	7b4c4781e3	snapshot: distinguish id and name in load_tmp Since later this function will be used so improve it. The only caller of it now is qemu-img, and it is not impacted by introduce function bdrv_snapshot_load_tmp_by_id_or_name() that call bdrv_snapshot_load_tmp() twice to keep old search logic. bdrv_snapshot_load_tmp_by_id_or_name() return int to let caller know the errno, and errno will be used later. Also fix a typo in comments of bdrv_snapshot_delete(). Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-04 15:19:00 +01:00
Max Reitz	4a273c398b	qcow2: Add more overlap check bitmask macros Introduces the macros QCOW2_OL_CONSTANT and QCOW2_OL_ALL in addition to the already existing QCOW2_OL_CACHED, signifying all metadata overlap checks that can be performed in constant time (regardless of image size etc.) and truly all available overlap checks, respectively. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Max Reitz	05de7e86ca	qcow2: Add overlap-check options Add runtime options to tune the overlap checks to be performed before write accesses. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Max Reitz	3e3553905c	qcow2: Make overlap check mask variable Replace the QCOW2_OL_DEFAULT macro by a variable overlap_check in BDRVQcowState. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Max Reitz	231bb26764	qcow2: Use negated overflow check mask In qcow2_check_metadata_overlap and qcow2_pre_write_overlap_check, change the parameter signifying the checks to perform from its current positive form to a negative one, i.e., it will no longer explicitly specify every check to perform but rather a mask of checks not to perform. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Jeff Cody	c4217f645d	block: qcow2 - used QEMU_PACKED for on-disk structures QCowHeader and QCowExtension are structs that reside in the on-disk image format, and are read and written directly via bdrv_pread()/write(), and as such should be packed to avoid any unintentional struct padding. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 20:51:13 +02:00
Wenchao Xia	a89d89d3e6	snapshot: distinguish id and name in snapshot delete Snapshot creation actually already distinguish id and name since it take a structured parameter sn, but delete can't. Later an accurate delete is needed in qmp_transaction abort and blockdev-snapshot-delete-sync, so change its prototype. Also errp is added to tip error, but return value is kepted to let caller check what kind of error happens. Existing caller for it are savevm, delvm and qemu-img, they are not impacted by introducing a new function bdrv_snapshot_delete_by_id_or_name(), which check the return value and do the operation again. Before this patch: For qcow2, it search id first then name to find the one to delete. For rbd, it search name. For sheepdog, it does nothing. After this patch: For qcow2, logic is the same by call it twice in caller. For rbd, it always fails in delete with id, but still search for name in second try, no change to user. Some code for *errp is based on Pavel's patch. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:47 +02:00
Max Reitz	b6481f376b	qcow2: Save refcount order in BDRVQcowState Save the image refcount order in BDRVQcowState. This will be relevant for future code supporting different refcount orders than four and also for code that needs to verify a certain refcount order for an opened image. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Max Reitz	32b6444d23	qcow2-cluster: Expand zero clusters Add functionality for expanding zero clusters. This is necessary for downgrading the image version to one without zero cluster support. For non-backed images, this function may also just discard zero clusters instead of truly expanding them. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Max Reitz	e7108feaac	qcow2-cache: Empty cache Add a function for emptying a cache, i.e., flushing it and marking all elements invalid. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Kevin Wolf	1ebf561c11	qcow2: Discard VM state in active L1 after creating snapshot During savevm, the VM state is written to the active L1 of the image and then a snapshot is taken. After that, the VM state isn't needed any more in the active L1 and should be discarded. This is implemented by this patch. The impact of not discarding the VM state is that a snapshot can never become smaller than any previous snapshot (because it would be padded with old VM state), and more importantly that future savevm operations cause unnecessary COWs (with associated flushes), which makes subsequent snapshots much slower. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:46 +02:00
Kevin Wolf	670df5e3b4	qcow2: Pass discard type to qcow2_discard_clusters() The function will be used internally instead of only being called for guest discard requests. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:46 +02:00
Max Reitz	e23e400ec6	qcow2-refcount: Repair OFLAG_COPIED errors Since the OFLAG_COPIED checks are now executed after the refcounts have been repaired (if repairing), it is safe to assume that they are correct but the OFLAG_COPIED flag may be not. Therefore, if its value differs from what it should be (considering the according refcount), that discrepancy can be repaired by correctly setting (or clearing that flag. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:44 +02:00
Max Reitz	a40f1c2add	qcow2: Metadata overlap checks Two new functions are added; the first one checks a given range in the image file for overlaps with metadata (main header, L1 tables, L2 tables, refcount table and blocks). The second one should be used immediately before writing to the image file as it calls the first function and, upon collision, marks the image as corrupt and makes the BDS unusable, thereby preventing further access. Both functions take a bitmask argument specifying the structures which should be checked for overlaps, making it possible to also check metadata writes against colliding with other structures. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:43 +02:00
Max Reitz	69c9872653	qcow2: Add corrupt bit This adds an incompatible bit indicating corruption to qcow2. Any image with this bit set may not be written to unless for repairing (and subsequently clearing the bit if the repair has been successful). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:43 +02:00
Peter Maydell	127c84e1a5	block/qcow2.h: Avoid "1LL << 63" (shifts into sign bit) The expression "1LL << 63" tries to shift the 1 into the sign bit of a 'long long', which provokes a clang sanitizer warning: runtime error: left shift of 1 by 63 places cannot be represented in type 'long long' Use "1ULL << 63" as the definition of QCOW_OFLAG_COPIED instead to avoid this. For consistency, we also update the other QCOW_OFLAG definitions to use the ULL suffix rather than LL, though only the shift by 63 is undefined behaviour. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Kevin Wolf	64aa99d3e0	qcow2: Use dashes instead of underscores in options This is what QMP wants to use. The options haven't been enabled in any release yet, so we're still free to change them. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-07-26 21:59:56 +02:00
Kevin Wolf	0b919fae31	qcow2: Batch discards This optimises the discard operation for freed clusters by batching discard requests (both snapshot deletion and bdrv_discard end up updating the refcounts cluster by cluster). Note that we don't discard asynchronously, but keep s->lock held. This is to avoid that a freed cluster is reallocated and written to while the discard is still in flight. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-24 10:25:17 +02:00
Kevin Wolf	67af674e47	qcow2: Options to enable discard for freed clusters Deleted snapshots are discarded in the image file by default, discard requests take their default from the -drive discard=... option and other places that free clusters must always be enabled explicitly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-24 10:25:17 +02:00
Kevin Wolf	6cfcb9b8b9	qcow2: Add refcount update reason to all callers This adds a refcount update reason to all callers of update_refcounts(), so that a follow-up patch can use this information to decide whether clusters that reach a refcount of 0 should be discarded in the image file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-24 10:25:17 +02:00
Kevin Wolf	2cf7cfa1cd	qcow2: Catch some L1 table index overflows This catches the situation that is described in the bug report at https://bugs.launchpad.net/qemu/+bug/865518 and goes like this: $ qemu-img create -f qcow2 huge.qcow2 $((10241024))T Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off $ qemu-io /tmp/huge.qcow2 -c "write $((102410241024102410241024 - 1024)) 512" Segmentation fault With this patch applied the segfault will be avoided, however the case will still fail, though gracefully: $ qemu-img create -f qcow2 /tmp/huge.qcow2 $((10241024))T Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off qemu-img: The image size is too large for file format 'qcow2' Note that even long before these overflow checks kick in, you get insanely high memory usage (up to INT_MAX sizeof(uint64_t) = 16 GB for the L1 table), so with somewhat smaller image sizes you'll probably see qemu aborting for a failed g_malloc(). If you need huge image sizes, you should increase the cluster size to the maximum of 2 MB in order to get higher limits. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-14 16:44:33 +02:00
Aurelien Jarno	753d9b82c5	aes: move aes.h from include/block to include/qemu Move aes.h from include/block to include/qemu to show it can be reused by other subsystems. Cc: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@gmail.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2013-04-13 13:51:57 +02:00
Kevin Wolf	88c6588c51	qcow2: Allow requests with multiple l2metas Instead of expecting a single l2meta, have a list of them. This allows to still have a single I/O request for the guest data, even though multiple l2meta may be needed in order to describe both a COW overwrite and a new cluster allocation (typical sequential write case). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	c37f4cd71d	qcow2: Finalise interface of handle_alloc() The interface works completely on a byte granularity now and duplicated parameters are removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	3b8e2e260c	qcow2: handle_alloc(): Get rid of keep_clusters parameter handle_alloc() is now called with the offset at which the actual new allocation starts instead of the offset at which the whole write request starts, part of which may already be processed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	65eb2e35c0	qcow2: Change handle_dependency to byte granularity This is a more precise description of what really constitutes a dependency. The behaviour doesn't change at this point because the COW area of the old request is still aligned to cluster boundaries and therefore an overlap is detected wheneven the requests touch any part of the same cluster. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	17a71e5823	qcow2: Handle dependencies earlier Handling overlapping allocations isn't just a detail of cluster allocation. It is rather one of three ways to get the host cluster offset for a write request: 1. If a request overlaps an in-flight allocations, the cluster offset can be taken from there (this is what handle_dependencies will evolve into) or the request must just wait until the allocation has completed. Accessing the L2 is not valid in this case, it has outdated information. 2. Outside overlapping areas, check the clusters that can be written to as they are, with no COW involved. 3. If a COW is required, allocate new clusters Changing the code to reflect this doesn't change the behaviour because overlaps cannot exist for clusters that are kept in step 2. It does however make it easier for later patches to work on clusters that belong to an allocation that is still in flight. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:42 +01:00
Kevin Wolf	acdfb480ba	qcow2: Fix segfault in qcow2_invalidate_cache Need to pass an options QDict to qcow2_open() now. This fixes a segfault on the migration target with qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-19 11:48:36 +01:00
Kevin Wolf	74c4510a3c	qcow2: Allow lazy refcounts to be enabled on the command line qcow2 images now accept a boolean lazy_refcounts options. Use it like this: -drive file=test.qcow2,lazy_refcounts=on If the option is specified on the command line, it overrides the default specified by the qcow2 header flags that were set when creating the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:49 +01:00
Paolo Bonzini	737e150e89	block: move include files to include/block/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:31 +01:00
Kevin Wolf	280d373579	qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2 This is closer to where the dirty flag is really needed, and it avoids having checks for special cases related to cluster allocation directly in the writev loop. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	f50f88b9fe	qcow2: Allocate l2meta only for cluster allocations Even for writes to already allocated clusters, an l2meta is allocated, though it stays effectively unused. After this patch, only allocating requests still have one. Each l2meta now describes an in-flight request that writes to clusters that are not yet hooked up in the L2 table. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	060bee8943	qcow2: Drop l2meta.cluster_offset There's no real reason to have an l2meta for normal requests that don't allocate anything. Before we can get rid of it, we must return the host cluster offset in a different way. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	593fb83cac	qcow2: Introduce Qcow2COWRegion This makes it easier to address the areas for which a COW must be performed. As a nice side effect, the COW code in qcow2_alloc_cluster_link_l2 becomes really trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	1d3afd649b	qcow2: Round QCowL2Meta.offset down to cluster boundary The offset within the cluster is already present as n_start and this is what the code uses. QCowL2Meta.offset is only needed at a cluster granularity. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Stefan Hajnoczi	bfe8043e92	qcow2: implement lazy refcounts Lazy refcounts is a performance optimization for qcow2 that postpones refcount metadata updates and instead marks the image dirty. In the case of crash or power failure the image will be left in a dirty state and repaired next time it is opened. Reducing metadata I/O is important for cache=writethrough and cache=directsync because these modes guarantee that data is on disk after each write (hence we cannot take advantage of caching updates in RAM). Refcount metadata is not needed for guest->file block address translation and therefore does not need to be on-disk at the time of write completion - this is the motivation behind the lazy refcount optimization. The lazy refcount optimization must be enabled at image creation time: qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough Update qemu-iotests 031 and 036 since the extension header size changes when we add feature bit table entries. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-06 22:39:14 +02:00
Stefan Hajnoczi	c61d0004bc	qcow2: introduce dirty bit This patch adds an incompatible feature bit to mark images that have not been closed cleanly. When a dirty image file is opened a consistency check and repair is performed. Update qemu-iotests 031 and 036 since the extension header size changes when we add feature bit table entries. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-06 22:39:14 +02:00
Paolo Bonzini	6af4e9ead4	qcow2: always operate caches in writeback mode Writethrough does not need special-casing anymore in the qcow2 caches. The block layer adds flushes after every guest-initiated data write, and these will also flush the qcow2 caches to the OS. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:43 +02:00
Kevin Wolf	166acf546f	qcow2: Support for fixing refcount inconsistencies Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Kevin Wolf	621f058940	qcow2: Zero write support Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:30 +02:00
Kevin Wolf	cfcc4c62ff	qcow2: Support for feature table header extension Instead of printing an ugly bitmask, qemu can now print a more helpful string even for yet unknown features. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:29 +02:00
Kevin Wolf	6377af48b0	qcow2: Support reading zero clusters This adds support for reading zero clusters in version 3 images. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:29 +02:00
Kevin Wolf	6744cbab8c	qcow2: Version 3 images This adds the basic infrastructure to qcow2 to handle version 3 images. It includes code to create v3 images, allow header updates for v3 images and checks feature bits. It still misses support for zero clusters, so this is not a fully compliant implementation of v3 yet. The default for creating new images stays at v2 for now. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:29 +02:00
Kevin Wolf	76dc9e0c8f	qcow2: Ignore reserved bits in refcount table entries Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:29 +02:00
Kevin Wolf	68d000a390	qcow2: Ignore reserved bits in get_cluster_offset With this change, reading from a qcow2 image ignores all reserved bits that are set in an L1 or L2 table entry. Now get_cluster_offset() assigns *cluster_offset only the offset without any other flags. The cluster type is not longer encoded in the offset, but a positive return value in case of success. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:27 +02:00
Kevin Wolf	90b277593d	qcow2: Save disk size in snapshot header This allows that different snapshots of an image can have different sizes, which is a requirement for enabling image resizing even with images that have internal snapshots. We don't do the actual support for it now, but make sure that the additional field is present and not completely ignored in all version 3 images. When trying to load a snapshot of different size, it returns an error. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:27 +02:00
Kevin Wolf	250196f19c	qcow2: Reduce number of I/O requests If the first part of a write request is allocated, but the second isn't and it can be allocated so that the resulting area is contiguous, handle it at once. This is a common case for sequential writes. After this patch, alloc_cluster_offset() only checks if the clusters are already allocated or how many new clusters can be allocated contigouosly. The actual cluster allocation is split off into a new function do_alloc_cluster_offset(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-03-12 15:14:07 +01:00
Kevin Wolf	256900b16b	qcow2: Add qcow2_alloc_clusters_at() This function allows to allocate clusters at a given offset in the image file. This is useful if you want to allocate the second part of an area that must be contiguous. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-03-12 15:14:07 +01:00
Kevin Wolf	75bab85ca0	qcow2: Keep unknown header extension when rewriting header If we want header extensions to work as compatible extensions, we can't destroy yet unknown header extensions when rewriting the header (e.g. for changing the backing file). Save all unknown header extensions in a list of blobs and include them in a new header. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-09 16:17:51 +01:00
Kevin Wolf	e24e49e619	qcow2: Update whole header at once In order to switch the backing file, qcow2 issues multiple write requests that only changed a part of the image header. Any failure after the first one would leave the header in an corrupted state. With this patch, the whole header is written at once, so we can't fail in the middle. At the same time, this gives us a reusable functions that updates all fields of the qcow2 header and not only the backing file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-09 16:17:51 +01:00
Kevin Wolf	c2c9a46609	qcow2: Allow >4 GB VM state This is a compatible extension to the snapshot header format that allows saving a 64 bit VM state size. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-15 12:40:33 +01:00
Anthony Liguori	06d9260ffa	qcow2: implement bdrv_invalidate_cache (v2) We don't reopen the actual file, but instead invoke the close and open routines. We specifically ignore the backing file since it's contents are read-only and therefore immutable. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2011-11-21 14:58:48 -06:00
Frediano Ziglio	a791236992	qcow2: removed unused depends_on field Signed-off-by: Frediano Ziglio <freddy77@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-09-12 15:17:17 +02:00
Frediano Ziglio	2f4b759367	qcow2: remove unused qcow2_create_refcount_update function Signed-off-by: Frediano Ziglio <freddy77@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-08-25 15:23:10 +02:00
Kevin Wolf	68d100e905	qcow2: Use coroutines Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-08-02 15:53:41 +02:00
Kevin Wolf	93913dfd8a	qcow2: Use Qcow2Cache in writeback mode during loadvm/savevm In snapshotting there is no guest involved, so we can safely use a writeback mode and do the flushes in the right place (i.e. at the very end). This improves the time that creating/restoring an internal snapshot takes with an image in writethrough mode. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-07-19 15:39:22 +02:00
Kevin Wolf	99cce9fa4e	qemu-img create: Fix displayed default cluster size When not specifying a cluster size on the command line, qemu-img printed a cluster size of 0: Formatting '/tmp/test.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=0 This patch adds the default cluster size to the QEMUOptionParameter list, so that it displays the default value that is used. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-06-08 11:56:40 +02:00
Kevin Wolf	5ea929e3d1	qcow2: Add bdrv_discard support This adds a bdrv_discard function to qcow2 that frees the discarded clusters. It does not yet pass the discard on to the underlying file system driver, but the space can be reused by future writes to the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2011-01-31 10:03:00 +01:00
Kevin Wolf	3de0a2944b	qcow2: Batch flushes for COW qcow2 calls bdrv_flush() after performing COW in order to ensure that the L2 table change is never written before the copy is safe on disk. Now that the L2 table is cached, we can wait with flushing until we write out the next L2 table. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-01-24 16:41:49 +01:00
Kevin Wolf	29c1a7301a	qcow2: Use QcowCache Use the new functions of qcow2-cache.c for everything that works on refcount block and L2 tables. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-01-24 16:41:49 +01:00
Kevin Wolf	493810940b	qcow2: Add QcowCache This adds some new cache functions to qcow2 which can be used for caching refcount blocks and L2 tables. When used with cache=writethrough they work like the old caching code which is spread all over qcow2, so for this case we have merely a cleanup. The interesting case is with writeback caching (this includes cache=none) where data isn't written to disk immediately but only kept in cache initially. This leads to some form of metadata write batching which avoids the current "write to refcount block, flush, write to L2 table" pattern for each single request when a lot of cluster allocations happen. Instead, cache entries are only written out if its required to maintain the right order. In the pure cluster allocation case this means that all metadata updates for requests are done in memory initially and on sync, first the refcount blocks are written to disk, then fsync, then L2 tables. This improves performance of scenarios with lots of cluster allocations noticably (e.g. installation or after taking a snapshot). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-01-24 11:08:51 +01:00
Kevin Wolf	80465c5016	block: Remove unused s->hd in various drivers All drivers use bs->file instead of s->hd for quite a while now, so it's time to remove s->hd. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2010-11-24 17:31:06 +01:00
edison	51ef67270b	Copy snapshots out of QCOW2 disk In order to backup snapshots, created from QCOW2 iamge, we want to copy snapshots out of QCOW2 disk to a seperate storage. The following patch adds a new option in "qemu-img": qemu-img convert -f qcow2 -O qcow2 -s snapshot_name src_img bck_img. Right now, it only supports to copy the full snapshot, delta snapshot is on the way. Changes from V1: all the comments from Kevin are addressed: Add read-only checking Fix coding style Change the name from bdrv_snapshot_load to bdrv_snapshot_load_tmp Signed-off-by: Disheng Su <edison@cloud.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2010-10-22 14:49:35 +02:00
Stefan Hajnoczi	72893756e0	qcow2: Support exact L1 table growth The L1 table grow operation includes a size calculation that bumps up the new L1 table size in order to anticipate the size needs of vmstate data. This helps reduce the number of times that the L1 table has to be grown when vmstate data is appended. This size overhead is not necessary during image creation, bdrv_truncate(), or snapshot goto operations. In fact, existing qemu-iotests that exercise table growth are no longer able to trigger it because image creation preallocates an L1 table that is too large after changes to qcow_create2(). This patch keeps the size calculation but also adds exact growth for callers that do not want to inflate the L1 table size unnecessarily. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2010-10-22 14:49:35 +02:00
Kevin Wolf	bd28f83565	qcow2: Avoid bounce buffers for AIO read requests qcow2 used to use bounce buffers for any AIO requests. This does not only imply unnecessary copying, but also unbounded allocations which should be avoided. This patch removes bounce buffers from the normal AIO read path, and constrains them to a constant size for encrypted images. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2010-09-21 15:39:42 +02:00
Kevin Wolf	9ac228e02c	qcow2/vdi: Change check to distinguish error cases This distinguishes between harmless leaks and real corruption. Hopefully users better understand what qemu-img check wants to tell them. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2010-07-06 17:05:49 +02:00
Kevin Wolf	1c46efaa0a	qcow2: Allow qcow2_get_cluster_offset to return errors qcow2_get_cluster_offset() looks up a given virtual disk offset and returns the offset of the corresponding cluster in the image file. Errors (e.g. L2 table can't be read) are currenctly indicated by a return value of 0, which is unfortuately the same as for any unallocated cluster. So in effect we can't check for errors. This makes the old return value a by-reference parameter and returns the usual 0/-errno error code. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2010-05-28 13:29:11 +02:00
Stefan Hajnoczi	419b19d9b4	qcow2: Implement bdrv_truncate() for growing images This patch adds the ability to grow qcow2 images in-place using bdrv_truncate(). This enables qemu-img resize command support for qcow2. Snapshots are not supported and bdrv_truncate() will return -ENOTSUP. The notion of resizing an image with snapshots could lead to confusion: users may expect snapshots to remain unchanged, but this is not possible with the current qcow2 on-disk format where the header.size field is global instead of per-snapshot. Others may expect snapshots to change size along with the current image data. I think it is safest to not support snapshots and perhaps add behavior later if there is a consensus. Backing images continue to work. If the image is now larger than its backing image, zeroes are read when accessing beyond the end of the backing image. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2010-05-03 10:07:32 +02:00
Kevin Wolf	66f82ceed6	block: Open the underlying image file in generic code Format drivers shouldn't need to bother with things like file names, but rather just get an open BlockDriverState for the underlying protocol. This patch introduces this behaviour for bdrv_open implementation. For protocols which need to access the filename to open their file/device/connection/... a new callback bdrv_file_open is introduced which doesn't get an underlying file opened. For now, also some of the more obscure formats use bdrv_file_open because they open() the file themselves instead of using the block.c functions. They need to be fixed in later patches. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2010-05-03 10:07:30 +02:00
Kevin Wolf	f4f0d391b2	qcow2: Fix signedness bugs Checking for return codes < 0 isn't really going to work with unsigned types. Use signed types instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2010-02-10 11:56:57 -06:00
Kevin Wolf	148da7ea9d	qcow2: Return 0/-errno in qcow2_alloc_cluster_offset Returning 0/-errno allows it to distingush different errors classes. The cluster offset of newly allocated clusters is now returned in the QCowL2Meta struct. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2010-01-26 14:59:19 -06:00
Kevin Wolf	72ecf02d7d	Revert "qcow2: Bring synchronous read/write back to life" It was merely a workaround and the real fix is done now. This reverts commit `ef845c3bf4`. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2009-10-27 12:28:59 -05:00
Kevin Wolf	ef845c3bf4	qcow2: Bring synchronous read/write back to life When the synchronous read and write functions were dropped, they were replaced by generic emulation functions. Unfortunately, these emulation functions don't provide the same semantics as the original functions did. The original bdrv_read would mean that we read some data synchronously and that we won't be interrupted during this read. The latter assumption is no longer true with the emulation function which needs to use qemu_aio_poll and therefore allows the callback of any other concurrent AIO request to be run during the read. Which in turn means that (meta)data read earlier could have changed and be invalid now. qcow2 is not prepared to work in this way and it's just scary how many places there are where other requests could run. I'm not sure yet where exactly it breaks, but you'll see breakage with virtio on qcow2 with a backing file. Providing synchronous functions again fixes the problem for me. Patchworks-ID: 35437 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2009-10-15 09:32:04 -05:00
Kevin Wolf	80ee15a6b2	qcow2: Increase maximum cluster size to 2 MB This patch increases the maximum qcow2 cluster size to 2 MB. Starting with 128k clusters, L2 tables span 2 GB or more of virtual disk space, causing 32 bit truncation and wraparound of signed integers. Therefore some variables need to use a larger data type. While being at reviewing data types, change some integers that are used for array indices to unsigned. In some places they were checked against some upper limit but not for negative values. This could avoid potential segfaults with corrupted qcow2 images. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2009-10-05 09:32:52 -05:00
Blue Swirl	72cf2d4f0e	Fix sys-queue.h conflict for good Problem: Our file sys-queue.h is a copy of the BSD file, but there are some additions and it's not entirely compatible. Because of that, there have been conflicts with system headers on BSD systems. Some hacks have been introduced in the commits `15cc923584`, `f40d753718`, `96555a96d7` and `3990d09adf` but the fixes were fragile. Solution: Avoid the conflict entirely by renaming the functions and the file. Revert the previous hacks. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2009-09-12 07:36:22 +00:00
Kevin Wolf	f214978a42	qcow2: Order concurrent AIO requests on the same unallocated cluster When two AIO requests write to the same cluster, and this cluster is unallocated, currently both requests allocate a new cluster and the second one merges the first one when it is completed. This means an cluster allocation, a read and a cluster deallocation which cause some overhead. If we simply let the second request wait until the first one is done, we improve overall performance with AIO requests (specifially, qcow2/virtio combinations). This patch maintains a list of in-flight requests that have allocated new clusters. A second request touching the same cluster is limited so that it either doesn't touch the allocation of the first request (so it can have a non-overlapping allocation) or it waits for the first request to complete. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2009-09-09 17:31:26 -05:00
Filip Navara	14899cdf3a	Fix QCOW2 debugging code to compile again Updated to use C99 comments. Signed-off-by: Filip Navara <filip.navara@gmail.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2009-06-29 08:52:40 -05:00
Kevin Wolf	ed6ccf0f51	qcow2: Rename global functions The qcow2 source is now split into several more manageable files. During the conversion quite some functions that were static before needed to be changed to be global to make the source compile again. We were lucky enough not to get name conflicts with these additional global names, but they are not nice. This patch adds a qcow2_ prefix to all of the global functions in qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2009-06-16 15:18:36 -05:00
Kevin Wolf	c142442b06	qcow2: Split out snapshot functions qcow2-snapshot.c contains the code related to snapshotting. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2009-06-16 15:18:36 -05:00
Kevin Wolf	45aba42fba	qcow2: Split out guest cluster functions qcow2-cluster.c contains all functions related to the management of guest clusters, i.e. what the guest sees on its virtual disk. This code is about mapping these guest clusters to host clusters in the image file using the two-level lookup tables. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2009-06-16 15:18:36 -05:00
Kevin Wolf	f7d0fe0239	qcow2: Split out refcount handling qcow2-refcount.c contains all functions which are related to cluster allocation and management in the image file. A large part of this is the reference counting of these clusters. Also a header file qcow2.h is introduced which will contain the interface of the split qcow2 modules. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2009-06-16 15:18:36 -05:00

1 2 3 4 5

244 Commits