qemu-e2k

Author	SHA1	Message	Date
Fam Zheng	4a9c9ea0d3	block: Detect multiplication overflow in bdrv_getlength Bogus image may have a large total_sectors that will overflow the multiplication. For cleanness, fix the return code so the error message will be meaningful. Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Denis V. Lunev	459b4e6612	block: align bounce buffers to page The following sequence int fd = open(argv[1], O_RDWR \| O_CREAT \| O_DIRECT, 0644); for (i = 0; i < 100000; i++) write(fd, buf, 4096); performs 5% better if buf is aligned to 4096 bytes. The difference is quite reliable. On the other hand we do not want at the moment to enforce bounce buffering if guest request is aligned to 512 bytes. The patch changes default bounce buffer optimal alignment to MAX(page size, 4k). 4k is chosen as maximal known sector size on real HDD. The justification of the performance improve is quite interesting. From the kernel point of view each request to the disk was split by two. This could be seen by blktrace like this: 9,0 11 1 0.000000000 11151 Q WS 312737792 + 1023 [qemu-img] 9,0 11 2 0.000007938 11151 Q WS 312738815 + 8 [qemu-img] 9,0 11 3 0.000030735 11151 Q WS 312738823 + 1016 [qemu-img] 9,0 11 4 0.000032482 11151 Q WS 312739839 + 8 [qemu-img] 9,0 11 5 0.000041379 11151 Q WS 312739847 + 1016 [qemu-img] 9,0 11 6 0.000042818 11151 Q WS 312740863 + 8 [qemu-img] 9,0 11 7 0.000051236 11151 Q WS 312740871 + 1017 [qemu-img] 9,0 5 1 0.169071519 11151 Q WS 312741888 + 1023 [qemu-img] After the patch the pattern becomes normal: 9,0 6 1 0.000000000 12422 Q WS 314834944 + 1024 [qemu-img] 9,0 6 2 0.000038527 12422 Q WS 314835968 + 1024 [qemu-img] 9,0 6 3 0.000072849 12422 Q WS 314836992 + 1024 [qemu-img] 9,0 6 4 0.000106276 12422 Q WS 314838016 + 1024 [qemu-img] and the amount of requests sent to disk (could be calculated counting number of lines in the output of blktrace) is reduced about 2 times. Both qemu-img and qemu-io are affected while qemu-kvm is not. The guest does his job well and real requests comes properly aligned (to page). Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-3-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Denis V. Lunev	4196d2f030	block: minimal bounce buffer alignment The patch introduces new concept: minimal memory alignment for bounce buffers. Original so called "optimal" value is actually minimal required value for aligment. It should be used for validation that the IOVec is properly aligned and bounce buffer is not required. Though, from the performance point of view, it would be better if bounce buffer or IOVec allocated by QEMU will be aligned stricter. The patch does not change any alignment value yet. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-2-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Stefan Hajnoczi	61007b316c	block: move I/O request processing to block/io.c The block.c file has grown to over 6000 lines. It is time to split this file so there are fewer conflicts and the code is easier to maintain. Extract I/O request processing code: * Read * Write * Zero writes and making the image empty * Flush * Discard * ioctl * Tracked requests and queuing * Throttling and copy-on-read * Block status and allocated functions * Refreshing block limits * Reading/writing vmstate * qemu_blockalign() and friends The patch simply moves code from block.c into block/io.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:17 +02:00
Stefan Hajnoczi	0eb7217e49	block: extract bdrv_setup_io_funcs() Move the code to install coroutine and aio emulation function pointers in a BlockDriver to its own function. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:17 +02:00
Stefan Hajnoczi	e0c47b6cb1	block: add bdrv_set_dirty()/bdrv_reset_dirty() to block_int.h The dirty bitmap functions are called from the block I/O processing code. Make them visible to block_int.h users so they can be used outside block.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:17 +02:00
Stefan Hajnoczi	4f5472cb2d	block: replace bdrv_states iteration with bdrv_next() The bdrv_states list is a static variable in block.c. bdrv_drain_all() and bdrv_flush_all() use this variable to iterate over all drives. The next patch will move bdrv_drain_all() and bdrv_flush_all() out of block.c so it's necessary to switch to the public bdrv_next() interface. Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:17 +02:00
John Snow	ce1ffea8cd	block: Resize bitmaps on bdrv_truncate Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-16-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	20dca81075	block: Ensure consistent bitmap function prototypes We often don't need the BlockDriverState for functions that operate on bitmaps. Remove it. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-15-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	aa0c7ca506	block: add BdrvDirtyBitmap documentation Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-14-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	a113534ffb	qmp: Add dirty bitmap status field in query-block Add the "frozen" status booleans, to inform clients when a bitmap is occupied doing a task. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1429314609-29776-13-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	e74e6b78e6	qmp: add block-dirty-bitmap-clear Add bdrv_clear_dirty_bitmap and a matching QMP command, qmp_block_dirty_bitmap_clear that enables a user to reset the bitmap attached to a drive. This allows us to reset a bitmap in the event of a full drive backup. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1429314609-29776-12-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	d58d845397	qmp: Add support of "dirty-bitmap" sync mode for drive-backup For "dirty-bitmap" sync mode, the block job will iterate through the given dirty bitmap to decide if a sector needs backup (backup all the dirty clusters and skip clean ones), just as allocation conditions of "top" sync mode. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-11-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	9bd2b08f27	block: Add bitmap successors A bitmap successor is an anonymous BdrvDirtyBitmap that is intended to be created just prior to a sensitive operation (e.g. Incremental Backup) that can either succeed or fail, but during the course of which we still want a bitmap tracking writes. On creating a successor, we "freeze" the parent bitmap which prevents its deletion, enabling, anonymization, or creating a bitmap with the same name. On success, the parent bitmap can "abdicate" responsibility to the successor, which will inherit its name. The successor will have been tracking writes during the course of the backup operation. The parent will be safely deleted. On failure, we can "reclaim" the successor from the parent, unifying them such that the resulting bitmap describes all writes occurring since the last successful backup, for instance. Reclamation will thaw the parent, but not explicitly re-enable it. BdrvDirtyBitmap operations that target a single bitmap are protected by assertions that the bitmap is not frozen and/or disabled. BdrvDirtyBitmap operations that target a group of bitmaps, such as bdrv_{set,reset}_dirty will ignore frozen/disabled drives with a conditional instead. Internal functions that enable/disable dirty bitmaps have assertions added to them to prevent modifying frozen bitmaps. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1429314609-29776-10-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	b8e6fb752e	block: Add bitmap disabled status Add a status indicating the enabled/disabled state of the bitmap. A bitmap is by default enabled, but you can lock the bitmap into a read-only state by setting disabled = true. A previous version of this patch added a QMP interface for changing the state of the bitmap, but it has since been removed for now until a use case emerges where this state must be revealed to the user. The disabled state WILL be used internally for bitmap migration and bitmap persistence. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-9-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	592fdd02ae	block: Introduce bdrv_dirty_bitmap_granularity() This returns the granularity (in bytes) of dirty bitmap, which matches the QMP interface and the existing query interface. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-6-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	341ebc2f81	qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove The new command pair is added to manage a user created dirty bitmap. The dirty bitmap's name is mandatory and must be unique for the same device, but different devices can have bitmaps with the same names. The granularity is an optional field. If it is not specified, we will choose a default granularity based on the cluster size if available, clamped to between 4K and 64K to mirror how the 'mirror' code was already choosing granularity. If we do not have cluster size info available, we choose 64K. This code has been factored out into a helper shared with block/mirror. This patch also introduces the 'block_dirty_bitmap_lookup' helper, which takes a device name and a dirty bitmap name and validates the lookup, returning NULL and setting errp if there is a problem with either field. This helper will be re-used in future patches in this series. The types added to block-core.json will be re-used in future patches in this series, see: 'qapi: Add transaction support to block-dirty-bitmap-{add, enable, disable}' Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1429314609-29776-5-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	5fba6c0e50	qmp: Ensure consistent granularity type We treat this field with a variety of different types everywhere in the code. Now it's just uint32_t. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-4-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
Fam Zheng	0db6e54a8a	qapi: Add optional field "name" to block dirty bitmap This field will be set for user created dirty bitmap. Also pass in an error pointer to bdrv_create_dirty_bitmap, so when a name is already taken on this BDS, it can report an error message. This is not global check, two BDSes can have dirty bitmap with a common name. Implemented bdrv_find_dirty_bitmap to find a dirty bitmap by name, will be used later when other QMP commands want to reference dirty bitmap by name. Add bdrv_dirty_bitmap_make_anon. This unsets the name of dirty bitmap. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-3-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
Alberto Garcia	d5a8ee60a0	qmp: fill in the image field in BlockDeviceInfo The image field in BlockDeviceInfo is supposed to contain an ImageInfo object. However that is being filled in by bdrv_query_info(), not by bdrv_block_device_info(), which is where BlockDeviceInfo is actually created. Anyone calling bdrv_block_device_info() directly will get a null image field. As a consequence of this, the HMP command 'info block -n -v' crashes QEMU. This patch moves the code that fills in that field from bdrv_query_info() to bdrv_block_device_info(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 1429271563-3765-1-git-send-email-berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:09 +02:00
Alberto Garcia	81e5f78a9f	block: use bdrv_get_device_or_node_name() in error messages There are several error messages that identify a BlockDriverState by its device name. However those errors can be produced in nodes that don't have a device name associated. In those cases we should use bdrv_get_device_or_node_name() to fall back to the node name and produce a more meaningful message. The messages are also updated to use the more generic term 'node' instead of 'device'. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 9823a1f0514fdb0692e92868661c38a9e00a12d6.1428485266.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:09 +02:00
Alberto Garcia	9b2aa84f87	block: add bdrv_get_device_or_node_name() This function gets the device name associated with a BlockDriverState, or its node name if the device name is empty. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 4fa30aa8d61d9052ce266fd5429a59a14e941255.1428485266.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:09 +02:00
Paolo Bonzini	0b5a24454f	block: avoid unnecessary bottom halves bdrv_aio_* APIs can use coroutines to achieve asynchronicity. However, the coroutine may terminate without having yielded back to the caller (for example because of something that invokes a nested event loop, or because the coroutine is doing nothing at all). In this case, the bdrv_aio_* API must delay the completion to the next iteration of the main loop, because bdrv_aio_* will never invoke the callback before returning. This can be done with a bottom half, and indeed bdrv_aio_* is always using one for simplicity. It is possible to gain some performance (~3%) by avoiding this in the common case. A new field in the BlockAIOCBCoroutine struct is set to true until the first time the corotine has yielded to its creator, and completion goes through a new function bdrv_co_complete. If the flag is false, bdrv_co_complete invokes the callback immediately. If it is true, the caller will notice that the coroutine has completed and schedule the bottom half itself. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1427524638-28157-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:09 +02:00
Fam Zheng	69da3b0b47	block: Pause block jobs in bdrv_drain_all This is necessary to suppress more IO requests from being generated from block job coroutines. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1428069921-2957-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:09 +02:00
Fam Zheng	de50a20a4c	block: Switch to host monotonic clock for IO throttling Currently, throttle timers won't make any progress when VCPU is not running, which would stall the request queue in utils, qtest, vm suspending, and live migration, without special handling. Block jobs are confusingly inconsistent between with and without throttling: if user sets a bps limit, stops the vm, then start a block job, the block job will not make any progress; in contrary, if user unsets the bps limit, or if it's not set, the block job will run normally. After this patch, with the host clock, even if the VCPUs are stopped, the throttle queues will be processed. This patch also enables potential to add throttle to bdrv_drain_all. Currently all requests are drained immediately. In other words whenever it is called, IO throttling goes ineffective (examples: system reset, migration and many block job operations.). This is a loophole that guest could exploit. If we use the host clock, we can later just trust the nested poll. This could be done on top. Note that for qemu-iotests case 093, which uses qtest, we still keep vm clock so the script can control the clock stepping in order to be deterministic. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1427268446-6426-1-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:08 +02:00
Stefan Hajnoczi	786a4ea82e	Convert (ffs(val) - 1) to ctz32(val) This commit was generated mechanically by coccinelle from the following semantic patch: @@ expression val; @@ - (ffs(val) - 1) + ctz32(val) The call sites have been audited to ensure the ffs(0) - 1 == -1 case never occurs (due to input validation, asserts, etc). Therefore we don't need to worry about the fact that ctz32(0) == 32. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1427124571-28598-5-git-send-email-stefanha@redhat.com Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:08 +02:00
Fam Zheng	fc3959e466	block: Fix unaligned zero write If the zero write is not aligned, bdrv_co_do_pwritev will segfault because of accessing to the NULL qiov passed in by bdrv_co_write_zeroes. Fix this by allocating a local qiov in bdrv_co_do_pwritev if the request is not aligned. (In this case the padding iovs are necessary anyway, so it doesn't hurt.) Also add a check at the end of bdrv_co_do_pwritev to clear the zero flag if padding is involved. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1427160230-4489-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-03-27 10:01:12 +00:00
Fam Zheng	d51a2427f6	block: Drop bdrv_find All callers are converted, so drop it. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1425296209-1476-5-git-send-email-famz@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-03-16 12:10:30 -04:00
Markus Armbruster	a1f688f415	block: Deprecate QCOW/QCOW2 encryption We've steered users away from QCOW/QCOW2 encryption for a while, because it's a flawed design (commit `136cd19` Describe flaws in qcow/qcow2 encryption in the docs). In addition to flawed crypto, we have comically bad usability, and plain old bugs. Let me show you. = Example images = I'm going to use a raw image as backing file, and two QCOW2 images, one encrypted, and one not: $ qemu-img create -f raw backing.img 4m Formatting 'backing.img', fmt=raw size=4194304 $ qemu-img create -f qcow2 -o encryption,backing_file=backing.img,backing_fmt=raw geheim.qcow2 4m Formatting 'geheim.qcow2', fmt=qcow2 size=4194304 backing_file='backing.img' backing_fmt='raw' encryption=on cluster_size=65536 lazy_refcounts=off $ qemu-img create -f qcow2 -o backing_file=backing.img,backing_fmt=raw normal.qcow2 4m Formatting 'normal.qcow2', fmt=qcow2 size=4194304 backing_file='backing.img' backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off = Usability issues = == Confusing startup == When no image is encrypted, and you don't give -S, QEMU starts the guest immediately: $ qemu-system-x86_64 -nodefaults -display none -monitor stdio normal.qcow2 QEMU 2.2.50 monitor - type 'help' for more information (qemu) info status VM status: running But as soon as there's an encrypted image in play, the guest is not started, with no notification whatsoever: $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2 QEMU 2.2.50 monitor - type 'help' for more information (qemu) info status VM status: paused (prelaunch) If the user figured out that he needs to type "cont" to enter his keys, the confusion enters the next level: "cont" asks for at most one key. If more are needed, it then silently does nothing. The user has to type "cont" once per encrypted image: $ qemu-system-x86_64 -nodefaults -display none -monitor stdio -drive if=none,file=geheim.qcow2 -drive if=none,file=geheim.qcow2 QEMU 2.2.50 monitor - type 'help' for more information (qemu) info status VM status: paused (prelaunch) (qemu) c none0 (geheim.qcow2) is encrypted. Password: **** (qemu) info status VM status: paused (prelaunch) (qemu) c none1 (geheim.qcow2) is encrypted. Password: **** (qemu) info status VM status: running == Incorrect passwords not caught == All existing encryption schemes give you the GIGO treatment: garbage password in, garbage data out. Guests usually refuse to mount garbage, but other usage is prone to data loss. == Need to stop the guest to add an encrypted image == $ qemu-system-x86_64 -nodefaults -display none -monitor stdio QEMU 2.2.50 monitor - type 'help' for more information (qemu) info status VM status: running (qemu) drive_add "" if=none,file=geheim.qcow2 Guest must be stopped for opening of encrypted image (qemu) stop (qemu) drive_add "" if=none,file=geheim.qcow2 OK Commit `c3adb58` added this restriction. Before, we could expose images lacking an encryption key to guests, with potentially catastrophic results. See also "Use without key is not always caught". = Bugs = == Use without key is not always caught == Encrypted images can be in an intermediate state "opened, but no key". The weird startup behavior and the need to stop the guest are there to ensure the guest isn't exposed to that state. But other things still are! * drive_backup $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2 QEMU 2.2.50 monitor - type 'help' for more information (qemu) drive_backup -f ide0-hd0 out.img raw Formatting 'out.img', fmt=raw size=4194304 I guess this writes encrypted data to raw image out.img. Good luck with figuring out how to decrypt that again. * commit $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2 QEMU 2.2.50 monitor - type 'help' for more information (qemu) commit ide0-hd0 I guess this writes encrypted data into the unencrypted raw backing image, effectively destroying it. == QMP device_add of usb-storage fails when it shouldn't == When the image is encrypted, device_add creates the device, defers actually attaching it to when the key becomes available, then fails. This is wrong. device_add must either create the device and succeed, or do nothing and fail. $ qemu-system-x86_64 -nodefaults -display none -usb -qmp stdio -drive if=none,id=foo,file=geheim.qcow2 {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 2}, "package": ""}, "capabilities": []}} { "execute": "qmp_capabilities" } {"return": {}} { "execute": "device_add", "arguments": { "driver": "usb-storage", "id": "bar", "drive": "foo" } } {"error": {"class": "DeviceEncrypted", "desc": "'foo' (geheim.qcow2) is encrypted"}} {"execute":"device_del","arguments": { "id": "bar" } } {"timestamp": {"seconds": 1426003440, "microseconds": 237181}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/bar/bar.0/legacy[0]"}} {"timestamp": {"seconds": 1426003440, "microseconds": 238231}, "event": "DEVICE_DELETED", "data": {"device": "bar", "path": "/machine/peripheral/bar"}} {"return": {}} This stuff is worse than useless, it's a trap for users. If people become sufficiently interested in encrypted images to contribute a cryptographically sane implementation for QCOW2 (or whatever other format), then rewriting the necessary support around it from scratch will likely be easier and yield better results than fixing up the existing mess. Let's deprecate the mess now, drop it after a grace period, and move on. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-16 17:07:25 +01:00
Ekaterina Tumanova	892b7de832	block: add bdrv functions for geometry and blocksize Add driver functions for geometry and blocksize detection Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424087278-49393-2-git-send-email-tumanova@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Markus Armbruster	dc523cd348	qemu-img: Suppress unhelpful extra errors in convert, amend img_convert() and img_amend() use qemu_opts_do_parse(), which reports errors with qerror_report_err(). Its error messages aren't helpful here, the caller reports one that actually makes sense. Reproducer: $ qemu-img convert -o backing_format=raw in.img out.img qemu-img: Invalid parameter 'backing_format' qemu-img: Invalid options for file format 'raw' To fix, propagate errors through qemu_opts_do_parse(). This lifts the error reporting into callers. Drop it from img_convert() and img_amend(), keep it in qemu_chr_parse_compat(), bdrv_img_create(). Since I'm touching qemu_opts_do_parse() anyway, write a function comment for it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-02-26 14:51:21 +01:00
Markus Armbruster	f43e47dbf6	QemuOpts: Drop qemu_opt_set(), rename qemu_opt_set_err(), fix use qemu_opt_set() is a wrapper around qemu_opt_set() that reports the error with qerror_report_err(). Most of its users assume the function can't fail. Make them use qemu_opt_set_err() with &error_abort, so that should the assumption ever break, it'll break noisily. Just two users remain, in util/qemu-config.c. Switch them to qemu_opt_set_err() as well, then rename qemu_opt_set_err() to qemu_opt_set(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-02-26 14:49:31 +01:00
Markus Armbruster	6be4194b92	block: Suppress unhelpful extra errors in bdrv_img_create() bdrv_img_create() uses qemu_opt_set(), which reports errors with qerror_report_err(). Its error messages aren't helpful here, the caller reports one that actually makes sense. I don't know how to trigger the error conditions, though. Switch to qemu_opt_set_err() to get rid of the unwanted messages. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-02-26 14:48:31 +01:00
Markus Armbruster	39101f2511	QemuOpts: Convert qemu_opt_set_number() to Error, fix its use Return the Error object instead of reporting it with qerror_report_err(). Change callers that assume the function can't fail to pass &error_abort, so that should the assumption ever break, it'll break noisily. Turns out all callers outside its unit test assume that. We could drop the Error ** argument, but that would make the interface less regular, so don't. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-02-26 14:47:32 +01:00
Max Reitz	b9c649470b	block: Keep bdrv_check*_request()'s return value Do not throw away the value returned by bdrv_check_request() and bdrv_check_byte_request(). Fix up some coding style issues in the proximity of the affected hunks. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1423162705-32065-17-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:19 +00:00
Max Reitz	c0191e763b	block: Remove "growable" from BDS Now that request clamping is done in the BlockBackend, the "growable" field can be removed from the BlockDriverState. All BDSs are now treated as being "growable" (that is, they are allowed to grow; they are not necessarily actually able to). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-16-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:19 +00:00
Max Reitz	b65a5e12a4	block: Add Error parameter to bdrv_find_protocol() The argument given to bdrv_find_protocol() is just a file name, which makes it difficult for the caller to reconstruct what protocol bdrv_find_protocol() was hoping to find. This patch adds an Error parameter to that function to solve this issue. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-4-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:18 +00:00
Markus Armbruster	b1ca639184	block: Eliminate silly QERR_ macros used for encryption keys The QERR_ macros are leftovers from the days of "rich" error objects. They're used with error_set() and qerror_report(), and expand into the first two arguments. This trickiness has become pointless. Clean up QERR_DEVICE_ENCRYPTED and QERR_DEVICE_NOT_ENCRYPTED. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1422524221-8566-5-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-02-06 11:46:32 -05:00
Markus Armbruster	4d2855a348	block: New bdrv_add_key(), convert monitor to use it Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1422524221-8566-4-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-02-06 11:46:32 -05:00
Peter Lieven	75af1f34cd	block: introduce BDRV_REQUEST_MAX_SECTORS we check and adjust request sizes at several places with sometimes inconsistent checks or default values: INT_MAX INT_MAX >> BDRV_SECTOR_BITS UINT_MAX >> BDRV_SECTOR_BITS SIZE_MAX >> BDRV_SECTOR_BITS This patches introdocues a macro for the maximal allowed sectors per request and uses it at several places. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Peter Lieven	f4564d53c6	block: add accounting for merged requests Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Lieven	98764152ad	block: change default for discard and write zeroes to INT_MAX do not trim requests if the driver does not supply a limit through BlockLimits. For write zeroes we still keep a limit for the unsupported path to avoid allocating a big bounce buffer. Suggested-by: Kevin Wolf <kwolf@redhat.com> Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Jeff Cody	a1a11d10ab	block: remove unused variable in bdrv_commit As Stefan pointed out, the variable 'filename' in bdrv_commit is unused, despite being maintained in previous patches. With this patch, get rid of the variable for good. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:06 +01:00
Fam Zheng	bb00021de0	block: Split BLOCK_OP_TYPE_COMMIT to BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET} Like BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET, block-commit involves two asymmetric devices. This change is not user-visible (yet), because commit only works with device names. But once we enable backing reference in blockdev-add, or specifying node-name in block-commit command, we don't want the user to start two commit jobs on the same backing chain, which will corrupt things because of the final bdrv_swap. Before we have per category blockers, splitting this type is still better. [Resolved virtio-blk dataplane conflict by replacing BLOCK_OP_TYPE_COMMIT with both BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}. They are safe since the block job runs in the same AioContext as the dataplane IOThread. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-01-13 13:43:29 +00:00
Peter Lieven	095e4fa4b5	block: limited request size in write zeroes unsupported path If bs->bl.max_write_zeroes is large and we end up in the unsupported path we might allocate a lot of memory for the iovector and/or even generate an oversized requests. Fix this by limiting the request by the minimum of the reported maximum transfer size or 16MB (32768 sectors). Reported-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-id: 1420457389-16332-1-git-send-email-pl@kamp.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-01-13 13:43:29 +00:00
Vladimir Sementsov-Ogievskiy	c4237dfa63	block: fix spoiling all dirty bitmaps by mirror and migration Mirror and migration use dirty bitmaps for their purposes, and since commit [block: per caller dirty bitmap] they use their own bitmaps, not the global one. But they use old functions bdrv_set_dirty and bdrv_reset_dirty, which change all dirty bitmaps. Named dirty bitmaps series by Fam and Snow are affected: mirroring and migration will spoil all (not related to this mirroring or migration) named dirty bitmaps. This patch fixes this by adding bdrv_set_dirty_bitmap and bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent such mistakes in future, old functions bdrv_(set,reset)_dirty are made static, for internal block usage. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com> CC: John Snow <jsnow@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Denis V. Lunev <den@openvz.org> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	291680186f	block: Relative backing file for image creation Relative backing filenames are always relative to the backed image's directory; the same applies to image creation. Therefore, if the backing file has to be opened for determining its size (in case the size has not been explicitly specified) its filename should be interpreted relative to the new image's base directory and not relative to qemu's working directory. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	9f07429e88	block: JSON filenames and relative backing files When using a relative backing file name, qemu needs to know the directory of the top image file. For JSON filenames, such a directory cannot be easily determined (e.g. how do you determine the directory of a qcow2 BDS directly on top of a quorum BDS?). Therefore, do not allow relative filenames for the backing file of BDSs only having a JSON filename. Furthermore, BDS::exact_filename should be used whenever possible. If BDS::filename is not equal to BDS::exact_filename, the former will always be a JSON object. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	0a82855a1a	block: Get full backing filename from string Introduce bdrv_get_full_backing_filename_from_filename(), a function which takes the name of the backed file and a potentially relative backing filename to produce the full (absolute) backing filename. Use this function from bdrv_get_full_backing_filename(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Paolo Bonzini	e012b78cf5	block: do not allocate an iovec per read of a growable/zero_after_eof BDS Most reads do not go past the end of the file, and they can use the input QEMUIOVector instead of creating one. This removes the qemu_iovec_* functions from the profile. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00

1 2 3 4 5 ...

825 Commits