qemu-e2k

Author	SHA1	Message	Date
Alexey Kardashevskiy	7ea2d269cb	block/migration: Disable cache invalidate for incoming migration When migrated using libvirt with "--copy-storage-all", at the end of migration there is race between NBD mirroring task trying to do flush and migration completion, both end up invalidating cache. Since qcow2 driver does not handle this situation very well, random crashes happen. This disables the BDRV_O_INCOMING flag for the block device being migrated once the cache has been invalidated. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> -- fixed parens by hand Signed-off-by: Juan Quintela <quintela@redhat.com>	2014-10-14 09:35:21 +02:00
Markus Armbruster	f5bebbbb28	util: Emancipate id_wellformed() from QemuOpts IDs have long spread beyond QemuOpts: not everything with an ID necessarily goes through QemuOpts. Commit `9aebf3b` is about such a case: block layer names are meant to be well-formed IDs, but some of them don't go through QemuOpts, and thus weren't checked. The commit fixed that the straightforward way: rename the internal QemuOpts helper id_wellformed() to qemu_opts_id_wellformed() and give it external linkage. Instead of using it directly in block.c, the commit adds wrapper bdrv_is_valid_name(), probably to hide the connection to QemuOpts. Go one logical step further: emancipate IDs from QemuOpts. Rename the function back to id_wellformed(), and put it in another file. While there, clean up its value to bool. Peel off the bdrv_is_valid_name() wrapper. [Replaced stray return 0 with return false to match bool returns used elsewhere in id_wellformed(). --Stefan] Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-10-03 10:30:33 +01:00
Kevin Wolf	9aebf3b892	block: Validate node-name The device_name of a BlockDriverState is currently checked because it is always used as a QemuOpts ID and qemu_opts_create() checks whether such IDs are wellformed. node-name is supposed to share the same namespace, but it isn't checked currently. This patch adds explicit checks both for device_name and node-name so that the same rules will still apply even if QemuOpts won't be used any more at some point. qemu-img used to use names with spaces in them, which isn't allowed any more. Replace them with underscores. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-25 15:24:32 +02:00
Markus Armbruster	d224469d87	block: Improve message for device name clashing with node name Suggested-by: Benoit Canet <benoit.canet@nodalink.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-25 15:24:14 +02:00
Markus Armbruster	3ae59580a0	block: Keep DriveInfo alive until BlockDriverState dies If the BDS's refcnt > 0, drive_del() destroys the DriveInfo, but not the BDS. This can happen in three places: * Device model destruction during unplug: blockdev_auto_del() * Xen IDE unplug: pci_piix3_xen_ide_unplug() * drive_del command when no device model is attached: do_drive_del() The other callers of drive_del are on error paths where refcnt == 1. If the user somehow manages to plug in a device model using a BDS that has gone through drive_del(), the legacy configuration passed in DriveInfo doesn't reach the device model, and automatic deletion on unplug doesn't work. Worse, some device models such as scsi-disk crash when DriveInfo doesn't exist. This is theoretical; I didn't research an actual reproducer. The problem was introduced when we replaced DriveInfo reference counting by BDS reference counting in commit a94a3fa..fa510eb. Fix by keeping DriveInfo alive until its BDS dies. This affects qemu_drive_opts: now you can't reuse the same ID for new drive options until the BDS dies. Before, you could, but since the code always attempts to create a BDS with the same ID next, the enclosing operation "create a new drive" failed anyway. Different error path, same result. Unfortunately, the fix involves use of blockdev.c stuff from block.c, which is a layering violation. Fortunately, my forthcoming BlockBackend work will get rid of it again. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-25 15:24:14 +02:00
Fam Zheng	8007429a99	block: Rename qemu_aio_release -> qemu_aio_unref Suggested-by: Benoît Canet <benoit.canet@irqsave.net> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:17 +01:00
Fam Zheng	ca5fd113b8	block: Drop AIOCBInfo.cancel Now that all the implementations are converted to asynchronous version and we can emulate synchronous cancellation with it. Let's drop the unused member. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:16 +01:00
Fam Zheng	f600ac1902	block: Drop bdrv_em_aiocb_info.cancel Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:00 +01:00
Fam Zheng	3acabd685e	block: Drop bdrv_em_co_aiocb_info.cancel Also drop the now unused ->done pointer. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:38:59 +01:00
Fam Zheng	02c50efe08	block: Add bdrv_aio_cancel_async This is the async version of bdrv_aio_cancel, which doesn't block the caller. It guarantees that the cb is called either before returning or some time later. bdrv_aio_cancel can base on bdrv_aio_cancel_async, later we can convert all .io_cancel implementations to .io_cancel_async, and the aio_poll is the common logic. In the end, .io_cancel can be dropped. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:38:58 +01:00
Fam Zheng	f197fe2b2c	block: Add refcnt in BlockDriverAIOCB This will be useful in synchronous cancel emulation with bdrv_aio_cancel_async. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:38:57 +01:00
Luiz Capitulino	624ff5736e	block: extend BLOCK_IO_ERROR with reason string BLOCK_IO_ERROR events are logged by libvirt, which helps with post mortem analysis of guests. However, one information that we miss today is a human readable string describing the cause of the I/O error. This commit adds that string it to BLOCK_IO_ERROR. Note that this string is a debugging aid for humans, meaning that it should not parsed by applications. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-11 17:14:13 +02:00
Benoît Canet	5366d0c8bc	block: Make the block accounting functions operate on BlockAcctStats This is the next step for decoupling block accounting functions from BlockDriverState. In a future commit the BlockAcctStats structure will be moved from BlockDriverState to the device models structures. Note that bdrv_get_stats was introduced so device models can retrieve the BlockAcctStats structure of a BlockDriverState without being aware of it's layout. This function should go away when BlockAcctStats will be embedded in the device models structures. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Keith Busch <keith.busch@intel.com> CC: Anthony Liguori <aliguori@amazon.com> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Peter Maydell <peter.maydell@linaro.org> CC: Michael Tokarev <mjt@tls.msk.ru> CC: John Snow <jsnow@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Alexander Graf <agraf@suse.de> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Benoît Canet	5e5a94b605	block: Extract the block accounting code The plan is to add new accounting metrics (latency, invalid requests, failed requests, queue depth) and block.c is overpopulated so it will be better to work in a separate module. Moreover the long term plan is to have statistics in each of the BDS of the graph for metrology purpose; this means that the device model statistics must move from the topmost BDS to the device model. So we need to decouple the statistic code from BlockDriverState. This is another argument for the extraction of the code in a separate module. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Benoit Canet <benoit@irqsave.net> CC: Fam Zheng <famz@redhat.com> CC: Peter Crosthwaite <peter.crosthwaite@xilinx.com> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Benoît Canet	0ddd0ad96a	block: Extract the BlockAcctStats structure Extract the block accounting statistics into a structure so the block device models can hold them in the future. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Luiz Capitulino	c7c2ff0c7e	block: extend BLOCK_IO_ERROR event with nospace indicator Management software, such as RHEV's vdsm, want to be able to allocate disk space on demand. The basic use case is to start a VM with a small disk and then the disk is enlarged when QEMU hits a ENOSPC condition. To this end, the management software has to be notified when QEMU encounters ENOSPC. The solution implemented by this commit is simple: it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true when QEMU is stopped due to ENOSPC. Note that support for querying this event is already present in query-block by means of the 'io-status' key. Also, the new 'nospace' BLOCK_IO_ERROR field shares the same semantics with 'io-status', which basically means that werror= has to be set to either 'stop' or 'enospc' to enable 'nospace'. Finally, this commit also updates the 'io-status' key doc in the schema with a list of supported device models. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Liu Yuan	6bb4515849	block: kill tail whitespace in block.c Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-08 11:12:42 +01:00
Stefan Hajnoczi	391827eb10	block: fix overlapping multiwrite requests When request A is a strict superset of request B: AAAAAAAA BBBB multiwrite_merge() merges them as follows: AABBBB The tail of request A should have been included: AABBBBAA This patch fixes data loss but this code path is probably rare. Since guests cannot assume ordering between in-flight requests, few applications submit overlapping write requests. Reported-by: Slava Pestov <sviatoslav.pestov@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-08-29 14:09:43 +01:00
Max Reitz	33384421b3	block: Add AIO context notifiers If a long-running operation on a BDS wants to always remain in the same AIO context, it somehow needs to keep track of the BDS changing its context. This adds a function for registering callbacks on a BDS which are called whenever the BDS is attached or detached from an AIO context. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-29 10:48:45 +01:00
Stefan Hajnoczi	ada4240103	block: sort formats alphabetically in bdrv_iterate_format() Format names are best consumed in alphabetical order. This makes human-readable output easy to produce. bdrv_iterate_format() already has an array of format strings. Sort them before invoking the iteration callback. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>	2014-08-28 13:42:25 +01:00
Max Reitz	91af701412	block: Add bdrv_refresh_filename() Some block devices may not have a filename in their BDS; and for some, there may not even be a normal filename at all. To work around this, add a function which tries to construct a valid filename for the BDS.filename field. If a filename exists or a block driver is able to reconstruct a valid filename (which is placed in BDS.exact_filename), this can directly be used. If no filename can be constructed, we can still construct an options QDict which is then converted to a JSON object and prefixed with the "json:" pseudo protocol prefix. The QDict is placed in BDS.full_open_options. For most block drivers, this process can be done automatically; those that need special handling may define a .bdrv_refresh_filename() method to fill BDS.exact_filename and BDS.full_open_options themselves. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 14:31:56 +02:00
Markus Armbruster	5839e53bbc	block: Use g_new() & friends where that makes obvious sense g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void , which lets the compiler catch more type errors. Patch created with Coccinelle, with two manual changes on top: Add const to bdrv_iterate_format() to keep the types straight * Convert the allocation in bdrv_drop_intermediate(), which Coccinelle inexplicably misses Coccinelle semantic patch: @@ type T; @@ -g_malloc(sizeof(T)) +g_new(T, 1) @@ type T; @@ -g_try_malloc(sizeof(T)) +g_try_new(T, 1) @@ type T; @@ -g_malloc0(sizeof(T)) +g_new0(T, 1) @@ type T; @@ -g_try_malloc0(sizeof(T)) +g_try_new0(T, 1) @@ type T; expression n; @@ -g_malloc(sizeof(T) * (n)) +g_new(T, n) @@ type T; expression n; @@ -g_try_malloc(sizeof(T) * (n)) +g_try_new(T, n) @@ type T; expression n; @@ -g_malloc0(sizeof(T) * (n)) +g_new0(T, n) @@ type T; expression n; @@ -g_try_malloc0(sizeof(T) * (n)) +g_try_new0(T, n) @@ type T; expression p, n; @@ -g_realloc(p, sizeof(T) * (n)) +g_renew(T, p, n) @@ type T; expression p, n; @@ -g_try_realloc(p, sizeof(T) * (n)) +g_try_renew(T, p, n) Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 11:51:28 +02:00
Max Reitz	908bcd540f	block: Catch !bs->drv in bdrv_check() qemu-img check calls bdrv_check() twice if the first run repaired some inconsistencies. If the first run however again triggered corruption prevention (on qcow2) due to very bad inconsistencies, bs->drv may be NULL afterwards. Thus, bdrv_check() should check whether bs->drv is set. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-15 15:07:16 +02:00
Kevin Wolf	857d4f46c3	block: Handle failure for potentially large allocations Some code in the block layer makes potentially huge allocations. Failure is not completely unexpected there, so avoid aborting qemu and handle out-of-memory situations gracefully. This patch addresses bounce buffer allocations in block.c. While at it, convert bdrv_commit() from plain g_malloc() to qemu_try_blockalign(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:15 +02:00
Kevin Wolf	7d2a35cc92	block: Introduce qemu_try_blockalign() This function returns NULL instead of aborting when an allocation fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-08-15 15:07:15 +02:00
Jeff Cody	9a4d5ca607	block: allow bdrv_unref() to be passed NULL pointers If bdrv_unref() is passed a NULL BDS pointer, it is safe to exit with no operation. This will allow cleanup code to blindly call bdrv_unref() on a BDS that has been initialized to NULL. Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-15 15:07:14 +02:00
Stefan Hajnoczi	2a87151fb2	block: bump coroutine pool size for drives When a BlockDriverState is associated with a storage controller DeviceState we expect guest I/O. Use this opportunity to bump the coroutine pool size by 64. This patch ensures that the coroutine pool size scales with the number of drives attached to the guest. It should increase coroutine pool usage (which makes qemu_coroutine_create() fast) without hogging too much memory when fewer drives are attached. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-08-15 15:07:14 +02:00
Markus Armbruster	52bf1e722d	block: Avoid bdrv_get_geometry() where errors should be detected bdrv_get_geometry() hides errors. Use bdrv_nb_sectors() or bdrv_getlength() instead where that's obviously inappropriate. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	75d3d21f9e	block: Drop superfluous aligning of bdrv_getlength()'s value It returns a multiple of the sector size. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	57322b7811	block: Use bdrv_nb_sectors() where sectors, not bytes are wanted Instead of bdrv_getlength(). Aside: a few of these callers don't handle errors. I didn't investigate whether they should. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	30a7f2fc91	block: Use bdrv_nb_sectors() in bdrv_co_get_block_status() Instead of bdrv_getlength(). Replace variables length, length2 by total_sectors, nb_sectors2. Bonus: use total_sectors instead of the slightly unclean bs->total_sectors. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	4049082c4b	block: Use bdrv_nb_sectors() in bdrv_aligned_preadv() Instead of bdrv_getlength(). Eliminate variable len. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	d32f7c101b	block: Use bdrv_nb_sectors() in bdrv_make_zero() Instead of bdrv_getlength(). Variable target_size is initially in bytes, then changes meaning to sectors. Ugh. Replace by target_sectors. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	65a9bb25d6	block: New bdrv_nb_sectors() A call to retrieve the image size converts between bytes and sectors several times: * BlockDriver method bdrv_getlength() returns bytes. * refresh_total_sectors() converts to sectors, rounding up, and stores in total_sectors. * bdrv_getlength() converts total_sectors back to bytes (now rounded up to a multiple of the sector size). * Callers wanting sectors rather bytes convert it right back. Example: bdrv_get_geometry(). bdrv_nb_sectors() provides a way to omit the last two conversions. It's exactly bdrv_getlength() with the conversion to bytes omitted. It's functionally like bdrv_get_geometry() without its odd error handling. Reimplement bdrv_getlength() and bdrv_get_geometry() on top of bdrv_nb_sectors(). The next patches will convert some users of bdrv_getlength() to bdrv_nb_sectors(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:12 +02:00
Kevin Wolf	3baca89139	block: Add Error argument to bdrv_refresh_limits() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-18 13:18:43 +01:00
Kevin Wolf	8eb029c26e	block: Assert qiov length matches request length At least raw-posix relies on this because it can allocate bounce buffers based on the request length, but access it using all of the qiov entries later. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-07-14 12:03:20 +02:00
Kevin Wolf	33f461e0c5	block: Make qiov match the request size until EOF If a read request goes across EOF, the block driver sees a shortened request that stops at EOF (the rest is memsetted in block.c), however the original qiov was used for this request. This patch makes the qiov size match the request size, avoiding a potential buffer overflow in raw-posix. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-07-14 12:03:20 +02:00
Paolo Bonzini	b47ec2c456	block: prefer aio_poll to qemu_aio_wait Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-07-09 15:50:11 +02:00
Kevin Wolf	01fb2705bd	block: Fix bdrv_is_allocated() return value bdrv_is_allocated() should return either 0 or 1 in successful cases. We're lucky that currently, the callers that rely on this (e.g. because they check for ret == 1) don't seem to break badly. They just might skip some optimisation or in the case of qemu-io 'map' print separate lines where a single line would suffice. In theory, a wrong allocation status could lead to image corruption with certain operations, so let's fix this quickly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-07-09 15:50:11 +02:00
Ming Lei	448ad91db4	block: block: introduce APIs for submitting IO as a batch This patch introduces three APIs so that following patches can support queuing I/O requests and submitting them as a batch for improving I/O performance. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-07 11:05:17 +02:00
Jeff Cody	54e2690090	block: extend block-commit to accept a string for the backing file On some image chains, QEMU may not always be able to resolve the filenames properly, when updating the backing file of an image after a block commit. For instance, certain relative pathnames may fail, or drives may have been specified originally by file descriptor (e.g. /dev/fd/???), or a relative protocol pathname may have been used. In these instances, QEMU may lack the information to be able to make the correct choice, but the user or management layer most likely does have that knowledge. With this extension to the block-commit api, the user is able to change the backing file of the overlay image as part of the block-commit operation. This allows the change to be 'safe', in the sense that if the attempt to write the overlay image metadata fails, then the block-commit operation returns failure, without disrupting the guest. If the commit top is the active layer, then specifying the backing file string will be treated as an error (there is no overlay image to modify in that case). If a backing file string is not specified in the command, the backing file string to use is determined in the same manner as it was previously. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-01 10:47:01 +02:00
Jeff Cody	5a6684d2b9	block: add helper function to determine if a BDS is in a chain This is a small helper function, to determine if 'base' is in the chain of BlockDriverState 'top'. It returns true if it is in the chain, and false otherwise. If either argument is NULL, it will also return false. Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-01 10:47:01 +02:00
Jeff Cody	4caf0fcd45	block: simplify bdrv_find_base() and bdrv_find_overlay() This simplifies the function bdrv_find_overlay(). With this change, bdrv_find_base() is just a subset of usage of bdrv_find_overlay(), so this also takes advantage of that. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-01 10:15:34 +02:00
Chen Gang	6b8aeca574	block.c: Don't return success for bdrv_append_temp_snapshot() failure When failure occurs, 'ret' need be set, or may return 0 to indicate success. Previously, an error was set in errp, but 0 was returned anyway. So let bdrv_append_temp_snapshot() return an error code and use that for the bdrv_open() return value. Also, error_propagate() need be called only one time within a function. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-06-27 20:00:00 +02:00
Benoît Canet	09158f00e0	block: Add replaces argument to drive-mirror drive-mirror will bdrv_swap the new BDS named node-name with the one pointed by replaces when the mirroring is finished. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-06-27 20:00:00 +02:00
Jeff Cody	9c75e168bc	block: check for RESIZE blocker in the QMP command, not bdrv_truncate() If we check for the RESIZE blocker in bdrv_truncate(), that means a commit will fail if the overlay layer is larger than the base, due to the backing blocker. This is a regression in behavior from 2.0; currently, commit will try to grow the size of the base image to match the overlay size, if the overlay size is larger. By moving this into the QMP command qmp_block_resize(), it allows usage of bdrv_truncate() within block jobs. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-06-27 11:37:35 +02:00
Kevin Wolf	20cca275c6	block: Remove a special case for protocols The only semantic change is that bs->open_flags gets BDRV_O_PROTOCOL set now. This isn't useful, but it doesn't hurt either. The code that was previously skipped by 'goto done' is automatically disabled because protocol drivers don't support backing files (and if they did, this would probably be a fix) and can't have snapshot_flags set. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-06-26 13:51:01 +02:00
Kevin Wolf	8ee79e707a	block: Catch backing files assigned to non-COW drivers Since we parse backing.* options to add a backing file from the command line when the driver didn't assign one, it has been possible to have a backing file for e.g. raw images (it just was never accessed). This is obvious nonsense and should be rejected. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-06-26 13:51:01 +02:00
Kevin Wolf	76c591b013	block: Remove second bdrv_open() recursion This recursion was introduced in commit `505d7583` in order to allow nesting image formats. It only ever takes effect when the user explicitly specifies a driver name and that driver isn't suitable for the protocol level. We can check this earlier in bdrv_open() and if the explicitly requested driver is a format driver, clear BDRV_O_PROTOCOL so that another bs->file layer is opened. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-06-26 13:51:01 +02:00
Kevin Wolf	b348f3311c	block: Inline bdrv_file_open() It doesn't do much any more, we can move the code to bdrv_open() now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-06-26 13:51:01 +02:00
Kevin Wolf	f4788adcb4	block: Use common driver selection code for bdrv_open_file() This moves the bdrv_open_file() call a bit down so that it can use the bdrv_open() code that selects the right block driver. The code between the old and the new call site is either common code (the error message for an unknown driver has been unified now) or doesn't run with cleared BDRV_O_PROTOCOL (added an if block in one place, whereas the right path was already asserted in another place) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-06-26 13:51:01 +02:00
Kevin Wolf	17b005f1d4	block: Always pass driver name through options QDict The "driver" entry in the options QDict is now only missing if we're opening an image with format probing. We also catch cases now where both the drv argument and a "driver" option is specified, e.g. by specifying -drive format=qcow2,driver=raw Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-06-26 13:51:01 +02:00
Kevin Wolf	5e5c4f63f4	block: Move json: parsing to bdrv_fill_options() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-06-26 13:51:01 +02:00
Kevin Wolf	462f5bcf69	block: Move bdrv_fill_options() call to bdrv_open() bs->options now contains the modified version of the options. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-06-26 13:51:01 +02:00
Kevin Wolf	f54120ff1a	block: Create bdrv_fill_options() The idea of bdrv_fill_options() is to convert every parameter for opening images, in particular the filename and flags, to entries in the options QDict. This patch starts with moving the filename parsing and driver probing part from bdrv_file_open() to the new function. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-06-26 13:51:01 +02:00
Chen Gang	5db97df274	block.c: Remove useless 'buf' variable 'buf' is not used actually, so remove it and related snprintf() statement. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2014-06-24 20:01:24 +04:00
Wenchao Xia	5a2d2cbd88	qapi event: convert BLOCK_IO_ERROR and BLOCK_JOB_ERROR Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2014-06-23 11:12:27 -04:00
Wenchao Xia	a5ee7bd454	qapi event: convert DEVICE_TRAY_MOVED Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2014-06-23 11:12:27 -04:00
Wenchao Xia	a589569f2f	qapi: adjust existing defines In order to let event defines use existing types later, instead of redefine new ones, some old type defines for spice and vnc are changed, and BlockErrorAction is moved from block.h to qapi schema. Note that BlockErrorAction is not merged with BlockdevOnError. At this point, VncInfo is not made a child of VncBasicInfo, because VncBasicInfo has mandatory fields where VncInfo makes them optional. Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2014-06-23 11:01:25 -04:00
Paolo Bonzini	2bd3bce8ef	block: asynchronously stop the VM on I/O errors With virtio-blk dataplane, I/O errors might occur while QEMU is not in the main I/O thread. However, it's invalid to call vm_stop when we're neither in a VCPU thread nor in the main I/O thread, even if we were to take the iothread mutex around it. To avoid this problem, we can raise a request to the main I/O thread, similar to what QEMU does when vm_stop is called from a CPU thread. We know that bdrv_error_action is called from an AIO callback, and the moment at which the callback will fire is not well-defined; it depends on the moment at which the disk or OS finishes the operation, which can happen at any time. Note that QEMU is certainly not in a CPU thread and we do not need to call cpu_stop_current() like vm_stop() does. However, we need to ensure that any action taken by management will result in correct detection of the error _and_ a running VM. In particular: - the event must be raised after the iostatus has been set, so that "info block" will return an iostatus that matches the event. - the VM must be stopped after the iostatus has been set, so that "info block" will return an iostatus that matches the runstate. The ordering between the STOP and BLOCK_IO_ERROR events is preserved; BLOCK_IO_ERROR is documented to come first. This makes bdrv_error_action() thread safe (assuming QMP events are, which is attacked by a separate series). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-23 16:36:13 +08:00
Chunyan Liu	c282e1fdf7	cleanup QEMUOptionParameter Now that all backend drivers are using QemuOpts, remove all QEMUOptionParameter related codes. Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Signed-off-by: Chunyan Liu <cyliu@suse.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-16 17:23:21 +08:00
Chunyan Liu	83d0521a1e	change block layer to support both QemuOpts and QEMUOptionParamter Change block layer to support both QemuOpts and QEMUOptionParameter. After this patch, it will change backend drivers one by one. At the end, QEMUOptionParameter will be removed and only QemuOpts is kept. Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Signed-off-by: Chunyan Liu <cyliu@suse.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-16 17:23:20 +08:00
Stefan Hajnoczi	13af91ebf0	throttle: add throttle_detach/attach_aio_context() Block I/O throttling uses timers and currently always adds them to the main loop. Throttling will break if bdrv_set_aio_context() is used to move a BlockDriverState to a different AioContext. This patch adds throttle_detach/attach_aio_context() interfaces so the throttling timers and uses them to move timers to the new AioContext. Note that bdrv_set_aio_context() already drains all requests so we're sure no throttled requests are pending. The test cases need to be updated since the throttle_init() interface has changed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-06-04 09:56:12 +02:00
Stefan Hajnoczi	dcd042282d	block: add bdrv_set_aio_context() Up until now all BlockDriverState instances have used the QEMU main loop for fd handlers, timers, and BHs. This is not scalable on SMP guests and hosts so we need to move to a model with multiple event loops on different host CPUs. bdrv_set_aio_context() assigns the AioContext event loop to use for a particular BlockDriverState. It first detaches the entire BlockDriverState graph from the current AioContext and then attaches to the new AioContext. This function will be used by virtio-blk data-plane to assign a BlockDriverState to its IOThread AioContext. Make bdrv_aio_set_context() public since data-plane should not include block_int.h. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-04 09:56:11 +02:00
Stefan Hajnoczi	9b536adcbe	block: acquire AioContext in bdrv_drain_all() Modify bdrv_drain_all() to take into account that BlockDriverState instances may be running in different AioContexts. This patch changes the implementation of bdrv_drain_all() while preserving the semantics. Previously kicking throttled requests and checking for pending requests were done across all BlockDriverState instances in sequence. Now we process each BlockDriverState in turn, making sure to acquire and release its AioContext. This prevents race conditions between the thread executing bdrv_drain_all() and the thread running the AioContext. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-04 09:56:11 +02:00
Stefan Hajnoczi	ed78cda3de	block: acquire AioContext in bdrv_*_all() bdrv_close_all(), bdrv_commit_all(), bdrv_flush_all(), bdrv_invalidate_cache_all(), and bdrv_clear_incoming_migration_all() are called by main loop code and touch all BlockDriverState instances. Some BlockDriverState instances may be running in another AioContext. Make sure to acquire the AioContext before closing the BlockDriverState. This will protect against race conditions once virtio-blk data-plane is using the BlockDriverState from another AioContext event loop. Note that this patch does not convert bdrv_drain_all() yet since that conversion is non-trivial. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-04 09:56:11 +02:00
Stefan Hajnoczi	2572b37a47	block: use BlockDriverState AioContext Drop the assumption that we're using the main AioContext. Convert qemu_aio_wait() to aio_poll() and qemu_bh_new() to aio_bh_new() so the BlockDriverState AioContext is used. Note there is still one qemu_aio_wait() left in bdrv_create() but we do not have a BlockDriverState there and only main loop code invokes this function. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-04 09:56:11 +02:00
Markus Armbruster	b20e61e0d5	block: Plug memory leak on brv_open_image() error path Introduced in commit da557a. Spotted by Coverity. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-05-30 14:26:54 +02:00
Fam Zheng	ce782938b8	block: Drop redundant bdrv_refresh_limits The above bdrv_set_backing_hd already does this. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	826b6ca0b0	block: Add backing_blocker in BlockDriverState This makes use of op_blocker and blocks all the operations except for commit target, on each BlockDriverState->backing_hd. The asserts for op_blocker in bdrv_swap are removed because with this change, the target of block commit has at least the backing blocker of its child, so the assertion is not true. Callers should do their check. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	920beae103	block: Use bdrv_set_backing_hd everywhere We need to handle the coming backing_blocker properly, so don't open code the assignment, instead, call bdrv_set_backing_hd to change backing_hd. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	8d24cce1e3	block: Add bdrv_set_backing_hd() This is the common but non-trivial steps to assign or change the backing_hd of BDS. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	3718d8ab65	block: Replace in_use with operation blocker This drops BlockDriverState.in_use with op_blockers: - Call bdrv_op_block_all in place of bdrv_set_in_use(bs, 1). - Call bdrv_op_unblock_all in place of bdrv_set_in_use(bs, 0). - Check bdrv_op_is_blocked() in place of bdrv_in_use(bs). The specific types are used, e.g. in place of starting block backup, bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP, ...). There is one exception in block_job_create, where bdrv_op_blocker_is_empty() is used, because we don't know the operation type here. This doesn't matter because in a few commits away we will drop the check and move it to callers that _do_ know the type. - Check bdrv_op_blocker_is_empty() in place of assert(!bs->in_use). Note: there is only bdrv_op_block_all and bdrv_op_unblock_all callers at this moment. So although the checks are specific to op types, this changes can still be seen as identical logic with previously with in_use. The difference is error message are improved because of blocker error info. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	fbe40ff780	block: Introduce op_blockers to BlockDriverState BlockDriverState.op_blockers is an array of lists with BLOCK_OP_TYPE_MAX elements. Each list is a list of blockers of an operation type (BlockOpType), that marks this BDS as currently blocked for a certain type of operation with reason errors stored in the list. The rule of usage is: * BDS user who wants to take an operation should check if there's any blocker of the type with bdrv_op_is_blocked(). * BDS user who wants to block certain types of operation, should call bdrv_op_block (or bdrv_op_block_all to block all types of operations, which is similar to the existing bdrv_set_in_use()). * A blocker is only referenced by op_blockers, so the lifecycle is managed by caller, and shouldn't be lost until unblock, so typically a caller does these: - Allocate a blocker with error_setg or similar, call bdrv_op_block() to block some operations. - Hold the blocker, do his job. - Unblock operations that it blocked, with the same reason pointer passed to bdrv_op_unblock(). - Release the blocker with error_free(). Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Peter Lieven	465bee1da8	block: optimize zero writes with bdrv_write_zeroes this patch tries to optimize zero write requests by automatically using bdrv_write_zeroes if it is supported by the format. This significantly speeds up file system initialization and should speed zero write test used to test backend storage performance. I ran the following 2 tests on my internal SSD with a 50G QCOW2 container and on an attached iSCSI storage. a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX QCOW2 [off] [on] [unmap] ----- runtime: 14secs 1.1secs 1.1secs filesize: 937M 18M 18M iSCSI [off] [on] [unmap] ---- runtime: 9.3s 0.9s 0.9s b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct QCOW2 [off] [on] [unmap] ----- runtime: 246secs 18secs 18secs filesize: 51G 192K 192K throughput: 203M/s 2.3G/s 2.3G/s iSCSI* [off] [on] [unmap] ---- runtime: 8mins 45secs 33secs throughput: 106M/s 1.2G/s 1.6G/s allocated: 100% 100% 0% * The storage was connected via an 1Gbit interface. It seems to internally handle writing zeroes via WRITESAME16 very fast. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-05-19 13:42:27 +02:00
Max Reitz	4993f7ea7e	block: Allow JSON filenames If the filename given to bdrv_open() is prefixed with "json:", parse the rest as a JSON object and merge the result into the options QDict. If there are conflicts, the options QDict takes precedence. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-05-19 11:36:49 +02:00
Kevin Wolf	e88ae2264d	block: Fix bdrv_is_allocated() for short backing files bdrv_is_allocated() shouldn't return true for sectors that are unallocated, but after the end of a short backing file, even though such sectors are (correctly) marked as containing zeros. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-05-19 11:36:48 +02:00
Kevin Wolf	b1e6fc0817	block: Fix open flags with BDRV_O_SNAPSHOT The immediately visible effect of this patch is that it fixes committing a temporary snapshot to its backing file. Previously, it would fail with a "permission denied" error because bdrv_inherited_flags() forced the backing file to be read-only, ignoring the r/w reopen of bdrv_commit(). The bigger problem this revealed is that the original open flags must actually only be applied to the temporary snapshot, and the original image file must be treated as a backing file of the temporary snapshot and get the right flags for that. Reported-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-09 20:57:31 +02:00
Kevin Wolf	f1f25a2e2e	block: Fix open_flags in bdrv_reopen() Use the same function as bdrv_open() for determining what the right flags for bs->file are. Without doing this, a reopen means that bs->file loses BDRV_O_CACHE_WB or BDRV_O_UNMAP if bs doesn't have it as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-30 11:05:00 +02:00
Kevin Wolf	7e3d98dd31	Revert "block: another bdrv_append fix" This reverts commit `3a389e7926`. The commit was wrong and what it tried to fix just works today without any change. What the commit tried to fix: When creating live snapshots, the new image file is opened with BDRV_O_NO_BACKING because the whole backing chain is already opened. It is then appended to the chain using bdrv_append(). The result of this was that the image had a backing file, but BDRV_O_NO_BACKING was still set. This is obviously inconsistent. There used to be some places in qemu that closed and image and then opened it again, with its old flags (a bdrv_open()/close() sequence involves reopening the whole backing file chain, too). In this case the BDRV_O_NO_BACKING flag meant that the backing chain wasn't reopened and only the top layer was left. (Most, but not all of these places are replaced by bdrv_reopen() today, which doesn't touch the backing files at all.) Other places that looked at bs->open_flags weren't interested in BDRV_O_NO_BACKING, so no breakage there. What it actually did: The commit moved the BDRV_O_NO_BACKING away to the backing file. Because the bdrv_open()/close() sequences only looked at the flags of the top level BlockDriverState and used it for the whole chain, the flag didn't hurt there any more. Obviously, it is still inconsistent because the backing file may have another backing file, but without practical impact. At the same time, it swapped all other flags. This is practically irrelevant as long as live snapshots only allow opening the new layer with the same flags as the old top layer. It still doesn't make any sense, and it is a time bomb that explodes as soon as the flags can differ. bdrv_append_temp_snapshot() is such a case: It adds the new flag BDRV_O_TEMPORARY for the temporary snapshot. The swapping of commit `3a389e79` results in the following nonsensical configuration: bs->open_flags: BDRV_O_TEMPORARY cleared bs->file->open_flags: BDRV_O_TEMPORARY set bs->backing_hd->open_flags: BDRV_O_TEMPORARY set bs->backing_hd->file->open_flags: BDRV_O_TEMPORARY cleared We're still lucky because the format layer ignores the flag and the protocol layer happens to get the right value, but sooner or later this is bound to go wrong... What the right fix would have been: Simply clear the BDRV_O_NO_BACKING flag when the BlockDriverState is appended to an existing backing file chain, because now it does have a backing file. Commit `4ddc07ca` already implemented this silently in bdrv_append(), so we don't have to come up with a new fix. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-30 11:05:00 +02:00
Kevin Wolf	8bfea15dda	block: Unlink temporary files in raw-posix/win32 Instead of having unlink() calls in the generic block layer, where we aren't even guarateed to have a file name, move them to those block drivers that are actually used and that always have a filename. Gets us rid of some #ifdefs as well. The patch also converts bs->is_temporary to a new BDRV_O_TEMPORARY open flag so that it is inherited in the protocol layer and the raw-posix and raw-win32 drivers can unlink the file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-30 11:05:00 +02:00
Kevin Wolf	5669b44de5	block: Remove BDRV_O_COPY_ON_READ for bs->file Copy on Read makes sense on the format level where backing files are implemented, but it's not required on the protocol level. While it shouldn't actively break anything to have COR enabled on both layers, needless serialisation and allocation checks may impact performance. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-30 11:05:00 +02:00
Kevin Wolf	317fc44ef2	block: Create bdrv_backing_flags() Instead of manipulation flags inline, move the derivation of the flags of a backing file into a new function next to the existing functions that derive flags for bs->file and for the block driver open function. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-30 11:05:00 +02:00
Kevin Wolf	0b50cc8853	block: Create bdrv_inherited_flags() Instead of having bdrv_open_flags() as a function that creates flags for several unrelated places and then adding open-coded flags on top, create a new function that derives the flags for bs->file from the flags for bs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-30 11:05:00 +02:00
Jeff Cody	e855e4fb7b	block: Ignore duplicate or NULL format_name in bdrv_iterate_format Some block drivers have multiple BlockDriver instances with identical format_name fields (e.g. gluster, nbd). Both qemu-img and qemu will use bdrv_iterate_format() to list the supported formats when a help option is invoked. As protocols and formats may register multiple drivers, redundant listings of formats occur (e.g., "Supported formats: ... gluster gluster gluster gluster ... "). Since the list of driver formats will be small, this performs a simple linear search on format_name, and ignores any duplicates. The end result change is that the iterator will no longer receive duplicate string names, nor will it receive NULL pointers. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-04-29 11:58:07 +02:00
Markus Armbruster	0fb6395c0c	Use error_is_set() only when necessary (again) error_is_set(&var) is the same as var != NULL, but it takes whole-program analysis to figure that out. Unnecessarily hard for optimizers, static checkers, and human readers. Commit `84d18f0` dumbed it down to obvious, but a few more have crept in since, and documentation was overlooked. Dumb these down, too. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-04-25 18:05:06 +02:00
Benoît Canet	1ba4b6a553	block: Prevent coroutine stack overflow when recursing in bdrv_open_backing_file. In 1.7.1 qcow2_create2 reopen the file for flushing without the BDRV_O_NO_BACKING flags. As a consequence the code would recursively open the whole backing chain. These three stack arrays would pile up through the recursion and lead to a coroutine stack overflow. Convert these array to malloced buffers in order to streamline the coroutine footprint. Symptoms where freezes or segfaults on production machines while taking QMP externals snapshots. The overflow disturbed coroutine switching. [Resolved conflicts on qemu.git/master since the patch was against v1.7.1 --Stefan] Signed-off-by: Benoit Canet <benoit.canet@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-04-25 18:05:05 +02:00
Kevin Wolf	f2d953ec31	block: Catch duplicate IDs in bdrv_new() Since commit `f298d071`, block devices added with blockdev-add don't have a QemuOpts around in dinfo->opts. Consequently, we can't rely any more on QemuOpts catching duplicate IDs for block devices. This patch adds a new check for duplicate IDs to bdrv_new(), and moves the existing check that the ID isn't already taken for a node-name there as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-04-22 12:00:28 +02:00
Kevin Wolf	98522f63f4	block: Add errp to bdrv_new() This patch adds an errp parameter to bdrv_new() and updates all its callers. The next patches will make use of this in order to check for duplicate IDs. Most of the callers know that their ID is fine, so they can simply assert that there is no error. Behaviour doesn't change with this patch yet as bdrv_new() doesn't actually assign errors to errp. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-04-22 12:00:20 +02:00
Kevin Wolf	636ea3708c	block: Remove -errno return value from bdrv_assign_node_name It takes an errp argument. That's enough for error handling. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-04-22 11:57:02 +02:00
Fam Zheng	b8afb520e4	block: Handle error of bdrv_getlength in bdrv_create_dirty_bitmap bdrv_getlength could fail, check the return value before using it. Return NULL and set errno if it fails. Callers are updated to handle the error case. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-04-22 11:57:02 +02:00
Kevin Wolf	9ce10c0bdc	block: Check bdrv_getlength() return value in bdrv_make_zero() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-22 11:57:02 +02:00
Kevin Wolf	da15ee5134	block: Catch integer overflow in bdrv_rw_co() Insanely large requests could cause an integer overflow in bdrv_rw_co() while converting sectors to bytes. This patch catches the problem and returns an error (if we hadn't overflown the integer here, bdrv_check_byte_request() would have rejected the request, so we're not breaking anything that was supposed to work before). We actually do have a test case that triggers behaviour where we accidentally let such a request pass, so that it would return success, but read 0 bytes instead of the requested 4 GB. It fails now like it should. If the vdi block driver wants to be able to deal with huge images, it can't read the whole block bitmap at once into memory like it does today, but needs to use a metadata cache like qcow2 does. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-04-22 11:57:02 +02:00
Kevin Wolf	1dd3a44753	block: Limit size to INT_MAX in bdrv_check_byte_request() Commit `8f4754ed` intended to protect against integer overflow bugs in block drivers by making sure that a single request that is passed to drivers is no longer than INT_MAX bytes. However, meanwhile there are some callers that don't use that code path any more but call bdrv_check_byte_request() directy, so let's add a check there as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-22 11:57:02 +02:00
Kevin Wolf	54db38a479	block: Fix nb_sectors check in bdrv_check_byte_request() nb_sectors is signed, check for negative values. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-22 11:57:02 +02:00
Kevin Wolf	f187743acd	block: Check bdrv_getlength() return value in bdrv_append_temp_snapshot() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-04 19:35:52 +02:00
Kevin Wolf	b998875dcf	block: Fix snapshot=on for protocol parsed from filename Since commit `9fd3171a`, BDRV_O_SNAPSHOT uses an option QDict to specify the originally requested image as the backing file of the newly created temporary snapshot. This means that the filename is stored in "file.filename", which is an option that is not parsed for protocol names. Therefore things like -drive file=nbd:localhost:10809 were broken because it looked for a local file with the literal name 'nbd:localhost:10809'. This patch changes the way BDRV_O_SNAPSHOT works once again. We now open the originally requested image as normal, and then do a similar operation as for live snapshots to put the temporary snapshot on top. This way, both driver specific options and parsed filenames work. As a nice side effect, this results in code movement to factor bdrv_append_temp_snapshot() out. This is a good preparation for moving its call to drive_init() and friends eventually. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-04 19:35:51 +02:00
Kevin Wolf	e3fa4bfa72	block: Don't parse 'filename' option When using the QDict option 'filename', it is supposed to be interpreted literally. The code did correctly avoid guessing the protocol from any string before the first colon, but it still called bdrv_parse_filename() which would, for example, incorrectly remove a 'file:' prefix in the raw-posix driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-04 17:10:25 +02:00
Kevin Wolf	8f4754ede5	block: Limit request size (CVE-2014-0143) Limiting the size of a single request to INT_MAX not only fixes a direct integer overflow in bdrv_check_request() (which would only trigger bad behaviour with ridiculously huge images, as in close to 2^64 bytes), but can also prevent overflows in all block drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-04-01 15:22:35 +02:00
Kevin Wolf	5a8a30db47	block: Add error handling to bdrv_invalidate_cache() If it returns an error, the migrated VM will not be started, but qemu exits with an error message. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-03-19 09:39:41 +01:00
Markus Armbruster	c3adb58fe0	blockdev: Refuse to open encrypted image unless paused Opening an encrypted image takes an additional step: setting the key. Between open and the key set, the image must not be used. We have some protection against accidental use in place: you can't unpause a guest while we're missing keys. You can, however, hot-plug block devices lacking keys into a running guest just fine, or insert media lacking keys. In the latter case, notifying the guest of the insert is delayed until the key is set, which may suffice to protect at least some guests in common usage. This patch makes the protection apply in more cases, in a rather heavy-handed way: it doesn't let you open encrypted images unless we're in a paused state. It doesn't extend the protection to users other than the guest (block jobs?). Use of runstate_check() from block.c is disgusting. Best I can do right now. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-03-14 16:24:42 +01:00
Max Reitz	9562f69cfd	block: Unlink temporary file If the image file cannot be opened and was created as a temporary file, it should be deleted; thus, in this case, we should jump to the "unlink_and_fail" label and not just to "fail". Reported-by: Benoît Canet <benoit@irqsave.net> Signed-off-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-03-13 14:42:24 +01:00
Benoît Canet	b5042a3622	block: Rewrite the snapshot authorization mechanism for block filters. This patch keep the recursive way of doing things but simplify it by giving two responsabilities to all block filters implementors. They will need to do two things: -Set the is_filter field of their block driver to true. -Implement the bdrv_recurse_is_first_non_filter method of their block driver like it is done on the Quorum block driver. (block/quorum.c) [Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes the semantics of blkverify, which now recurses down both bs->file and s->test_file. -- Stefan] Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-03-13 14:23:27 +01:00
Max Reitz	938789ea92	block: bs->drv may be NULL in bdrv_debug_resume() Currently, bdrv_debug_resume() requires every bs->drv in the BDS stack to be NULL until a bs->drv with an implementation of bdrv_debug_resume() is found. For a normal function, this would be fine, but this is a function for debugging purposes and should therefore allow intermediate BDS not to have a driver (i.e., be "ejected"). Otherwise, it is hard to debug such situations. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-03-13 14:23:27 +01:00
Kevin Wolf	3456a8d185	block: Update image size in bdrv_invalidate_cache() After migration has completed, we call bdrv_invalidate_cache() so that drivers which cache some data drop their stale copy of the data and reread it from the image file to get a new version of data that the source modified while the migration was running. Reloading metadata from the image file is useless, though, if the size of the image file stays stale (this is a value that is cached for all image formats in block.c). Reads from (meta)data after the old EOF return only zeroes, causing image corruption. We need to update bs->total_sectors in all layers that could potentially have changed their size (i.e. backing files are not a concern - if they are changed, we're in bigger trouble) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-03-13 14:23:27 +01:00
Kevin Wolf	eb909c7f72	block: Fix error path segfault in bdrv_open() Using an invalid option for a block device that is opened with BDRV_O_PROTOCOL led to drv = NULL, and when trying to include the driver name in the error message, qemu dereferenced it: $ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,file.foo=bar Segmentation fault (core dumped) With this patch applied, the expected error message is printed: $ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,file.foo=bar qemu-system-x86_64: -drive file=/tmp/test.qcow2,file.foo=bar: could not open disk image /tmp/test.qcow2: Block protocol 'file' doesn't support the option 'foo' Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-03-06 17:29:24 +01:00
Max Reitz	cd5d031e75	block: Keep "filename" option after parsing Currently, bdrv_file_open() always removes the "filename" option from the options QDict after bdrv_parse_filename() has been (successfully) called. However, for drivers with bdrv_needs_filename, it makes more sense for bdrv_parse_filename() to overwrite the "filename" option and for bdrv_file_open() to fetch the filename from there. Since there currently are no drivers that implement bdrv_parse_filename() and have bdrv_needs_filename set, this does not change current behavior. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-03-06 16:18:01 +01:00
Benoît Canet	90ce8a061b	block: make bdrv_swap rebuild the bs graph node list field. Moving only the node_name one field could lead to some inconsitencies where a node_name was defined on a bs which was not registered in the graph node list. bdrv_swap between a named node bs and a non named node bs would lead to this. bdrv_make_anon would then crash because it would try to remove the bs from the graph node list while it is not in it. This patch remove named node bses from the graph node list before doing the swap then insert them back. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-03-06 11:33:10 +01:00
Kevin Wolf	47ea2de2d6	block: Fix bs->request_alignment assertion for bs->sg=1 For sg backends, bs->request_alignment is meaningless and may be 0. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Acked-by: Paolo Bonzini <pbonzini@redhat.com>	2014-03-05 16:58:37 +01:00
Amit Shah	69bef7931e	block: use /var/tmp instead of /tmp for -snapshot If TMPDIR is not specified, the default was to use /tmp for the working copy of the block devices. Update this to /var/tmp instead, so systems using tmp-on-tmpfs don't end up inadvertently using RAM for the block device. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-02-28 18:59:07 +01:00
Max Reitz	f7d9fd8c72	block: Remove bdrv_open_image()'s force_raw option This option is now unnecessary since specifying BDRV_O_PROTOCOL as flag will do exactly the same. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Max Reitz	5acd9d81e1	block: Reuse success path from bdrv_open() The fail and success paths of bdrv_file_open() may be further shortened by reusing code already existent in bdrv_open(). This includes bdrv_file_open() not taking the reference to options which allows the removal of QDECREF(options) in that function. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Max Reitz	5469a2a688	block: Handle bs->options in bdrv_open() only The fail paths of bdrv_file_open() and bdrv_open() naturally exhibit similarities, thus it is possible to reuse the one from bdrv_open() and shorten the one in bdrv_file_open() accordingly. Also, setting bs->options in bdrv_file_open() is not necessary if it is already done in bdrv_open(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Max Reitz	d4446eae63	block: Remove bdrv_new() from bdrv_file_open() Change bdrv_file_open() to take a simple pointer to an already existing BDS instead of an indirect one. The BDS will be created in bdrv_open() if necessary. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Max Reitz	5d12aa63c7	block: Reuse reference handling from bdrv_open() Remove the reference parameter and the related handling code from bdrv_file_open(), since it exists in bdrv_open() now as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Max Reitz	2e40134bfd	block: Make bdrv_file_open() static Add the bdrv_open() option BDRV_O_PROTOCOL which results in passing the call to bdrv_file_open(). Additionally, make bdrv_file_open() static and therefore bdrv_open() the only way to call it. Consequently, all existing calls to bdrv_file_open() have to be adjusted to use bdrv_open() with the BDRV_O_PROTOCOL flag instead. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Max Reitz	ddf5636dc9	block: Add reference parameter to bdrv_open() Allow bdrv_open() to handle references to existing block devices just as bdrv_file_open() is already capable of. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Max Reitz	f67503e5bd	block: Change BDS parameter of bdrv_open() to ** Make bdrv_open() take a pointer to a BDS pointer, similarly to bdrv_file_open(). If a pointer to a NULL pointer is given, bdrv_open() will create a new BDS with an empty name; if the BDS pointer is not NULL, that existing BDS will be reused (in the same way as bdrv_open() already did). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:21 +01:00
Kevin Wolf	e6dc8a1f83	block: Fix bdrv_is_first_non_filter() Consider top level BlockDriverStates as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Tested-by: Benoit Canet <benoit@irqsave.net>	2014-02-21 21:02:21 +01:00
Peter Maydell	4c0c9bbe78	Merge remote-tracking branch 'remotes/qmp-unstable/queue/qmp' into staging * remotes/qmp-unstable/queue/qmp: monitor: Add object_add class argument completion. monitor: Add object_del id argument completion. monitor: Add device_add device argument completion. monitor: Add device_del id argument completion. qmp: expose list of supported character device backends Use error_is_set() only when necessary QMP: allow JSON dict arguments in qmp-shell hmp: migrate command (without -d) now blocks correctly Conflicts: blockdev.c [PMM: resolved trivial conflict in blockdev.c] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2014-02-20 12:10:23 +00:00
Markus Armbruster	84d18f065f	Use error_is_set() only when necessary error_is_set(&var) is the same as var != NULL, but it takes whole-program analysis to figure that out. Unnecessarily hard for optimizers, static checkers, and human readers. Dumb it down to obvious. Gets rid of several dozen Coverity false positives. Note that the obvious form is already used in many places. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2014-02-17 11:57:23 -05:00
Benoît Canet	0c5e94ee83	block: Open by reference will try device then node_name. Since we introduced node_name for named bs of the graph modify the opening by reference to use it as a fallback. This patch also enforce the separation of the device id and graph node namespaces. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-02-14 18:05:39 +01:00
Benoît Canet	dd67fa5052	block: Relax bdrv_lookup_bs constraints. The following patch will reuse bdrv_lookup_bs in order to open images by references so the rules of usage of bdrv_lookup_bs must be relaxed a bit. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-02-14 18:05:39 +01:00
Kevin Wolf	e96126ffa5	block: Fix 32 bit truncation in mark_request_serialising() On 32 bit hosts, size_t is too small for align as the bitmask ~(align - 1) will zero out the higher 32 bits of the offset. While at it, change the local overlap_bytes variable to unsigned to match the field in BdrvTrackedRequest. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com>	2014-02-09 09:12:39 +01:00
Kevin Wolf	5f5bcd80f8	block: Don't call ROUND_UP with negative values The behaviour of the ROUND_UP macro with negative numbers isn't obvious. It happens to do the right thing in this please, but better avoid it. Suggested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com>	2014-02-09 09:12:39 +01:00
Kevin Wolf	af91f9a73c	block: bdrv_aligned_pwritev: Assert overlap range This adds assertions that the request that we actually end up passing to the block driver (which includes RMW data and has therefore potentially been rounded to alignment boundaries) is fully covered by the overlap_{offset,size} fields of the associated BdrvTrackedRequest. Suggested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com>	2014-02-09 09:12:39 +01:00
Kevin Wolf	99c4a85ce6	block: Fix memory leaks in bdrv_co_do_pwritev() The error path for a failure in one of the two bdrv_aligned_preadv() calls leaked head_buf or tail_buf, respectively. This fixes the memory leak. Reported-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com>	2014-02-09 09:12:39 +01:00
Kevin Wolf	765003db02	block: Fail gracefully with missing filename This fixes a regression introduced in commit `2a05cbe42` ('block: Allow block devices without files'): $ qemu-system-x86_64 -drive driver=file qemu-system-x86_64: block.c:892: bdrv_open_common: Assertion `!drv->bdrv_needs_filename \|\| filename != ((void *)0)' failed. Now the respective check must be performed not only in bdrv_file_open(), but also in bdrv_open(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-02-09 09:12:38 +01:00
Kevin Wolf	d5103588aa	block: Switch bdrv_io_limits_intercept() to byte granularity Request sizes used to be rounded down to the next sector boundary, allowing to bypass the I/O limit. Now all requests are accounted for with their exact byte size. Reported-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:28 +01:00
Kevin Wolf	9e1cb96d9a	qemu-iotests: Test pwritev RMW logic Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:25 +01:00
Kevin Wolf	8407d5d7e2	block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper Instead of implementing the alignment adjustment here, use the now existing functionality of bdrv_co_do_pwritev(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:03 +01:00
Kevin Wolf	a3ef657185	block: Make bdrv_pread() a bdrv_prwv_co() wrapper Instead of implementing the alignment adjustment here, use the now existing functionality of bdrv_co_do_preadv(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:03 +01:00
Kevin Wolf	775aa8b6e0	block: Change coroutine wrapper to byte granularity Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:03 +01:00
Kevin Wolf	28de2dcd88	block: Assert serialisation assumptions in pwritev If a request calls wait_serialising_requests() and actually has to wait in this function (i.e. a coroutine yield), other requests can run and previously read data (like the head or tail buffer) could become outdated. In this case, we would have to restart from the beginning to read in the updated data. However, we're lucky and don't actually need to do that: A request can only wait in the first call of wait_serialising_requests() because we mark it as serialising before that call, so any later requests would wait. So as we don't wait in practice, we don't have to reload the data. This is an important assumption that may not be broken or data corruption will happen. Document it with some assertions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:03 +01:00
Kevin Wolf	3b8242e0ea	block: Align requests in bdrv_co_do_pwritev() This patch changes bdrv_co_do_pwritev() to actually be what its name promises. If requests aren't properly aligned, it performs a RMW. Requests touching the same block are serialised against the RMW request. Further optimisation of this is possible by differentiating types of requests (concurrent reads should actually be okay here). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	6460440f34	block: Allow wait_serialising_requests() at any point We can only have a single wait_serialising_requests() call per request because otherwise we can run into deadlocks where requests are waiting for each other. The same is true when wait_serialising_requests() is not at the very beginning of a request, so that other requests can be issued between the start of the tracking and wait_serialising_requests(). Fix this by changing wait_serialising_requests() to ignore requests that are already (directly or indirectly) waiting for the calling request. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	7327145f63	block: Make overlap range for serialisation dynamic Copy on Read wants to serialise with all requests touching the same cluster, so wait_serialising_requests() rounded to cluster boundaries. Other users like alignment RMW will have different requirements, though (requests touching the same sector), so make it dynamic. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	2dbafdc012	block: Generalise and optimise COR serialisation Change the API so that specific requests can be marked serialising. Only these requests are checked for overlaps then. This means that during a Copy on Read operation, not all requests overlapping other requests are serialised any more, but only those that actually overlap with the specific COR request. Also remove COR from function and variable names because this functionality can be useful in other contexts. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	ec746e10cb	block: Make zero-after-EOF work with larger alignment Odd file sizes could make bdrv_aligned_preadv() shorten the request in non-aligned ways. Fix it by rounding to the required alignment instead of 512 bytes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	65afd211c7	block: Allow waiting for overlapping requests between begin/end Previously, it was not possible to use wait_for_overlapping_requests() between tracked_request_begin()/end() because it would wait for itself. Ignore the current request in the overlap check and run more of the bdrv_co_do_preadv/pwritev code with a BdrvTrackedRequest present. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	793ed47a7a	block: Switch BdrvTrackedRequest to byte granularity Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	6601553e27	block: Introduce bdrv_co_do_pwritev() This is going to become the bdrv_co_do_preadv() equivalent for writes. In this patch, however, just a function taking byte offsets is created, it doesn't align anything yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	244eadef5c	block: write: Handle COR dependency after I/O throttling First waiting for all COR requests to complete and calling the throttling function afterwards means that the request could be delayed and we still need to wait for the COR request even if it was issued only after the throttled write request. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	b404f72036	block: Introduce bdrv_aligned_pwritev() This separates the part of bdrv_co_do_writev() that needs to happen before the request is modified to match the backend alignment, and a part that needs to be executed afterwards and passes the request to the BlockDriver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	1b0288ae7f	block: Introduce bdrv_co_do_preadv() Similar to bdrv_pread(), which aligns byte-aligned request to 512 byte sectors, bdrv_co_do_preadv() takes a byte-aligned request and aligns it to the alignment specified in bs->request_alignment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	d0c7f642f5	block: Introduce bdrv_aligned_preadv() This separates the part of bdrv_co_do_readv() that needs to happen before the request is modified to match the backend alignment, and a part that needs to be executed afterwards and passes the request to the BlockDriver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:02 +01:00
Paolo Bonzini	c25f53b06e	raw: Probe required direct I/O alignment Add a bs->request_alignment field that contains the required offset/length alignment for I/O requests and fill it in the raw block drivers. Use ioctls if possible, else see what alignment it takes for O_DIRECT to succeed. While at it, also expose the memory alignment requirements, which may be (and in practice are) different from the disk alignment requirements. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:02 +01:00
Paolo Bonzini	1b7fd72955	block: rename buffer_alignment to guest_block_size The alignment field is now set to the value that is promised to the guest, rather than required by the host. The next patches will make QEMU aware of the host-provided values, so make this clear. The alignment is also not about memory buffers, but about the sectors on the disk, change the documentation of the field. At this point, the field is set by the device emulation, but completely ignored by the block layer. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:01 +01:00
Kevin Wolf	339064d506	block: Don't use guest sector size for qemu_blockalign() bs->buffer_alignment is set by the device emulation and contains the logical block size of the guest device. This isn't something that the block layer should know, and even less something to use for determining the right alignment of buffers to be used for the host. The new BlockLimits field opt_mem_alignment tells the qemu block layer the optimal alignment to be used so that no bounce buffer must be used in the driver. This patch may change the buffer alignment from 4k to 512 for all callers that used qemu_blockalign() with the top-level image format BlockDriverState. The value was never propagated to other levels in the tree, so in particular raw-posix never required anything else than 512. While on disks with 4k sectors direct I/O requires a 4k alignment, memory may still be okay when aligned to 512 byte boundaries. This is what must have happened in practice, because otherwise this would already have failed earlier. Therefore I don't expect regressions even with this intermediate state. Later, raw-posix can implement the hook and expose a different memory alignment requirement. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:01 +01:00
Kevin Wolf	1ff735bdc4	block: Detect unaligned length in bdrv_qiov_is_aligned() For an O_DIRECT request to succeed, it's not only necessary that all base addresses in the qiov are aligned, but also that each length in it is aligned. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:01 +01:00

1 2 3 4 5 ...

844 Commits