qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Kevin Wolf	38f3ef574b	raw: Prohibit dangerous writes for probed images If the user neglects to specify the image format, QEMU probes the image to guess it automatically, for convenience. Relying on format probing is insecure for raw images (CVE-2008-2004). If the guest writes a suitable header to the device, the next probe will recognize a format chosen by the guest. A malicious guest can abuse this to gain access to host files, e.g. by crafting a QCOW2 header with backing file /etc/shadow. Commit `1e72d3b` (April 2008) provided -drive parameter format to let users disable probing. Commit `f965509` (March 2009) extended QCOW2 to optionally store the backing file format, to let users disable backing file probing. QED has had a flag to suppress probing since the beginning (2010), set whenever a raw backing file is assigned. All of these additions that allow to avoid format probing have to be specified explicitly. The default still allows the attack. In order to fix this, commit `79368c8` (July 2010) put probed raw images in a restricted mode, in which they wouldn't be able to overwrite the first few bytes of the image so that they would identify as a different image. If a write to the first sector would write one of the signatures of another driver, qemu would instead zero out the first four bytes. This patch was later reverted in commit `8b33d9e` (September 2010) because it didn't get the handling of unaligned qiov members right. Today's block layer that is based on coroutines and has qiov utility functions makes it much easier to get this functionality right, so this patch implements it. The other differences of this patch to the old one are that it doesn't silently write something different than the guest requested by zeroing out some bytes (it fails the request instead) and that it doesn't maintain a list of signatures in the raw driver (it calls the usual probe function instead). Note that this change doesn't introduce new breakage for false positive cases where the guest legitimately writes data into the first sector that matches the signatures of an image format (e.g. for nested virt): These cases were broken before, only the failure mode changes from corruption after the next restart (when the wrong format is probed) to failing the problematic write request. Also note that like in the original patch, the restrictions only apply if the image format has been guessed by probing. Explicitly specifying a format allows guests to write anything they like. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:13 +01:00
Kevin Wolf	7cddd3728e	block: Read only one sector for format probing The only image format driver that even potentially accesses anything after 512 bytes in its bdrv_probe() implementation is VMDK, which reads a plain-text descriptor file. In practice, the field it's looking for seems to come first and will be well within the first 512 bytes, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1416497234-29880-7-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:12 +01:00
Markus Armbruster	c6684249fd	block: Factor bdrv_probe_all() out of find_image_format() Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1416497234-29880-6-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:12 +01:00
Fam Zheng	20a9e77dfa	block: Add bdrv_get_node_name This returns the node name of a BDS. Remove the TODO comment and expect the callers to be explicit. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:25:29 +01:00
Fam Zheng	04df765ab4	block: Add bdrv_next_node Similar to bdrv_next, this traverses through graph_bdrv_states. Will be useful to enumerate all the named nodes. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:25:29 +01:00
Fam Zheng	f3a9cfddae	block: Fix max nb_sectors in bdrv_make_zero In bdrv_rw_co we report -EINVAL for nb_sectors > INT_MAX / BDRV_SECTOR_SIZE, so a caller shouldn't exceed it. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-id: 1415603264-21497-1-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-14 09:20:35 +00:00
Peter Maydell	776346cd63	trivial patches for 2014-11-11 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJUYh9vAAoJEL7lnXSkw9fbgPQH/065L5+SpaJR1Nte9Lz3N2s1 a6tGSI22yu85tKvYCdYjeoVHSkSTyR57FdTfUd2xc2QPj+J4sWXpA81KILBGTJUp NMpmLpWg4LOh8Ek4ViRgmFFdryzIFa4dT4gc1AcSAIAQ6jsgK1dM7m5kfncC3TN0 TUs248vJ2i/DaE0k8TOeJqxJTqInoFttlJEqG7RD+V5JznokE4zpFNXHDGx9BptE W2J38GJ/TKRPe9UrHMKZI1r6+ZBdXyE/CaqsNNKLJdqrHgSQuAyK/PS6dQbM4BLg M1qdP7Tp0wOlvv9qoEZMOEiUsi54XPqLgaLMbW74Yp5X459fqmLW2imy49pHXt8= =klsW -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2014-11-11' into staging trivial patches for 2014-11-11 # gpg: Signature made Tue 11 Nov 2014 14:38:39 GMT using RSA key ID A4C3D7DB # gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" # gpg: aka "Michael Tokarev <mjt@corpit.ru>" # gpg: aka "Michael Tokarev <mjt@debian.org>" * remotes/mjt/tags/pull-trivial-patches-2014-11-11: block: Fix comment for bdrv_co_get_block_status sysbus: Correct SYSTEM_BUS(obj) defines target-i386: cpu: keeping function parameters alignment on new line xen-hvm: Remove redundant variable 'xstate' coroutine-sigaltstack: Change jmp_buf to sigjmp_buf pc-bios: petalogix-s3adsp1800.dtb: Use 'xlnx, xps-ethernetlite-2.00.a' instead of 'xlnx, xps-ethernetlite-2.00.b' gdbstub: Add a missing case of signal number translation in gdbstub numa: make 'info numa' take into account hotplugged memory slirp/smbd: modify/set several parameters in generated smbd.conf qemu-doc.texi: fix typos in x509 examples icc_bus: fix typo ICC_BRIGDE -> ICC_BRIDGE Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2014-11-11 14:50:10 +00:00
Fam Zheng	705be728c0	block: Fix comment for bdrv_co_get_block_status It returns more information than binary, fix the comment. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2014-11-11 17:36:19 +03:00
Max Reitz	e56934bece	block: Propagate error in bdrv_img_create() If the specified backing file could not be opened, do not generate a new error message which contains the message which has been generated by bdrv_open(), but just propagate the latter. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-11-06 12:45:47 +01:00
Stefan Hajnoczi	5a7e7a0bad	block: let mirror blockjob run in BDS AioContext The mirror block job must run in the BlockDriverState AioContext so that it works with dataplane. Acquire the AioContext in blockdev.c so starting the block job is safe. Note that to_replace is treated separately from other BlockDriverStates in that it does not need to be in the same AioContext. Explicitly acquire/release to_replace's AioContext when accessing it. The completion code in block/mirror.c must perform BDS graph manipulation and bdrv_reopen() from the main loop. Use block_job_defer_to_main_loop() to achieve that. The bdrv_drain_all() call is not allowed outside the main loop since it could lead to lock ordering problems. Use bdrv_drain(bs) instead because we have acquired the AioContext so nothing else can sneak in I/O. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-10-git-send-email-stefanha@redhat.com	2014-11-03 11:41:49 +00:00
Stefan Hajnoczi	5b98db0ad3	block: add bdrv_drain() Now that op blockers are in use, we can ensure that no other sources are generating I/O on a BlockDriverState. Therefore it is possible to drain requests for a single BDS. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-7-git-send-email-stefanha@redhat.com	2014-11-03 11:41:49 +00:00
Max Reitz	7748543420	block: Add status callback to bdrv_amend_options() Depending on the changed options and the image format, bdrv_amend_options() may take a significant amount of time. In these cases, a way to be informed about the operation's status is desirable. Since the operation is rather complex and may fundamentally change the image, implementing it as AIO or a coroutine does not seem feasible. On the other hand, implementing it as a block job would be significantly more difficult than a simple callback and would not add benefits other than progress report to the amending operation, because it should not actually be run as a block job at all. A callback may not be very pretty, but it's very easy to implement and perfectly fits its purpose here. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414404776-4919-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Peter Maydell	573742a543	block.c: Fix type of IoOperationType variable in send_qmp_error_event() The local variable 'ac' in send_qmp_error_event() is declared with the wrong type, which causes clang to complain when it is initialized and again when it is used: block.c:3655:20: warning: implicit conversion from enumeration type 'enum IoOperationType' to different enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') [-Wenum-conversion] ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE; ~ ^~~~~~~~~~~~~~~~~~~~~~ block.c:3655:45: warning: implicit conversion from enumeration type 'enum IoOperationType' to different enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') [-Wenum-conversion] ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE; ~ ^~~~~~~~~~~~~~~~~~~~~~~ block.c:3656:62: warning: implicit conversion from enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') to different enumeration type 'IoOperationType' (aka 'enum IoOperationType') [-Wenum-conversion] qapi_event_send_block_io_error(bdrv_get_device_name(bs), ac, action, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~ Correct the type to IoOperationType, and rename the variable to 'optype' to match its correct type. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com> Message-id: 1412969583-21045-1-git-send-email-peter.maydell@linaro.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Peter Lieven	6c5a42ac34	block: avoid creating oversized writes in multiwrite_merge Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Peter Lieven	2647fab57d	BlockLimits: introduce max_transfer_length Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Max Reitz	59c9a95fd2	block: Respect underlying file's EOF When falling through to the underlying file in bdrv_co_get_block_status(), if it returns that the query offset is beyond the file end (by setting *pnum to 0), return the range to be zero and do not let the number of sectors for which information could be obtained be overwritten. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:02 +02:00
Max Reitz	9ebd844805	block: Add qemu_{,try_}blockalign0() These functions call their non-0-counterparts and then fill the allocated buffer with 0 (if the allocation has been successful). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Markus Armbruster	a7f53e26a6	block: Lift device model API into BlockBackend Move device model attachment / detachment and the BlockDevOps device model callbacks and their wrappers from BlockDriverState to BlockBackend. Wrapper calls in block.c change from bdrv_dev_FOO_cb(bs, ...) to if (bs->blk) { bdrv_dev_FOO_cb(bs->blk, ...); } No change, because both bdrv_dev_change_media_cb() and bdrv_dev_resize_cb() do nothing when no device model is attached, and a device model can be attached only when bs->blk. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 14:03:50 +02:00
Markus Armbruster	097310b53e	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	7c84b1b831	block: Rename BlockDriverAIOCB* to BlockAIOCB* I'll use BlockDriverAIOCB with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	7f06d47eff	block: Merge BlockBackend and BlockDriverState name spaces BlockBackend's name space is separate only to keep the initial patches simple. Time to merge the two. Retain bdrv_find() and bdrv_get_device_name() for now, to keep this series manageable. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	bfb197e0d9	block: Eliminate BlockDriverState member device_name[] device_name[] can become non-empty only in bdrv_new_root() and bdrv_move_feature_fields(). The latter is used only to undo damage done by bdrv_swap(). The former is called only by blk_new_with_bs(). Therefore, when a BlockDriverState's device_name[] is non-empty, then it's been created with a BlockBackend, and vice versa. Furthermore, blk_new_with_bs() keeps the two names equal. Therefore, device_name[] is redundant. Eliminate it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	fea68bb6e9	block: Eliminate bdrv_iterate(), use bdrv_next() Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	18e46a033d	block: Connect BlockBackend and DriveInfo Make the BlockBackend own the DriveInfo. Change blockdev_init() to return the BlockBackend instead of the DriveInfo. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	7e7d56d9e0	block: Connect BlockBackend to BlockDriverState Convenience function blk_new_with_bs() creates a BlockBackend with its BlockDriverState. Callers have to unref both. The commit after next will relieve them of the need to unref the BlockDriverState. Complication: due to the silly way drive_del works, we need a way to hide a BlockBackend, just like bdrv_make_anon(). To emphasize its "special" status, give the function a suitably off-putting name: blk_hide_on_behalf_of_do_drive_del(). Unfortunately, hiding turns the BlockBackend's name into the empty string. Can't avoid that without breaking the blk->bs->device_name equals blk->name invariant. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	e4e9986b1c	block: Split bdrv_new_root() off bdrv_new() Creating an anonymous BDS can't fail. Make that obvious. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Alexey Kardashevskiy	7ea2d269cb	block/migration: Disable cache invalidate for incoming migration When migrated using libvirt with "--copy-storage-all", at the end of migration there is race between NBD mirroring task trying to do flush and migration completion, both end up invalidating cache. Since qcow2 driver does not handle this situation very well, random crashes happen. This disables the BDRV_O_INCOMING flag for the block device being migrated once the cache has been invalidated. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> -- fixed parens by hand Signed-off-by: Juan Quintela <quintela@redhat.com>	2014-10-14 09:35:21 +02:00
Markus Armbruster	f5bebbbb28	util: Emancipate id_wellformed() from QemuOpts IDs have long spread beyond QemuOpts: not everything with an ID necessarily goes through QemuOpts. Commit `9aebf3b` is about such a case: block layer names are meant to be well-formed IDs, but some of them don't go through QemuOpts, and thus weren't checked. The commit fixed that the straightforward way: rename the internal QemuOpts helper id_wellformed() to qemu_opts_id_wellformed() and give it external linkage. Instead of using it directly in block.c, the commit adds wrapper bdrv_is_valid_name(), probably to hide the connection to QemuOpts. Go one logical step further: emancipate IDs from QemuOpts. Rename the function back to id_wellformed(), and put it in another file. While there, clean up its value to bool. Peel off the bdrv_is_valid_name() wrapper. [Replaced stray return 0 with return false to match bool returns used elsewhere in id_wellformed(). --Stefan] Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-10-03 10:30:33 +01:00
Kevin Wolf	9aebf3b892	block: Validate node-name The device_name of a BlockDriverState is currently checked because it is always used as a QemuOpts ID and qemu_opts_create() checks whether such IDs are wellformed. node-name is supposed to share the same namespace, but it isn't checked currently. This patch adds explicit checks both for device_name and node-name so that the same rules will still apply even if QemuOpts won't be used any more at some point. qemu-img used to use names with spaces in them, which isn't allowed any more. Replace them with underscores. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-25 15:24:32 +02:00
Markus Armbruster	d224469d87	block: Improve message for device name clashing with node name Suggested-by: Benoit Canet <benoit.canet@nodalink.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-25 15:24:14 +02:00
Markus Armbruster	3ae59580a0	block: Keep DriveInfo alive until BlockDriverState dies If the BDS's refcnt > 0, drive_del() destroys the DriveInfo, but not the BDS. This can happen in three places: * Device model destruction during unplug: blockdev_auto_del() * Xen IDE unplug: pci_piix3_xen_ide_unplug() * drive_del command when no device model is attached: do_drive_del() The other callers of drive_del are on error paths where refcnt == 1. If the user somehow manages to plug in a device model using a BDS that has gone through drive_del(), the legacy configuration passed in DriveInfo doesn't reach the device model, and automatic deletion on unplug doesn't work. Worse, some device models such as scsi-disk crash when DriveInfo doesn't exist. This is theoretical; I didn't research an actual reproducer. The problem was introduced when we replaced DriveInfo reference counting by BDS reference counting in commit a94a3fa..fa510eb. Fix by keeping DriveInfo alive until its BDS dies. This affects qemu_drive_opts: now you can't reuse the same ID for new drive options until the BDS dies. Before, you could, but since the code always attempts to create a BDS with the same ID next, the enclosing operation "create a new drive" failed anyway. Different error path, same result. Unfortunately, the fix involves use of blockdev.c stuff from block.c, which is a layering violation. Fortunately, my forthcoming BlockBackend work will get rid of it again. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-25 15:24:14 +02:00
Fam Zheng	8007429a99	block: Rename qemu_aio_release -> qemu_aio_unref Suggested-by: Benoît Canet <benoit.canet@irqsave.net> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:17 +01:00
Fam Zheng	ca5fd113b8	block: Drop AIOCBInfo.cancel Now that all the implementations are converted to asynchronous version and we can emulate synchronous cancellation with it. Let's drop the unused member. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:16 +01:00
Fam Zheng	f600ac1902	block: Drop bdrv_em_aiocb_info.cancel Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:00 +01:00
Fam Zheng	3acabd685e	block: Drop bdrv_em_co_aiocb_info.cancel Also drop the now unused ->done pointer. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:38:59 +01:00
Fam Zheng	02c50efe08	block: Add bdrv_aio_cancel_async This is the async version of bdrv_aio_cancel, which doesn't block the caller. It guarantees that the cb is called either before returning or some time later. bdrv_aio_cancel can base on bdrv_aio_cancel_async, later we can convert all .io_cancel implementations to .io_cancel_async, and the aio_poll is the common logic. In the end, .io_cancel can be dropped. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:38:58 +01:00
Fam Zheng	f197fe2b2c	block: Add refcnt in BlockDriverAIOCB This will be useful in synchronous cancel emulation with bdrv_aio_cancel_async. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:38:57 +01:00
Luiz Capitulino	624ff5736e	block: extend BLOCK_IO_ERROR with reason string BLOCK_IO_ERROR events are logged by libvirt, which helps with post mortem analysis of guests. However, one information that we miss today is a human readable string describing the cause of the I/O error. This commit adds that string it to BLOCK_IO_ERROR. Note that this string is a debugging aid for humans, meaning that it should not parsed by applications. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-11 17:14:13 +02:00
Benoît Canet	5366d0c8bc	block: Make the block accounting functions operate on BlockAcctStats This is the next step for decoupling block accounting functions from BlockDriverState. In a future commit the BlockAcctStats structure will be moved from BlockDriverState to the device models structures. Note that bdrv_get_stats was introduced so device models can retrieve the BlockAcctStats structure of a BlockDriverState without being aware of it's layout. This function should go away when BlockAcctStats will be embedded in the device models structures. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Keith Busch <keith.busch@intel.com> CC: Anthony Liguori <aliguori@amazon.com> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Peter Maydell <peter.maydell@linaro.org> CC: Michael Tokarev <mjt@tls.msk.ru> CC: John Snow <jsnow@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Alexander Graf <agraf@suse.de> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Benoît Canet	5e5a94b605	block: Extract the block accounting code The plan is to add new accounting metrics (latency, invalid requests, failed requests, queue depth) and block.c is overpopulated so it will be better to work in a separate module. Moreover the long term plan is to have statistics in each of the BDS of the graph for metrology purpose; this means that the device model statistics must move from the topmost BDS to the device model. So we need to decouple the statistic code from BlockDriverState. This is another argument for the extraction of the code in a separate module. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Benoit Canet <benoit@irqsave.net> CC: Fam Zheng <famz@redhat.com> CC: Peter Crosthwaite <peter.crosthwaite@xilinx.com> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Benoît Canet	0ddd0ad96a	block: Extract the BlockAcctStats structure Extract the block accounting statistics into a structure so the block device models can hold them in the future. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Luiz Capitulino	c7c2ff0c7e	block: extend BLOCK_IO_ERROR event with nospace indicator Management software, such as RHEV's vdsm, want to be able to allocate disk space on demand. The basic use case is to start a VM with a small disk and then the disk is enlarged when QEMU hits a ENOSPC condition. To this end, the management software has to be notified when QEMU encounters ENOSPC. The solution implemented by this commit is simple: it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true when QEMU is stopped due to ENOSPC. Note that support for querying this event is already present in query-block by means of the 'io-status' key. Also, the new 'nospace' BLOCK_IO_ERROR field shares the same semantics with 'io-status', which basically means that werror= has to be set to either 'stop' or 'enospc' to enable 'nospace'. Finally, this commit also updates the 'io-status' key doc in the schema with a list of supported device models. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Liu Yuan	6bb4515849	block: kill tail whitespace in block.c Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-08 11:12:42 +01:00
Stefan Hajnoczi	391827eb10	block: fix overlapping multiwrite requests When request A is a strict superset of request B: AAAAAAAA BBBB multiwrite_merge() merges them as follows: AABBBB The tail of request A should have been included: AABBBBAA This patch fixes data loss but this code path is probably rare. Since guests cannot assume ordering between in-flight requests, few applications submit overlapping write requests. Reported-by: Slava Pestov <sviatoslav.pestov@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-08-29 14:09:43 +01:00
Max Reitz	33384421b3	block: Add AIO context notifiers If a long-running operation on a BDS wants to always remain in the same AIO context, it somehow needs to keep track of the BDS changing its context. This adds a function for registering callbacks on a BDS which are called whenever the BDS is attached or detached from an AIO context. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-29 10:48:45 +01:00
Stefan Hajnoczi	ada4240103	block: sort formats alphabetically in bdrv_iterate_format() Format names are best consumed in alphabetical order. This makes human-readable output easy to produce. bdrv_iterate_format() already has an array of format strings. Sort them before invoking the iteration callback. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>	2014-08-28 13:42:25 +01:00
Max Reitz	91af701412	block: Add bdrv_refresh_filename() Some block devices may not have a filename in their BDS; and for some, there may not even be a normal filename at all. To work around this, add a function which tries to construct a valid filename for the BDS.filename field. If a filename exists or a block driver is able to reconstruct a valid filename (which is placed in BDS.exact_filename), this can directly be used. If no filename can be constructed, we can still construct an options QDict which is then converted to a JSON object and prefixed with the "json:" pseudo protocol prefix. The QDict is placed in BDS.full_open_options. For most block drivers, this process can be done automatically; those that need special handling may define a .bdrv_refresh_filename() method to fill BDS.exact_filename and BDS.full_open_options themselves. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 14:31:56 +02:00
Markus Armbruster	5839e53bbc	block: Use g_new() & friends where that makes obvious sense g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void , which lets the compiler catch more type errors. Patch created with Coccinelle, with two manual changes on top: Add const to bdrv_iterate_format() to keep the types straight * Convert the allocation in bdrv_drop_intermediate(), which Coccinelle inexplicably misses Coccinelle semantic patch: @@ type T; @@ -g_malloc(sizeof(T)) +g_new(T, 1) @@ type T; @@ -g_try_malloc(sizeof(T)) +g_try_new(T, 1) @@ type T; @@ -g_malloc0(sizeof(T)) +g_new0(T, 1) @@ type T; @@ -g_try_malloc0(sizeof(T)) +g_try_new0(T, 1) @@ type T; expression n; @@ -g_malloc(sizeof(T) * (n)) +g_new(T, n) @@ type T; expression n; @@ -g_try_malloc(sizeof(T) * (n)) +g_try_new(T, n) @@ type T; expression n; @@ -g_malloc0(sizeof(T) * (n)) +g_new0(T, n) @@ type T; expression n; @@ -g_try_malloc0(sizeof(T) * (n)) +g_try_new0(T, n) @@ type T; expression p, n; @@ -g_realloc(p, sizeof(T) * (n)) +g_renew(T, p, n) @@ type T; expression p, n; @@ -g_try_realloc(p, sizeof(T) * (n)) +g_try_renew(T, p, n) Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 11:51:28 +02:00
Max Reitz	908bcd540f	block: Catch !bs->drv in bdrv_check() qemu-img check calls bdrv_check() twice if the first run repaired some inconsistencies. If the first run however again triggered corruption prevention (on qcow2) due to very bad inconsistencies, bs->drv may be NULL afterwards. Thus, bdrv_check() should check whether bs->drv is set. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-15 15:07:16 +02:00
Kevin Wolf	857d4f46c3	block: Handle failure for potentially large allocations Some code in the block layer makes potentially huge allocations. Failure is not completely unexpected there, so avoid aborting qemu and handle out-of-memory situations gracefully. This patch addresses bounce buffer allocations in block.c. While at it, convert bdrv_commit() from plain g_malloc() to qemu_try_blockalign(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:15 +02:00

1 2 3 4 5 ...

770 Commits