qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Daniel P. Berrange	4ed3d478c6	i386: rewrite way CPUID index is validated Change the nested if statements into a flat format, to make it clearer what validation / capping is being performed on different CPUID index values. NB this changes behaviour when "index > env->cpuid_xlevel2". This won't have any guest-visible effect because there is no CPUID[0xC0000001] feature supported by TCG, and KVM code will never call cpu_x86_cpuid() with such an index value. Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <20170509132736.10071-2-berrange@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 10:54:04 -03:00
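The flat validation described above can be sketched as a two-step rule: pick the limit for the range the index falls in, then cap once. This is a simplified model, not QEMU's cpu_x86_cpuid(). The `struct cpuid_limits` type and `cap_cpuid_index()` are hypothetical, and capping every out-of-range index to the highest basic leaf is an assumption consistent with the "changes behaviour when index > cpuid_xlevel2" note.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the three CPUID limits kept in the CPU state. */
struct cpuid_limits {
    uint32_t level;   /* highest basic leaf (0x0000xxxx range) */
    uint32_t xlevel;  /* highest extended leaf (0x8000xxxx range) */
    uint32_t xlevel2; /* highest Centaur leaf (0xC000xxxx range) */
};

/* Flat validation: choose the limit for the index range, then cap in one step. */
static uint32_t cap_cpuid_index(const struct cpuid_limits *l, uint32_t index)
{
    uint32_t limit;

    if (index >= 0xC0000000) {
        limit = l->xlevel2;
    } else if (index >= 0x80000000) {
        limit = l->xlevel;
    } else {
        limit = l->level;
    }

    /* Assumption: any out-of-range leaf behaves like the highest basic leaf. */
    if (index > limit) {
        index = l->level;
    }
    return index;
}
```

With this shape, each index range gets exactly one comparison, which is the readability gain the commit message describes.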
Kevin Wolf	d541e201bd	Block patches for the block queue. -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAlkUWPkSHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AQuYIAKu7UHbUc3rL4FOErA2kor/Lp55ApCfi WbBaeh06jNPXQ5/S7jie30WAlc0DT+wwfWlFTl7gnYsNmXI3yrBkbEbnZMA2uSEz qz+MEEx81v3BS7MDM5sKjoJVEt76zCS3f7MbrLYLrkdHg3AGo2tfMtIlABFM+I6T lkfxYqosR0pvN8bSPAyoAQvVKYefLdi+O0poG6WruCMcF58dmZn8GzwVfnncGbqz vsd2wcm983S+a7PgGT2VBNXfyBqZ0tHqn/gl5Bc5leZTEhV9DQSh8Z0DKBh+o5Cv 8iyrmklpGdr7sI+YHHkB1zecQxI854HaB/4+Hv8KudHXI9hVmqXyXaQ= =JcaJ -----END PGP SIGNATURE----- Merge remote-tracking branch 'mreitz/tags/pull-block-2017-05-11' into queue-block Block patches for the block queue. # gpg: Signature made Thu May 11 14:28:41 2017 CEST # gpg: using RSA key 0xF407DB0061D5CF40 # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * mreitz/tags/pull-block-2017-05-11: (22 commits) MAINTAINERS: Add qemu-progress to the block layer qcow2: Discard/zero clusters by byte count qcow2: Assert that cluster operations are aligned qcow2: Optimize write zero of unaligned tail cluster iotests: Add test 179 to cover write zeroes with unmap iotests: Improve _filter_qemu_img_map qcow2: Optimize zero_single_l2() to minimize L2 churn qcow2: Make distinction between zero cluster types obvious qcow2: Name typedef for cluster type qcow2: Correctly report status of preallocated zero clusters block: Update comments on BDRV_BLOCK_* meanings qcow2: Use consistent switch indentation qcow2: Nicer variable names in qcow2_update_snapshot_refcount() tests: Add coverage for recent block geometry fixes blkdebug: Add ability to override unmap geometries blkdebug: Simplify override logic blkdebug: Add pass-through write_zero and discard support blkdebug: Refactor error injection blkdebug: Sanity check block layer guarantees qemu-io: Switch 'map' output to byte-based reporting ... Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 14:34:56 +02:00
Max Reitz	8dd30c86dd	MAINTAINERS: Add qemu-progress to the block layer util/qemu-progress.c is currently unmaintained. The only user of its functionality is qemu-img, so it effectively is part of the block layer. Suggested-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170428165517.30341-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	d2cb36af2b	qcow2: Discard/zero clusters by byte count Passing a byte offset, but sector count, when we ultimately want to operate on cluster granularity, is madness. Clean up the external interfaces to take both offset and count as bytes, while still keeping the assertion added previously that the caller must align the values to a cluster. Then rename things to make sure backports don't get confused by changed units: instead of qcow2_discard_clusters() and qcow2_zero_clusters(), we now have qcow2_cluster_discard() and qcow2_cluster_zeroize(). The internal functions still operate on clusters at a time, and return an int for number of cleared clusters; but on an image with 2M clusters, a single L2 table holds 256k entries that each represent a 2M cluster, totalling well over INT_MAX bytes if we ever had a request for that many bytes at once. All our callers currently limit themselves to 32-bit bytes (and therefore fewer clusters), but by making this function 64-bit clean, we have one less place to clean up if we later improve the block layer to support 64-bit bytes through all operations (with the block layer auto-fragmenting on behalf of more-limited drivers), rather than the current state where some interfaces are artificially limited to INT_MAX at a time. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-13-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
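The 64-bit argument above can be checked with a little arithmetic. The sketch below is an illustration only: `bytes_to_clusters()` and `l2_table_coverage()` are hypothetical helpers, not QEMU functions. It assumes 8-byte L2 entries and an L2 table that occupies one cluster, which gives 256k entries and 2^39 bytes (512 GiB) of coverage per table for 2M clusters.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: convert a cluster-aligned byte range into a cluster
 * count. Everything stays 64-bit, so the result cannot overflow an int. */
static uint64_t bytes_to_clusters(uint64_t offset, uint64_t bytes,
                                  unsigned cluster_bits)
{
    uint64_t cluster_size = UINT64_C(1) << cluster_bits;

    /* The caller must align both values to a cluster boundary. */
    assert((offset & (cluster_size - 1)) == 0);
    assert((bytes & (cluster_size - 1)) == 0);
    return bytes >> cluster_bits;
}

/* Bytes mapped by one L2 table: (cluster_size / 8) entries of 8 bytes each,
 * and every entry describes one cluster. */
static uint64_t l2_table_coverage(unsigned cluster_bits)
{
    uint64_t cluster_size = UINT64_C(1) << cluster_bits;
    uint64_t entries = cluster_size / sizeof(uint64_t);

    return entries * cluster_size;
}
```

For 2M clusters (`cluster_bits` = 21), `l2_table_coverage(21)` is 512 GiB. That is far beyond INT_MAX, which is why an int-sized byte count cannot describe a full-table request.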
Eric Blake	f10ee139ad	qcow2: Assert that cluster operations are aligned We already audited (in commit `0c1bd469`) that qcow2_discard_clusters() is only passed cluster-aligned start values; but we can further tighten the assertion that the only unaligned end value is at EOF. Recent commits have taken advantage of an unaligned tail cluster, for both discard and write zeroes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-12-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	fbaa6bb3d3	qcow2: Optimize write zero of unaligned tail cluster We've already improved discards to operate efficiently on the tail of an unaligned qcow2 image; it's time to make a similar improvement to write zeroes. The special case is only valid at the tail cluster of a file, where we must recognize that any sectors beyond the image end would implicitly read as zero, and therefore should not penalize our logic for widening a partial cluster into writing the whole cluster as zero. However, note that for now, the special case of end-of-file is only recognized if there is no backing file, or if the backing file has the same length; that's because when the backing file is shorter than the active layer, we don't have code in place to recognize that reads of a sector unallocated at the top and beyond the backing end-of-file are implicitly zero. It's not much of a real loss, because most people don't use images that aren't cluster-aligned, or where the active layer is a different size than the backing layer (especially where the difference falls within a single cluster). Update test 154 to cover the new scenarios, using two images of intentionally differing length. While at it, fix the test to gracefully skip when run as ./check -qcow2 -o compat=0.10 154 since the older format lacks zero clusters already required earlier in the test. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-11-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
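The conditions for the tail-cluster special case can be written as a single predicate. This is a hypothetical model: the function name, its parameters, and the convention that `backing_len < 0` means "no backing file" are illustrative assumptions, not QEMU's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: may a write-zeroes request ending at an unaligned EOF be
 * widened to zero the whole tail cluster? backing_len < 0 means no backing. */
static bool may_widen_tail_zero(uint64_t offset, uint64_t bytes,
                                uint64_t cluster_size, uint64_t image_len,
                                int64_t backing_len)
{
    uint64_t end = offset + bytes;

    /* The request must start on a cluster boundary... */
    if (offset % cluster_size != 0) {
        return false;
    }
    /* ...and run exactly to an end-of-file that is not cluster aligned. */
    if (end != image_len || image_len % cluster_size == 0) {
        return false;
    }
    /* Bytes past EOF are known to read as zero only if no shorter backing
     * file could show through; a same-length backing file is fine. */
    return backing_len < 0 || (uint64_t)backing_len == image_len;
}
```

A shorter backing file makes the predicate false. This mirrors the commit's note that reads past the backing end-of-file are not yet recognized as zero.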
Eric Blake	e249d51952	iotests: Add test 179 to cover write zeroes with unmap No tests were covering write zeroes with unmap. Additionally, I needed to prove that my previous patches for correct status reporting and write zeroes optimizations actually had an impact. The test works for cluster_size between 8k and 2M (for smaller sizes, it fails because our allocation patterns are not contiguous with small clusters - in part, the largest consecutive allocation we tend to get is often bounded by the size covered by one L2 table). Note that testing for zero clusters is tricky: 'qemu-io map' reports whether data comes from the current layer of the image (useful for sniffing out which regions of the file have QCOW_OFLAG_ZERO) - but doesn't show which clusters have mappings; while 'qemu-img map' sees "zero":true for both unallocated and zero clusters for any qcow2 with no backing layer (so less useful at detecting true zero clusters), but reliably shows mappings. So we have to rely on both queries side-by-side at each point of the test. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-10-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	d9ca2214bd	iotests: Improve _filter_qemu_img_map Although _filter_qemu_img_map documents that it scrubs offsets, it was only doing so for human mode. Of the existing tests using the filter (97, 122, 150, 154, 176), two of them are affected, but it does not hurt the validity of the tests to not require particular mappings (another test, 66, uses offsets but intentionally does not pass through _filter_qemu_img_map, because it checks that offsets are unchanged before and after an operation). Another justification for this patch is that it will allow a future patch to utilize 'qemu-img map --output=json' to check the status of preallocated zero clusters without regards to the mapping (since the qcow2 mapping can be very sensitive to the chosen cluster size, when preallocation is not in use). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-9-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	06cc5e2b2d	qcow2: Optimize zero_single_l2() to minimize L2 churn Similar to discard_single_l2(), we should try to avoid dirtying the L2 cache when the cluster we are changing already has the right characteristics. Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP is a requirement to unallocate a cluster (this is because the block layer clears that flag if discard.* flags during open requested that we never punch holes - see the conversation around commit `170f4b2e`, https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html). Therefore, this patch can only reuse a zero cluster as-is if either unmapping is not requested, or if the zero cluster was not associated with an allocation. Technically, there are some cases where an unallocated cluster already reads as all zeroes (namely, when there is no backing file [easy: check bs->backing], or when the backing file also reads as zeroes [harder: we can't check bdrv_get_block_status since we are already holding the lock]), where the guest would not immediately see a difference if we left that cluster unallocated. But if the user did not request unmapping, leaving an unallocated cluster is wrong; and even if the user DID request unmapping, keeping a cluster unallocated risks a subtle semantic change of guest-visible contents if a backing file is later added, and it is not worth auditing whether all internal uses such as mirror properly avoid an unmap request. Thus, this patch is intentionally limited to just clusters that are already marked as zero. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-8-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
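The reuse rule above reduces to a small decision table. In this sketch, the enum names and `zero_cluster_unchanged()` are hypothetical. It returns true only when the L2 entry can be left untouched, which is what avoids dirtying the L2 cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cluster states, loosely following the commit text. */
enum cluster_state {
    STATE_UNALLOCATED,
    STATE_ZERO_PLAIN, /* marked zero, no host cluster behind it */
    STATE_ZERO_ALLOC, /* marked zero, host cluster still reserved */
    STATE_NORMAL,
};

/* Can a write-zeroes request leave this L2 entry as it is? Only clusters
 * already marked zero qualify; an allocated zero cluster is kept only when
 * the caller did not ask to unmap. */
static bool zero_cluster_unchanged(enum cluster_state s, bool may_unmap)
{
    switch (s) {
    case STATE_ZERO_PLAIN:
        return true;
    case STATE_ZERO_ALLOC:
        return !may_unmap;
    default:
        /* Unallocated clusters are deliberately not reused, per the commit
         * message, because a later backing file could change their contents. */
        return false;
    }
}
```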
Eric Blake	fdfab37dfe	qcow2: Make distinction between zero cluster types obvious Treat plain zero clusters differently from allocated ones, so that we can simplify the logic of checking whether an offset is present. Do this by splitting QCOW2_CLUSTER_ZERO into two new enums, QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC. I tried to arrange the enum so that we could use 'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and 'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although I didn't actually end up taking advantage of the layout. In many cases, this leads to simpler code, by properly combining cases (sometimes, both zero types pair together, other times, plain zero is more like unallocated while allocated zero is more like normal). Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-7-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
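The enum layout described above can be illustrated with range checks. Only QCow2ClusterType, QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC come from the commit message. The other enumerators, their exact order, and both helper functions are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ordering: unallocated kinds first, allocated kinds after. */
typedef enum QCow2ClusterType {
    QCOW2_CLUSTER_UNALLOCATED,
    QCOW2_CLUSTER_ZERO_PLAIN,
    QCOW2_CLUSTER_ZERO_ALLOC,
    QCOW2_CLUSTER_NORMAL,
    QCOW2_CLUSTER_COMPRESSED,
} QCow2ClusterType;

/* With this layout, "allocated" is a single comparison. */
static bool cluster_is_allocated(QCow2ClusterType t)
{
    return t >= QCOW2_CLUSTER_ZERO_ALLOC;
}

/* Both zero kinds pair together when the question is "is it zero?". */
static bool cluster_is_zero(QCow2ClusterType t)
{
    return t == QCOW2_CLUSTER_ZERO_PLAIN || t == QCOW2_CLUSTER_ZERO_ALLOC;
}
```

As the commit notes, the two zero types sometimes pair together, while at other times plain zero behaves like unallocated and allocated zero behaves like normal. The two helpers show both groupings.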
Eric Blake	3ef9521893	qcow2: Name typedef for cluster type Although it doesn't add all that much type safety (this is C, after all), it does add a bit of legibility to use the name QCow2ClusterType instead of a plain int. In particular, qcow2_get_cluster_offset() has an overloaded return type; a QCow2ClusterType on success, and -errno on failure; keeping the cluster type in a separate variable makes it slightly easier for the next patch to make further computations based on the type. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-6-eblake@redhat.com [mreitz: Use the new type in two more places (one of them pulled from the next patch)] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	4341df8a83	qcow2: Correctly report status of preallocated zero clusters We were throwing away the preallocation information associated with zero clusters. But we should be matching the well-defined semantics in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved, while still reminding the user that reading from that offset is likely to read garbage. count_contiguous_clusters_by_type() is now used only for unallocated cluster runs, hence it gets renamed and tightened. Making this change lets us see which portions of an image are zero but preallocated, when using qemu-img map --output=json. The --output=human side intentionally ignores all zero clusters, whether or not they are preallocated. The fact that there is no change to qemu-iotests './check -qcow2' merely means that we aren't yet testing this aspect of qemu-img; a later patch will add a test. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-5-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
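The reporting change can be sketched as follows. The flag names come from the commit message, but their numeric values, the `zero_kind` enum and `zero_cluster_status()` are assumptions for this illustration. A preallocated zero cluster keeps its host offset and reports ZERO | OFFSET_VALID. DATA is absent, so the bytes at that offset may be garbage.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values are assumptions for this sketch; only the names come from
 * the commit message. */
#define BDRV_BLOCK_DATA         0x01
#define BDRV_BLOCK_ZERO         0x02
#define BDRV_BLOCK_OFFSET_VALID 0x04

enum zero_kind { ZERO_PLAIN, ZERO_PREALLOCATED };

/* Hypothetical: status reported for a zero cluster. The reserved host
 * offset is returned through *map only when it is meaningful. */
static int zero_cluster_status(enum zero_kind kind, uint64_t host_offset,
                               uint64_t *map)
{
    if (kind == ZERO_PREALLOCATED) {
        *map = host_offset;
        return BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
    }
    return BDRV_BLOCK_ZERO;
}
```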
Eric Blake	4c41cb4955	block: Update comments on BDRV_BLOCK_* meanings We had some conflicting documentation: a nice 8-way table that described all possible combinations of DATA, ZERO, and OFFSET_VALID, contrasted with text that implied that OFFSET_VALID always meant raw data could be read directly. Furthermore, the text refers a lot to bs->file, even though the interface was updated back in `67a0fd2a` to let the driver pass back a specific BDS (not necessarily bs->file). As the 8-way table is the intended semantics, simplify the rest of the text to get rid of the confusion. ALLOCATED is always set by the block layer for convenience (drivers do not have to worry about it). RAW is used only internally, but by more than the raw driver. Document these additional items on the driver callback. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-4-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	bbd995d830	qcow2: Use consistent switch indentation Fix a couple of inconsistent indentations, before an upcoming patch further tweaks the switch statements. (best viewed with 'git diff -b'). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-3-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	b32cbae111	qcow2: Nicer variable names in qcow2_update_snapshot_refcount() In order to keep checkpatch happy when the next patch changes indentation, we first have to shorten some long lines. The easiest approach is to use a new variable in place of 'offset & L2E_OFFSET_MASK', except that 'offset' is the best name for that variable. Change '[old_]offset' to '[old_]entry' to make room. While touching things, also fix checkpatch warnings about unusual 'for' statements. Suggested by Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-2-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	40812d9373	tests: Add coverage for recent block geometry fixes Use blkdebug's new geometry constraints to emulate setups that have needed past regression fixes: write zeroes asserting when running through a loopback block device with max-transfer smaller than cluster size, and discard rounding away portions of requests not aligned to preferred boundaries. Also, add coverage that the block layer is honoring max transfer limits. For now, a single iotest performs all actions, with the idea that we can add future blkdebug constraint test cases in the same file; but it can be split into multiple iotests if we find reason to run one portion of the test in more setups than what are possible in the other. For reference, the final portion of the test (checking whether discard passes as much as possible to the lowest layers of the stack) works as follows: qemu-io: discard 30M at 80000001, passed to blkdebug blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than blkdebug's 512 align) blkdebug: discard 14371328 bytes at 80000512, passed to qcow2 qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than qcow2's 1M align) qcow2: discard 13M bytes at 77M, succeeds blkdebug: discard 15M bytes at 90M, passed to qcow2 qcow2: discard 15M bytes at 90M, succeeds blkdebug: discard 1356800 bytes at 105M, passed to qcow2 qcow2: discard 1M at 105M, succeeds qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's 1M align) blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than blkdebug's 512 align) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-10-eblake@redhat.com [mreitz: For cooperation with image locking, add -r to the qemu-io invocation which verifies the image content] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
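The head/body/tail arithmetic in the discard trace above can be checked with a small model. This is a sketch under assumptions: `struct split` and `split_at_alignment()` are hypothetical. The model covers only the alignment split at each layer. It does not model the 15M maximum-discard fragmentation that produces the middle chunks.

```c
#include <assert.h>
#include <stdint.h>

/* A [offset, offset + bytes) range split at multiples of `align`. */
struct split {
    uint64_t head; /* unaligned bytes before the first boundary */
    uint64_t body; /* aligned middle, passed further down */
    uint64_t tail; /* unaligned bytes after the last boundary */
};

/* Simplified model of one layer in the trace: the unaligned head and tail
 * are the pieces rejected with -ENOTSUP, the aligned body is passed on.
 * Not QEMU's actual fragmentation code. */
static struct split split_at_alignment(uint64_t offset, uint64_t bytes,
                                       uint64_t align)
{
    struct split s = { 0, 0, 0 };
    uint64_t end = offset + bytes;
    uint64_t first = (offset + align - 1) / align * align; /* round up */
    uint64_t last = end / align * align;                   /* round down */

    if (first >= last) {
        /* No whole aligned block inside: the entire range is unaligned. */
        s.head = bytes;
        return s;
    }
    s.head = first - offset;
    s.body = last - first;
    s.tail = end - last;
    return s;
}
```

Applied to the trace, the 30M request at 80000001 with 512-byte alignment yields 511 + 31456768 + 1 bytes. The first qcow2 piece (14371328 bytes at 80000512, 1M alignment) yields a 739840-byte head and a 13M body. The last piece (1356800 bytes at 105M) yields a 1M body and a 308224-byte tail.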
Eric Blake	430b26a82d	blkdebug: Add ability to override unmap geometries Make it easier to simulate various unusual hardware setups (for example, recent commits `3482b9b` and `b8d0a98` affect the Dell Equallogic iSCSI with its 15M preferred and maximum unmap and write zero sizing, or `b2f95fe` deals with the Linux loopback block device having a max_transfer of 64k), by allowing blkdebug to wrap any other device with further restrictions on various alignments. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-9-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	3dc834f879	blkdebug: Simplify override logic Rather than store into a local variable, then copy to the struct if the value is valid, then report errors otherwise, it is simpler to just store into the struct and report errors if the value is invalid. This however requires that the struct store a 64-bit number, rather than a narrower type. Likewise, setting a sane errno value in ret prior to the sequence of parsing and jumping to out: on error makes it easier for the next patch to add a chain of similar checks. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-8-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
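The store-then-validate pattern with a preset errno and a single `out:` exit can be sketched as follows. The `struct limits` type, `set_limits()` and both validation rules (power-of-two alignment, max transfer a multiple of the alignment) are assumptions for illustration, not blkdebug's actual option handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical limits struct: 64-bit fields, so any parsed value can be
 * stored before it is checked. */
struct limits {
    uint64_t align;
    uint64_t max_transfer;
};

/* Store first, validate afterwards, and leave through a single exit. */
static int set_limits(struct limits *s, uint64_t align, uint64_t max_transfer)
{
    int ret = -EINVAL; /* sane default for every failure below */

    s->align = align;
    if (s->align && (s->align & (s->align - 1))) {
        goto out; /* assumed rule: alignment must be a power of two */
    }

    s->max_transfer = max_transfer;
    if (s->max_transfer && s->align && s->max_transfer % s->align) {
        goto out; /* assumed rule: must be a multiple of the alignment */
    }

    ret = 0;
out:
    return ret;
}
```

Adding another option then takes only one more store and one more `goto out` check, which is the "chain of similar checks" the message anticipates.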
Eric Blake	63188c2450	blkdebug: Add pass-through write_zero and discard support In order to test the effects of artificial geometry constraints on operations like write zero or discard, we first need blkdebug to manage these actions. It also allows us to inject errors on those operations, just like we can for read/write/flush. We can also test the contract promised by the block layer; namely, if a device has specified limits on alignment or maximum size, then those limits must be obeyed (for now, the blkdebug driver merely inherits limits from whatever it is wrapping, but the next patch will further enhance it to allow specific limit overrides). This patch intentionally refuses to service requests smaller than the requested alignments; this is because an upcoming patch adds a qemu-iotest to prove that the block layer is correctly handling fragmentation, but the test only works if there is a way to tell the difference at artificial alignment boundaries when blkdebug is using a larger-than-default alignment. If we let the blkdebug layer always defer to the underlying layer, which potentially has a smaller granularity, the iotest will be thwarted. Tested by setting up an NBD server with export 'foo', then invoking: $ ./qemu-io qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo qemu-io> d 0 15M qemu-io> w -z 0 15M Pre-patch, the server never sees the discard (it was silently eaten by the block layer); post-patch it is passed across the wire. Likewise, pre-patch the write is always passed with NBD_WRITE (with 15M of zeroes on the wire), while post-patch it can utilize NBD_WRITE_ZEROES (for less traffic). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-7-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	d157ed5f72	blkdebug: Refactor error injection Rather than repeat the logic at each caller of checking if a Rule exists that warrants an error injection, fold that logic into inject_error(); and rename it to rule_check() for legibility. This will help the next patch, which adds two more callers that need to check rules for the potential of injecting errors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-6-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	e0ef439588	blkdebug: Sanity check block layer guarantees Commits `04ed95f4` and `1a62d0ac` updated the block layer to auto-fragment any I/O to fit within device boundaries. Additionally, when using a minimum alignment of 4k, we want to ensure the block layer does proper read-modify-write rather than requesting I/O on a slice of a sector. Let's enforce that the contract is obeyed when using blkdebug. For now, blkdebug only allows alignment overrides, and just inherits other limits from whatever device it is wrapping, but a future patch will further enhance things. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-5-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
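The contract blkdebug checks can be expressed as a predicate. A driver would wrap it in assert() on every request. The struct and function names are hypothetical, and the sketch ignores any special handling of an unaligned tail at end-of-file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request limits as a wrapping driver might see them. */
struct req_limits {
    uint64_t request_alignment; /* e.g. 4096 for a 4k-sector device */
    uint64_t max_transfer;      /* 0 means "no limit" */
};

/* True if the block layer kept its promise for this request: no slices of
 * a sector, and nothing larger than the device accepts in one go. */
static bool request_obeys_limits(const struct req_limits *l,
                                 uint64_t offset, uint64_t bytes)
{
    if (offset % l->request_alignment || bytes % l->request_alignment) {
        return false;
    }
    if (l->max_transfer && bytes > l->max_transfer) {
        return false;
    }
    return true;
}
```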
Eric Blake	6f3c90af3c	qemu-io: Switch 'map' output to byte-based reporting Mixing byte offset and sector allocation counts is a bit confusing. Also, reporting n/m sectors, where m decreases according to the remaining size of the file, isn't really adding any useful information; and reporting an offset at both the front and end of the line, with large amounts of whitespace, is pointless. Update the output to use byte counts and shorter lines, then adjust the affected tests (./check -qcow2 102, ./check -vpc 146). Note that 'qemu-io map' is MUCH weaker than 'qemu-img map'; the former only shows which regions of the active layer are allocated, without regards to where the allocation comes from or whether the allocated portion is known to read as zero (because it is using the weaker bdrv_is_allocated()); while the latter (especially in --output=json mode) reports more details from bdrv_get_block_status(). Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-4-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	4401fdc77c	qemu-io: Switch 'alloc' command to byte-based length For the 'alloc' command, accepting an offset in bytes but a length in sectors, and reporting output in sectors, is confusing. Do everything in bytes, and adjust the expected output accordingly. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-3-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:05 +02:00
Eric Blake	1bce6b4ce3	qemu-io: Improve alignment checks Several copy-and-pasted alignment checks exist in qemu-io, which could use some minor improvements: - Manual comparison against 0x1ff is not as clean as using our alignment macros (QEMU_IS_ALIGNED) from osdep.h. - The error messages aren't quite grammatically correct. Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-2-eblake@redhat.com Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:05 +02:00
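The difference between the hand-written mask and an alignment macro is easiest to see side by side. `SKETCH_IS_ALIGNED` is a local stand-in modelled on QEMU_IS_ALIGNED from osdep.h, and the real definition may differ. Both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for an osdep.h-style alignment macro. */
#define SKETCH_IS_ALIGNED(n, m) (((n) % (m)) == 0)

/* Old style: a hard-coded 512-byte mask that only works for powers of two. */
static bool sector_aligned_mask(uint64_t offset)
{
    return (offset & 0x1ff) == 0;
}

/* New style: the alignment is spelled out as a number. */
static bool sector_aligned_macro(uint64_t offset)
{
    return SKETCH_IS_ALIGNED(offset, 512);
}
```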
John Snow	698bdfa07d	blockdev: use drained_begin/end for qmp_block_resize Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1447551 If one tries to issue a block_resize while a guest is busy accessing the disk, it is possible that qemu may deadlock when invoking aio_poll from both the main loop and the iothread. Replace another instance of bdrv_drain_all that doesn't quite belong. Cc: qemu-stable@nongnu.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Christoph Hellwig	c03e7ef12a	nvme: Implement Write Zeroes Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: ported over from qemu-nvme.git to mainline] Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Anton Nefedov	b91127edd0	qemu-img: wait for convert coroutines to complete On the error path (like an i/o error in one of the coroutines), it's required to - wait for the coroutines to complete before cleaning the common structures - reenter dependent coroutines so that they eventually finish Introduced in `2d9187bc65`. Cc: qemu-stable@nongnu.org Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	22d5cd82e9	file-posix: Remove .bdrv_inactivate/invalidate_cache Now that the block layer takes care to request far fewer permissions for inactive nodes, the special-casing in file-posix isn't necessary any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	9c5e6594f1	block: Fix write/resize permissions for inactive images Format drivers for inactive nodes don't need write/resize permissions on their bs->file and can share write/resize with another VM (in fact, this is the whole point of keeping images inactive). Represent this fact in the op blocker system, so that image locking does the right thing without special-casing inactive images. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	38701b6aef	block: Inactivate parents before children The proper order for inactivating block nodes is that first the parents get inactivated and then the children. If we do things in this order, we can assert that we didn't accidentally leave a parent activated when one of its child nodes is inactive. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
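Parents-before-children ordering can be sketched as a recursive walk up the graph. The node structure, the global order counter and `inactivate()` are hypothetical. The sketch assumes an acyclic graph, and the assertion plays the role the commit describes: no parent may stay active above an inactive child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARENTS 4

/* Hypothetical node graph: each node knows its parents. */
struct node {
    const char *name;
    bool inactive;
    int order; /* position in the inactivation sequence, for inspection */
    struct node *parents[MAX_PARENTS];
    size_t n_parents;
};

static int next_order;

/* Inactivate a node only after every parent has been inactivated. */
static void inactivate(struct node *n)
{
    if (n->inactive) {
        return;
    }
    for (size_t i = 0; i < n->n_parents; i++) {
        inactivate(n->parents[i]);
    }
    for (size_t i = 0; i < n->n_parents; i++) {
        assert(n->parents[i]->inactive); /* the invariant the order enables */
    }
    n->order = ++next_order;
    n->inactive = true;
}
```

Starting from a protocol node such as a file node below a qcow2 node, the format node is inactivated first.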
Kevin Wolf	cfa1a5723f	block: Drop permissions when migration completes With image locking, permissions affect other qemu processes as well. We want to be sure that the destination can run, so let's drop permissions on the source when migration completes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	4417ab7adf	block: New BdrvChildRole.activate() for blk_resume_after_migration() Instead of manually calling blk_resume_after_migration() in migration code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend activation with cache invalidation into a single function. This is achieved with a new callback in BdrvChildRole that is called by bdrv_invalidate_cache_all(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	ace21a5875	migration: Unify block node activation error handling Migration code activates all block driver nodes on the destination when the migration completes. It does so by calling bdrv_invalidate_cache_all() and blk_resume_after_migration(). There is one code path for precopy and one for postcopy migration, resulting in four function calls, which used to have three different failure modes. This patch unifies the behaviour so that failure to activate all block nodes is non-fatal, but the error message is logged and the VM isn't automatically started. 'cont' will retry activating the block nodes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	aa93c834f9	iotests: Extend test 066 066 was supposed to be a test "for discarding preallocated zero clusters", but it did so incompletely: While it did check the image file's integrity after the operation, it did not confirm that the clusters are indeed freed. This patch adds this test. In addition, new cases for writing to preallocated zero clusters are added. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	293073a56c	qcow2: Discard preallocated zero clusters In discard_single_l2(), we completely discard normal clusters instead of simply turning them into preallocated zero clusters. That means we should probably do the same with such preallocated zero clusters: Discard them instead of keeping them allocated. Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	564a6b6938	qcow2: Reuse preallocated zero clusters Instead of just freeing preallocated zero clusters and completely allocating them from scratch, reuse them. We cannot do this in handle_copied(), however, since this is a COW operation. Therefore, we have to add the new logic to handle_alloc() and simply return the existing offset if it exists. The only catch is that we have to convince qcow2_alloc_cluster_link_l2() not to free the old clusters (because we have reused them). Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	92413c16be	qcow2: Fix preallocation size formula When calculating the number of reftable entries, we should actually use the number of refblocks and not (wrongly[1]) re-calculate it. [1] "Wrongly" means: Dividing the number of clusters by the number of entries per refblock and rounding down instead of up. Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
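The rounding error can be shown with numbers. `refblocks_needed()` and `refblocks_floor()` are hypothetical helpers, and `DIV_ROUND_UP` is defined locally. For example, with 64k clusters and 16-bit refcounts, one refblock holds 32768 entries. An image of 40000 clusters therefore needs 2 refblocks, and so 2 reftable entries, but floor division counts only 1.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Refblocks needed for `clusters` clusters. A partially used refblock still
 * needs its own reftable entry, hence rounding up. */
static uint64_t refblocks_needed(uint64_t clusters, uint64_t refs_per_block)
{
    return DIV_ROUND_UP(clusters, refs_per_block);
}

/* The mistake described in the commit: rounding down instead of up. */
static uint64_t refblocks_floor(uint64_t clusters, uint64_t refs_per_block)
{
    return clusters / refs_per_block;
}
```

Reusing the already-computed refblock count for the reftable size, as the commit does, avoids repeating this calculation in two places.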
Fam Zheng	de9efdb334	tests: Add POSIX image locking test case 182 Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:20 +02:00
Fam Zheng	ba8980784d	qemu-iotests: Add test case 153 for image locking Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:15:32 +02:00
Fam Zheng	244a566810	file-posix: Add image locking to perm operations This extends the permission bits of the op blocker API to external processes, using Linux OFD locks. Each permission in @perm and @shared_perm is represented by a locked byte in the image file. Requesting a permission in @perm is translated to a shared lock on the corresponding byte; refusing to share the same permission is translated to a shared lock on a separate byte. With that, we use twice as many bytes as there are distinct permission types. virtlockd in libvirt locks the first byte, so we do locking from a higher offset. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:15:32 +02:00
Fam Zheng	e8c1094a0e	osdep: Fall back to posix lock when OFD lock is unavailable Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:15:32 +02:00
Fam Zheng	13461fdba6	osdep: Add qemu_lock_fd and qemu_unlock_fd They are wrappers of POSIX fcntl "file private locking", with a convenient "try lock" wrapper implemented with F_OFD_GETLK. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:15:32 +02:00
Fam Zheng	fc0932fdcf	block: Reuse bs as backing hd for drive-backup sync=none Opening the backing image a second time is bad, especially here, where it is also in use as the source's active image. The drive-backup job itself doesn't read from target->backing for COW; instead, it gets data from the write notifier, so it's not a big problem. However, exporting the target to NBD etc. won't work, because of the likely stale metadata cache. Use BDRV_O_NO_BACKING in this case and manually set up the backing BdrvChild. Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:15:32 +02:00
Fam Zheng	9c77fec2d3	tests: Disable image lock in test-replication The COLO block replication architecture requires one disk to be shared between primary and secondary. In the test, both processes use the posix file protocol (instead of going over NBD), so they are affected by image locking. Disable the lock. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:08:41 +02:00
Fam Zheng	1c3a555c35	file-win32: Error out if locking=on We share the same set of QAPI options with file-posix, but locking is not supported here. So error out if it is specified as 'on' for now. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:08:41 +02:00
Fam Zheng	16b48d5d66	file-posix: Add 'locking' option Making this option available even before implementing it will make converting tests easier: in the coming patches, they can already specify the option when necessary, before we actually write the code to lock the images. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:08:40 +02:00
Fam Zheng	2420d369a2	tests: Use null-co:// instead of /dev/null as the dummy image Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:08:40 +02:00
Fam Zheng	7ceb4fc114	iotests: 172: Use separate images for multiple devices To avoid image lock failures. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:08:40 +02:00
Fam Zheng	8b084489b0	iotests: 091: Quit QEMU before checking image Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:08:40 +02:00
Fam Zheng	d5b8336a62	iotests: 087: Don't attach test image twice The test scenario doesn't require the same image; instead, it focuses on the duplicated node-name, so use null-co to avoid a locking conflict. Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:08:40 +02:00
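Commit 293073a56c ("qcow2: Discard preallocated zero clusters") depends on telling preallocated zero clusters apart from plain zero clusters. The C sketch below classifies a standard qcow2 L2 entry using the bit layout from the qcow2 v3 specification (bit 0 zero flag, bits 1-8 reserved, bits 9-55 host offset, bit 62 compressed, bit 63 copied). The enum and function names are hypothetical, only loosely modeled on the cluster types named in this series, and compressed clusters are deliberately left out of the discard predicate; it is an illustration, not QEMU's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard L2 entry bits per the qcow2 v3 specification. */
#define L2E_ZERO_FLAG       (1ULL << 0)
#define L2E_RESERVED_LOW    0x1feULL                /* bits 1-8, must be 0 */
#define L2E_OFFSET_MASK     0x00fffffffffffe00ULL   /* bits 9-55 */
#define L2E_COMPRESSED_FLAG (1ULL << 62)
#define L2E_COPIED_FLAG     (1ULL << 63)

/* Hypothetical cluster types (not QEMU's actual enum). */
typedef enum {
    CLUSTER_UNALLOCATED,  /* no data, no host cluster */
    CLUSTER_ZERO_PLAIN,   /* reads as zero, no host cluster */
    CLUSTER_ZERO_ALLOC,   /* reads as zero, host cluster preallocated */
    CLUSTER_NORMAL,       /* data lives in the host cluster */
    CLUSTER_COMPRESSED,   /* different layout, not decoded here */
} ClusterType;

static ClusterType classify_l2_entry(uint64_t entry)
{
    if (entry & L2E_COMPRESSED_FLAG) {
        return CLUSTER_COMPRESSED;
    }
    /* A real implementation would report image corruption instead. */
    assert((entry & L2E_RESERVED_LOW) == 0);

    uint64_t offset = entry & L2E_OFFSET_MASK;
    if (entry & L2E_ZERO_FLAG) {
        return offset ? CLUSTER_ZERO_ALLOC : CLUSTER_ZERO_PLAIN;
    }
    return offset ? CLUSTER_NORMAL : CLUSTER_UNALLOCATED;
}

/* The behaviour the commit describes: on discard, a preallocated zero
 * cluster gives up its host cluster just like a normal cluster does. */
static bool discard_frees_host_cluster(uint64_t entry)
{
    ClusterType type = classify_l2_entry(entry);
    return type == CLUSTER_NORMAL || type == CLUSTER_ZERO_ALLOC;
}
```

The zero flag alone does not say whether a host cluster is attached; only the combination of the flag and a non-zero offset identifies the preallocated case that the commit now discards.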
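Commit 564a6b6938 ("qcow2: Reuse preallocated zero clusters") moves the reuse decision into the allocation path: if the L2 entry already owns a host cluster, return its offset and make sure the later L2 update does not free it. The sketch below shows only that decision. The bump allocator, the 64 KiB cluster size and all function names are hypothetical stand-ins, not QEMU's handle_alloc() or qcow2_alloc_cluster_link_l2().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define L2E_ZERO_FLAG       (1ULL << 0)
#define L2E_OFFSET_MASK     0x00fffffffffffe00ULL   /* bits 9-55 */
#define L2E_COMPRESSED_FLAG (1ULL << 62)

#define CLUSTER_SIZE 0x10000ULL   /* assumed 64 KiB clusters */

/* Hypothetical bump allocator standing in for the real refcount-based one. */
static uint64_t next_free_offset = 0x100000ULL;

static uint64_t allocate_new_cluster(void)
{
    uint64_t offset = next_free_offset;
    next_free_offset += CLUSTER_SIZE;
    return offset;
}

/* Choose the host offset for a write that needs a fresh cluster.
 * A preallocated zero cluster already owns a host cluster, so hand that
 * back instead of allocating; *reused tells the later L2 update step
 * not to free the old cluster. */
static uint64_t choose_host_offset(uint64_t l2_entry, bool *reused)
{
    assert(reused != NULL);
    uint64_t old_offset = l2_entry & L2E_OFFSET_MASK;
    bool zero_alloc = !(l2_entry & L2E_COMPRESSED_FLAG) &&
                      (l2_entry & L2E_ZERO_FLAG) && old_offset != 0;

    *reused = zero_alloc;
    return zero_alloc ? old_offset : allocate_new_cluster();
}
```

The out-parameter mirrors the "only catch" mentioned in the commit message: whatever links the new offset into the L2 table must know the old cluster was reused, or it would free the very cluster it just wrote to.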
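The footnote of commit 92413c16be ("qcow2: Fix preallocation size formula") describes a rounding error: the reftable needs one entry per refblock, and the refblock count must round up. A minimal C illustration, assuming 64 KiB clusters and 16-bit refcounts (qcow2's default refcount width). It isolates the rounding issue only and is not QEMU's full preallocation size calculation; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division, similar in spirit to QEMU's DIV_ROUND_UP(). */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    assert(d > 0);
    return (n + d - 1) / d;
}

/* Refcount entries that fit into one refblock (one cluster). */
static uint64_t refcounts_per_refblock(uint64_t cluster_size,
                                       unsigned refcount_bits)
{
    return cluster_size * 8 / refcount_bits;
}

/* Correct: a partially filled last refblock still needs its own
 * refblock, and therefore its own reftable entry. */
static uint64_t refblocks_needed(uint64_t clusters, uint64_t per_refblock)
{
    return div_round_up(clusters, per_refblock);
}

/* The mistake described in [1]: truncating division drops the
 * partial refblock. */
static uint64_t refblocks_truncated(uint64_t clusters, uint64_t per_refblock)
{
    return clusters / per_refblock;
}
```

With 32769 clusters, one refblock of 32768 entries is not enough: the rounded-up count is 2, the truncated one is 1, so a reftable sized from the truncated value would be one entry short.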
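Commit 244a566810 maps each permission to two lock bytes: one meaning "I hold this permission" and one meaning "I refuse to share it". The C sketch below computes those byte offsets and the resulting conflict rule. The permission names are modeled on QEMU's BLK_PERM_* flags, and the base offsets 100 and 200 are assumptions for illustration; the commit only states that locking starts above the first byte, which virtlockd uses. This is not the actual file-posix code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits, modeled on QEMU's BLK_PERM_* flags. */
enum {
    PERM_CONSISTENT_READ = 1 << 0,
    PERM_WRITE           = 1 << 1,
    PERM_WRITE_UNCHANGED = 1 << 2,
    PERM_RESIZE          = 1 << 3,
    PERM_GRAPH_MOD       = 1 << 4,
};
#define PERM_COUNT 5
#define PERM_ALL   ((1u << PERM_COUNT) - 1)

/* Assumed base offsets, both above byte 0 (used by virtlockd). */
#define LOCK_PERM_BASE   100
#define LOCK_SHARED_BASE 200

/* Fill out[] with the byte offsets to lock (all as shared locks) for the
 * given permissions; returns how many offsets were written. */
static size_t lock_offsets(uint32_t perm, uint32_t shared_perm,
                           uint64_t out[2 * PERM_COUNT])
{
    assert((perm & ~PERM_ALL) == 0);
    size_t n = 0;
    for (int i = 0; i < PERM_COUNT; i++) {
        if (perm & (1u << i)) {
            out[n++] = LOCK_PERM_BASE + i;    /* "I hold permission i" */
        }
    }
    for (int i = 0; i < PERM_COUNT; i++) {
        if (!(shared_perm & (1u << i))) {
            out[n++] = LOCK_SHARED_BASE + i;  /* "I refuse to share i" */
        }
    }
    return n;
}

/* A new user conflicts if it wants a permission that someone refuses to
 * share, or refuses to share a permission that someone holds. The masks
 * of other users' bytes are assumed to come from probing the locks. */
static bool conflicts(uint32_t perm, uint32_t shared_perm,
                      uint32_t others_hold, uint32_t others_unshare)
{
    return (perm & others_unshare) != 0 ||
           (~shared_perm & PERM_ALL & others_hold) != 0;
}
```

Because every byte is only ever locked shared, two users never collide on taking their own locks; a conflict shows up only when one side checks for (or tries to exclusively lock) the complementary byte the other side holds.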
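Commits 13461fdba6 and e8c1094a0e add fcntl()-based lock wrappers with a "try lock" built on F_OFD_GETLK, falling back to classic POSIX locks when OFD locks are unavailable. The Linux sketch below follows that pattern; the function names and the run-time probe are hypothetical illustrations, not the actual qemu_lock_fd()/qemu_unlock_fd() code. A caveat worth knowing: classic POSIX record locks belong to the process, so F_GETLK never reports a conflict with another fd of the same process, while OFD locks belong to the open file description and do.

```c
#define _GNU_SOURCE            /* exposes F_OFD_* in glibc's <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Commands chosen at run time: OFD locks if the kernel accepts them,
 * otherwise classic per-process POSIX record locks. */
static int setlk_op = -1;
static int getlk_op = -1;

static void probe_lock_ops(int fd)
{
    if (setlk_op != -1) {
        return;
    }
    setlk_op = F_SETLK;
    getlk_op = F_GETLK;
#ifdef F_OFD_SETLK
    /* OFD commands require l_pid == 0; the initializer zeroes it. */
    struct flock fl = { .l_whence = SEEK_SET, .l_type = F_WRLCK };
    if (fcntl(fd, F_OFD_GETLK, &fl) == 0) {
        setlk_op = F_OFD_SETLK;
        getlk_op = F_OFD_GETLK;
    }
#endif
}

/* Non-blocking shared or exclusive lock on [start, start + len).
 * Returns 0 on success or -errno on failure. */
static int lock_range(int fd, off_t start, off_t len, bool exclusive)
{
    assert(fd >= 0);
    probe_lock_ops(fd);
    struct flock fl = {
        .l_whence = SEEK_SET, .l_start = start, .l_len = len,
        .l_type = exclusive ? F_WRLCK : F_RDLCK,
    };
    return fcntl(fd, setlk_op, &fl) == -1 ? -errno : 0;
}

static int unlock_range(int fd, off_t start, off_t len)
{
    assert(fd >= 0);
    probe_lock_ops(fd);
    struct flock fl = {
        .l_whence = SEEK_SET, .l_start = start, .l_len = len,
        .l_type = F_UNLCK,
    };
    return fcntl(fd, setlk_op, &fl) == -1 ? -errno : 0;
}

/* "Try lock" without taking it: 0 if the lock could be acquired,
 * -EAGAIN if a conflicting lock is held, or -errno on error. */
static int test_range(int fd, off_t start, off_t len, bool exclusive)
{
    assert(fd >= 0);
    probe_lock_ops(fd);
    struct flock fl = {
        .l_whence = SEEK_SET, .l_start = start, .l_len = len,
        .l_type = exclusive ? F_WRLCK : F_RDLCK,
    };
    if (fcntl(fd, getlk_op, &fl) == -1) {
        return -errno;
    }
    return fl.l_type == F_UNLCK ? 0 : -EAGAIN;
}
```

Probing at run time rather than relying only on the compile-time `#ifdef` also covers binaries built against headers that define F_OFD_* but running on kernels that reject them.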

53360 Commits