qemu-e2k

Author	SHA1	Message	Date
Kevin Wolf	6c3944dc62	qcow2: Implement data-file-raw create option Provide an option to force QEMU to always keep the external data file consistent as a standalone read-only raw image. At the moment, this means making sure that write_zeroes requests are forwarded to the data file instead of just updating the metadata, and checking that no backing file is used. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	9b890bdcb6	qcow2: Store data file name in the image Rather than requiring that the external data file node is passed explicitly when creating the qcow2 node, store the filename in the designated header extension during .bdrv_create and read it from there as a default during .bdrv_open. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	dcc98687f8	qcow2: Creating images with external data file This adds a .bdrv_create option to use an external data file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	0e8c08be27	qcow2: Add basic data-file infrastructure This adds a .bdrv_open option to specify the external data file node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	e9f5b6deaa	qcow2: Support external data file in qemu-img check For external data files, data clusters must be excluded from the refcount calculations. Instead, an implicit refcount of 1 is assumed for the COPIED flag. Compressed clusters and internal snapshots are incompatible with external data files, so print an error if they are in use for images with an external data file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	aa8b34c1b2	qcow2: Return error for snapshot operation with data file Internal snapshots and an external data file are incompatible because snapshots require refcounting and non-linear mapping. Return an error for all of the snapshot operations if an external data file is in use. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	966b000f49	qcow2: External file I/O This changes the qcow2 implementation to direct all guest data I/O to s->data_file rather than bs->file, while metadata I/O still uses bs->file. At the moment, this is still always the same, but soon we'll add options to set s->data_file to an external data file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	37be14036b	qcow2: Prepare qcow2_co_block_status() for data file Offset 0 cannot be assumed to mean an unallocated cluster any more. Instead, the cluster type needs to be checked. *file must refer to the data file instead of the image file if a valid offset is returned from qcow2_co_block_status(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	77e023ff79	qcow2: Return 0/-errno in qcow2_alloc_compressed_cluster_offset() qcow2_alloc_compressed_cluster_offset() used to return the cluster offset for success and 0 for error. Not only does this conflict with 0 being a valid host offset, it also loses the error code. Similar to the change made to qcow2_alloc_cluster_offset() for uncompressed clusters in commit `148da7ea9d`, make the function return 0/-errno and return the allocated cluster offset in a by-reference parameter. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
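The 0/-errno convention this commit moves to can be illustrated with a small C sketch. `struct toy_image` and `alloc_compressed_cluster()` are hypothetical names invented for this example and do not reflect the actual qcow2 allocator. The sketch shows only one thing: the status code and the allocated offset are returned separately, so an offset of 0 no longer doubles as an error marker.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified image state: the next free host offset and
 * the end of the space we may allocate from. */
struct toy_image {
    uint64_t next_free;
    uint64_t limit;
};

/* Returns 0 on success and stores the allocated offset in *host_offset;
 * returns a negative errno on failure and leaves *host_offset untouched.
 * Because status and offset are separate, 0 is a legitimate offset. */
static int alloc_compressed_cluster(struct toy_image *img, uint64_t size,
                                    uint64_t *host_offset)
{
    if (size == 0) {
        return -EINVAL;
    }
    if (img->next_free > img->limit || size > img->limit - img->next_free) {
        return -ENOSPC;
    }
    *host_offset = img->next_free;
    img->next_free += size;
    return 0;
}
```

A caller only ever tests `ret < 0`. The value written to `*host_offset` is meaningful only when the call returned 0.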
Kevin Wolf	c6d619cc12	qcow2: Don't assume 0 is an invalid cluster offset The cluster allocation code uses 0 as an invalid offset that is used in case of errors or as "offset not yet determined". With external data files, a host cluster offset of 0 becomes valid, though. Define a constant INV_OFFSET (which is not cluster aligned and will therefore never be a valid offset) that can be used for such purposes. This removes the additional host_offset == 0 check that commit `ff52aab2df` introduced; the confusion between an invalid offset and (erroneous) allocation at offset 0 is removed with this change. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
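Here is a minimal sketch of why a non-cluster-aligned sentinel is safe. It assumes cluster sizes are powers of two of at least 512 bytes, as in qcow2. `TOY_INV_OFFSET` and `offset_is_cluster_aligned()` are hypothetical names for illustration. The commit's actual INV_OFFSET definition is not shown in this log, so the all-ones value used here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel for "error / offset not yet determined". With all
 * bits set it is never aligned to any cluster size >= 512 bytes, so it can
 * never collide with a real host offset, including 0. */
#define TOY_INV_OFFSET (~(uint64_t)0)

/* A host cluster offset can only be valid if it is cluster aligned. */
static bool offset_is_cluster_aligned(uint64_t offset, unsigned cluster_bits)
{
    uint64_t cluster_size = (uint64_t)1 << cluster_bits;
    return (offset & (cluster_size - 1)) == 0;
}
```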
Kevin Wolf	b8c8353a38	qcow2: Prepare count_contiguous_clusters() for external data file Offset 0 can be valid for normal (allocated) clusters now, so use qcow2_get_cluster_type() instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	a4ea184d8a	qcow2: Prepare qcow2_get_cluster_type() for external data file Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	808c2bb4c4	qcow2: Pass bs to qcow2_get_cluster_type() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	93c2493646	qcow2: Basic definitions for external data files This adds basic constants, struct fields and helper function for external data file support to the implementation. QCOW2_INCOMPAT_MASK and QCOW2_AUTOCLEAR_MASK are not updated yet so that opening images with an external data file still fails (we don't handle them correctly yet). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	c5e86ebc11	qcow2: Simplify preallocation code Image creation already involves a bdrv_co_truncate() call, which allows specifying a preallocation mode. Just pass the right mode there and remove the code that this makes redundant. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Alberto Garcia	af39bd0d9a	qcow2: Default to 4KB for the qcow2 cache entry size QEMU 2.12 (commit `1221fe6f63`) introduced a new setting called l2-cache-entry-size that allows making entries on the qcow2 L2 cache smaller than the cluster size. I have been performing several tests with different cluster and entry sizes and all of them show that reducing the entry size (aka L2 slice) consistently improves I/O performance, notably during random I/O (all tests done with sequential I/O show similar results). This is to be expected because loading and evicting an L2 slice is more expensive the larger the slice is. Here are some numbers on fully populated 40GB qcow2 images. The rightmost column represents the maximum L2 cache size in both cases.

Cluster size = 64 KB

|             | 1MB L2 cache | 3MB L2 cache | 5MB L2 cache |
|-------------|--------------|--------------|--------------|
| 4KB slices  | 6545 IOPS    | 12045 IOPS   | 55680 IOPS   |
| 16KB slices | 5177 IOPS    | 9798 IOPS    | 56278 IOPS   |
| 64KB slices | 2718 IOPS    | 5326 IOPS    | 57355 IOPS   |

Cluster size = 256 KB

|              | 512KB L2 cache | 1MB L2 cache | 1280KB L2 cache |
|--------------|----------------|--------------|-----------------|
| 4KB slices   | 8539 IOPS      | 21071 IOPS   | 55417 IOPS      |
| 64KB slices  | 3598 IOPS      | 9772 IOPS    | 57687 IOPS      |
| 256KB slices | 1415 IOPS      | 4120 IOPS    | 58001 IOPS      |

As can be seen in the numbers, the only exception to the rule is when the cache is large enough to hold all L2 tables. This is also to be expected because in this case no cache entry is ever evicted so reducing its size doesn't bring any benefit.
This patch sets the default L2 cache entry size to 4KB except when the cache is large enough for the whole disk. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
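The cache sizes in the rightmost columns of the benchmark are exactly what a fully populated 40GB image needs. Each L2 entry is 8 bytes and maps one cluster, so 40GB / 64KB * 8 bytes = 5MB, and 40GB / 256KB * 8 bytes = 1280KB. The C sketch below reproduces that arithmetic and the default described in this commit. `pick_l2_entry_size()` is a hypothetical helper, not QEMU's code. Capping the slice at the cluster size for clusters smaller than 4KB is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define L2_ENTRY_SIZE 8          /* each L2 entry is a 64-bit descriptor */
#define DEFAULT_SLICE_SIZE 4096  /* the 4KB default from this commit */

/* Bytes of L2 cache needed to map the whole virtual disk:
 * one 8-byte entry per cluster (cluster count rounded up). */
static uint64_t l2_cache_for_whole_disk(uint64_t disk_size, uint64_t cluster_size)
{
    uint64_t clusters = (disk_size + cluster_size - 1) / cluster_size;
    return clusters * L2_ENTRY_SIZE;
}

/* Hypothetical policy helper: keep full-cluster entries when every L2
 * table fits in the cache (nothing is ever evicted), otherwise use 4KB
 * slices, or the cluster size if that is smaller. */
static uint64_t pick_l2_entry_size(uint64_t disk_size, uint64_t cluster_size,
                                   uint64_t cache_size)
{
    if (cache_size >= l2_cache_for_whole_disk(disk_size, cluster_size)) {
        return cluster_size;
    }
    return cluster_size < DEFAULT_SLICE_SIZE ? cluster_size : DEFAULT_SLICE_SIZE;
}
```

For example, a 40GB image with 64KB clusters and a 1MB cache gets 4KB slices. With a 5MB cache, every L2 table fits, so the slice size stays at the full 64KB cluster.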
Peter Maydell	adf2e451f3	Block layer patches: - Block graph change fixes (avoid loops, cope with non-tree graphs) - bdrv_set_aio_context() related fixes - HMP snapshot commands: Use only tag, not the ID to identify snapshots - qemu-img, commit: Error path fixes - block/nvme: Build fix for gcc 9 - MAINTAINERS updates - Fix various issues with bdrv_refresh_filename() - Fix various iotests - Include LUKS overhead in qemu-img measure for qcow2 - A fix for vmdk's image creation interface -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJcc/knAAoJEH8JsnLIjy/WptQP/3F8Lh52H4egXaP7NUUuDjQM AhqhuDAp/EZBS+xim9kLTogNJADe/rMWdSX/YB5aLpSPYbjasC66NgaLhd6QewgQ VIcsLUdlYAyZ5ZjJytimfMTLwm1X02RmVIe55y52DTY8LlfViZzOlf3qwqPm00ao EJB2cl8UJLM+PVEu59cCw3R0/06LY+WIJRB32d3tnCBRTkaJwfR9h4lrp/juVcFZ U+2eWU68KMbUHSYiWANowN+KRV3uPY4HVA98v3F0vDmcBxlVHOeBg6S+PcT7tK8p huzCMwcdwUyPMJgVs/+WBtUnbG0jN6SHUYmFLz859UMVgBnCw5tzBMf8qw1wOA4A Iw+zor27Pxj4IlxcLPp5f97YZ8k9acdMR2VKPH6xLJZ1JF+sKa54RfzESd5EJeIj Mfcp773H0lIaWcFJ6RY1F0L1E1ta7QigwNBiWMdYfh0a0EWHnDvGyYeaSPYEQ+rl e8bZOcfrYwVI7DTDiZOIkGA9D8DXEPDNp+sl6s1DxeY69D0NNaXTtCPqFNNAbFbd 20uD7yDRZlWq32cQB/K9D5cSkZRSOzdUpLfLU3nQU2+dz11x6OpM6m7DVboSrztD 1HtPPDzDEvH5dOP7ibd60s+ntjkSiNfNkUgnuVrBE/d/PocC1eHHpZt5V7f43Ofb RxVwH5+smzQ9nsNBfQR0 =gaah -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - Block graph change fixes (avoid loops, cope with non-tree graphs) - bdrv_set_aio_context() related fixes - HMP snapshot commands: Use only tag, not the ID to identify snapshots - qemu-img, commit: Error path fixes - block/nvme: Build fix for gcc 9 - MAINTAINERS updates - Fix various issues with bdrv_refresh_filename() - Fix various iotests - Include LUKS overhead in qemu-img measure for qcow2 - A fix for vmdk's image creation interface # gpg: Signature made Mon 25 Feb 2019 14:18:15 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key 
fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (71 commits) iotests: Skip 211 on insufficient memory vmdk: false positive of compat6 with hwversion not set iotests: add LUKS payload overhead to 178 qemu-img measure test qcow2: include LUKS payload overhead in qemu-img measure iotests.py: s/_/-/g on keys in qmp_log() iotests: Let 045 be run concurrently iotests: Filter SSH paths iotests.py: Filter filename in any string value iotests.py: Add is_str() iotests: Fix 207 to use QMP filters for qmp_log iotests: Fix 232 for LUKS iotests: Remove superfluous rm from 232 iotests: Fix 237 for Python 2.x iotests: Re-add filename filters iotests: Test json:{} filenames of internal BDSs block: BDS options may lack the "driver" option block/null: Generate filename even with latency-ns block/curl: Implement bdrv_refresh_filename() block/curl: Harmonize option defaults block/nvme: Fix bdrv_refresh_filename() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-02-26 19:04:47 +00:00
yuchenlin	26c9296c31	vmdk: false positive of compat6 with hwversion not set In vmdk_co_create_opts(), when hw_version is undefined, it is set to 4, which misleads the compat6/hwversion check in vmdk_co_do_create(). Simply set hw_version to NULL after freeing it and let the logic in vmdk_co_do_create() decide the value of hw_version. This bug can be reproduced with: `$ qemu-img convert -O vmdk -o subformat=streamOptimized,compat6 /home/yuchenlin/syno.qcow2 /home/yuchenlin/syno.vmdk` which fails with: `qemu-img: /home/yuchenlin/syno.vmdk: error while converting vmdk: compat6 cannot be enabled with hwversion set` Signed-off-by: yuchenlin <yuchenlin@synology.com> Message-id: 20190221110805.28239-1-yuchenlin@synology.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:28 +01:00
Stefan Hajnoczi	61914f8906	qcow2: include LUKS payload overhead in qemu-img measure LUKS encryption reserves clusters for its own payload data. The size of this area must be included in the qemu-img measure calculation so that we arrive at the correct minimum required image size. (Ab)use the qcrypto_block_create() API to determine the payload overhead. We discard the payload data that qcrypto thinks will be written to the image. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190218104525.23674-2-stefanha@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:28 +01:00
Max Reitz	1e47cb7f52	block/null: Generate filename even with latency-ns While we cannot represent the latency-ns option in a filename, it is not a strong option, so being unable to represent it should not stop us from generating a filename nonetheless. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-30-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	937c007b6e	block/curl: Implement bdrv_refresh_filename() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-29-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	712b64e8f3	block/curl: Harmonize option defaults Both of the defaults we currently have in the curl driver are named based on a slightly different schema, let's unify that and call both CURL_BLOCK_OPT_${NAME}_DEFAULT. While at it, we can add a macro for the third option for which a default exists, namely "sslverify". Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-28-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	cc61b0740f	block/nvme: Fix bdrv_refresh_filename() Currently, nvme's bdrv_refresh_filename() is an exact copy of null's implementation. However, for null, "null-co://" and "null-aio://" are indeed valid filenames -- for nvme, they are not, as a device address is still required. The correct implementation should generate a filename of the form "nvme://[PCI address]/[namespace]" (as the comment above nvme_parse_filename() describes). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-27-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
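The filename shape this commit asks for, `nvme://[PCI address]/[namespace]`, can be produced with a bounded `snprintf()`. The helper name, its error convention and the example PCI address in the usage are hypothetical. This sketches the string format only, not the driver's implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "nvme://<PCI address>/<namespace>" into buf.
 * Returns 0 on success, -1 on an encoding error or if buf is too small
 * (snprintf reports the length it would have needed). */
static int nvme_build_filename(char *buf, size_t size,
                               const char *pci_addr, int namespace_id)
{
    int n = snprintf(buf, size, "nvme://%s/%d", pci_addr, namespace_id);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```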
Max Reitz	998b3a1e5a	block: Purify .bdrv_refresh_filename() Currently, BlockDriver.bdrv_refresh_filename() is supposed to both refresh the filename (BDS.exact_filename) and set BDS.full_open_options. Now that we have generic code in the central bdrv_refresh_filename() for creating BDS.full_open_options, we can drop the latter part from all BlockDriver.bdrv_refresh_filename() implementations. This also means that we can drop all of the existing default code for this from the global bdrv_refresh_filename() itself. Furthermore, we now have to call BlockDriver.bdrv_refresh_filename() after having set BDS.full_open_options, because the block driver's implementation should now be allowed to depend on BDS.full_open_options being set correctly. Finally, with this patch we can drop the @options parameter from BlockDriver.bdrv_refresh_filename(); also, add a comment on this function's purpose in block/block_int.h while touching its interface. This completely obsoletes blklogwrite's implementation of .bdrv_refresh_filename(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-25-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	abc521a9aa	block: Add BlockDriver.bdrv_gather_child_options Some follow-up patches will rework the way bs->full_open_options is refreshed in bdrv_refresh_filename(). The new implementation will remove the need for the block drivers' bdrv_refresh_filename() implementations to set bs->full_open_options; instead, it will be generic and use static information from each block driver. However, by implementing bdrv_gather_child_options(), block drivers will still be able to override the way the full_open_options of their children are incorporated into their own. We need to implement this function for VMDK because we have to prevent the generic implementation from gathering the options of all children: It is not possible to specify options for the extents through the runtime options. For quorum, the child names that would be used by the generic implementation and the ones that we actually (currently) want to use differ. See quorum_gather_child_options() for more information. Note that both of these are cases which are not ideal: In case of VMDK it would probably be nice to be able to specify options for all extents. In case of quorum, the current runtime option structure is simply broken and needs to be fixed (but that is left for another patch). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-23-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	2654267cc1	block: Add strong_runtime_opts to BlockDriver This new field can be set by block drivers to list the runtime options they accept that may influence the contents of the respective BDS. As of a follow-up patch, this list will be used by the common bdrv_refresh_filename() implementation to decide which options to put into BDS.full_open_options (and consequently whether a JSON filename has to be created), thus freeing the drivers from having to implement that logic themselves. Additionally, this patch adds the field to all of the block drivers that need it and sets it accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-22-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
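One way to picture how a driver's list of strong runtime options could be consulted is a simple membership filter. The sketch below is hypothetical: the function names and option names are invented for illustration, and it ignores everything else the real bdrv_refresh_filename() has to consider. It shows only that weak options, such as the latency-ns option mentioned for the null driver, are skipped while strong ones are counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Is `key` in the driver's NULL-terminated list of strong options? */
static bool is_strong_option(const char *const *strong_opts, const char *key)
{
    for (size_t i = 0; strong_opts[i]; i++) {
        if (!strcmp(strong_opts[i], key)) {
            return true;
        }
    }
    return false;
}

/* Count the given option keys that would have to be recorded in
 * full_open_options; weak (e.g. performance-only) options are skipped. */
static size_t count_strong_options(const char *const *strong_opts,
                                   const char *const *keys, size_t nkeys)
{
    size_t n = 0;
    for (size_t i = 0; i < nkeys; i++) {
        if (is_strong_option(strong_opts, keys[i])) {
            n++;
        }
    }
    return n;
}
```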
Max Reitz	0dcbc54a95	block/nfs: Implement bdrv_dirname() While the basic idea is obvious and could be handled by the default bdrv_dirname() implementation, we cannot generate a directory name if the gid or uid are set, so we have to explicitly return NULL in those cases. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-19-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	8a6239c071	block/nbd: Make bdrv_dirname() return NULL The generic bdrv_dirname() implementation would be able to generate some form of directory name for many NBD nodes, but it would always be wrong. Therefore, we have to explicitly make it an error (until NBD has some form of specification for export paths, if it ever will). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20190201192935.18394-18-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	f3037bd254	quorum: Make bdrv_dirname() return NULL While the common implementation for bdrv_dirname() should return NULL for quorum BDSs already (because they do not have a file node and their exact_filename field should be empty), there is no reason not to make that explicit. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-17-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	27953572a5	blkverify: Make bdrv_dirname() return NULL blkverify's BDSs have a file BDS, but we do not want this to be preferred over the raw node. There is no way to decide between the two (and not really a reason to, either), so just return NULL in blkverify's implementation of bdrv_dirname(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-16-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	6b6833c1b4	block: bdrv_get_full_backing_filename's ret. val. Make bdrv_get_full_backing_filename() return an allocated string instead of placing the result in a caller-provided buffer. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-12-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	645ae7d88e	block: bdrv_get_full_backing_filename_from_...'s ret. val. Make bdrv_get_full_backing_filename_from_filename() return an allocated string instead of placing the result in a caller-provided buffer. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	009b03aaa2	block: Make path_combine() return the path Besides being safe for arbitrary path lengths, after some follow-up patches all callers will want a freshly allocated buffer anyway. In the meantime, path_combine_deprecated() is added which has the same interface as path_combine() had before this patch. All callers to that function will be converted in follow-up patches. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20190201192935.18394-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
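The new calling convention returns a freshly allocated string instead of filling a caller-provided buffer, and it can be sketched in plain C. `toy_path_combine()` is a simplified, hypothetical version. It resolves a relative filename against the directory of a base path and ignores protocol prefixes (such as `nbd:`), which the real function may also need to handle.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Resolve `filename` relative to the directory containing `base_path`.
 * Absolute filenames are copied unchanged. Returns a newly allocated
 * string that the caller must free(), or NULL on allocation failure.
 * No fixed-size buffer is involved, so any path length works. */
static char *toy_path_combine(const char *base_path, const char *filename)
{
    const char *slash = strrchr(base_path, '/');
    size_t dir_len = 0;

    if (filename[0] != '/' && slash) {
        dir_len = (size_t)(slash - base_path) + 1;  /* keep trailing '/' */
    }

    size_t name_len = strlen(filename);
    char *result = malloc(dir_len + name_len + 1);
    if (!result) {
        return NULL;
    }
    memcpy(result, base_path, dir_len);
    memcpy(result + dir_len, filename, name_len + 1);  /* includes NUL */
    return result;
}
```

For example, `toy_path_combine("/images/overlay.qcow2", "base.qcow2")` yields `/images/base.qcow2`, which the caller then frees.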
Max Reitz	998c201923	block: Add BDS.auto_backing_file If the backing file is overridden, this most probably does change the guest-visible data of a BDS. Therefore, we will need to consider this in bdrv_refresh_filename(). To see whether it has been overridden, we might want to compare bs->backing_file and bs->backing->bs->filename. However, bs->backing_file is changed by bdrv_set_backing_hd() (which is just used to change the backing child at runtime, without modifying the image header), so bs->backing_file most of the time simply contains a copy of bs->backing->bs->filename anyway, so it is useless for such a comparison. This patch adds an auto_backing_file BDS field which contains the backing file path as indicated by the image header, which is not changed by bdrv_set_backing_hd(). Because of bdrv_refresh_filename() magic, however, a BDS's filename may differ from what has been specified during bdrv_open(). Then, the comparison between bs->auto_backing_file and bs->backing->bs->filename may fail even though bs->backing was opened from bs->auto_backing_file. To mitigate this, we can copy the real BDS's filename (after the whole bdrv_open() and bdrv_refresh_filename() process) into bs->auto_backing_file, if we know the former has been opened based on the latter. This is only possible if no options modifying the backing file's behavior have been specified, though. To simplify things, this patch only copies the filename from the backing file if no options have been specified for it at all. Furthermore, there are cases where an overlay is created by qemu which already contains a BDS's filename (e.g. in blockdev-snapshot-sync). We do not need to worry about updating the overlay's bs->auto_backing_file there, because we actually wrote a post-bdrv_refresh_filename() filename into the image header. 
So all in all, there will be false negatives where (as of a future patch) bdrv_refresh_filename() will assume that the backing file differs from what was specified in the image header, even though it really does not. However, these cases should be limited to where (1) the user actually did override something in the backing chain (e.g. by specifying options for the backing file), or (2) the user executed a QMP command to change some node's backing file (e.g. change-backing-file or block-commit with @backing-file given) where the given filename does not happen to coincide with qemu's idea of the backing BDS's filename. Then again, (1) really is limited to -drive. With -blockdev or blockdev-add, you have to adhere to the schema, so a user cannot give partial "unimportant" options (e.g. by just setting backing.node-name and leaving the rest to the image header). Therefore, trying to fix this would mean trying to fix something for -drive only. To improve on (2), we would need a full infrastructure to "canonicalize" an arbitrary filename (+ options), so it can be compared against another. That seems a bit over the top, considering that filenames nowadays are there mostly for the user's entertainment. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-5-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:25 +01:00
Max Reitz	e24518e303	block: Use children list in bdrv_refresh_filename bdrv_refresh_filename() should invoke itself recursively on all children, not just on file. With that change, we can remove the manual invocations in blkverify, quorum, commit, mirror, and blklogwrites. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:25 +01:00
Max Reitz	f30c66ba6e	block: Use bdrv_refresh_filename() to pull Before this patch, bdrv_refresh_filename() is used in a pushing manner: Whenever the BDS graph is modified, the parents of the modified edges are supposed to be updated (recursively upwards). However, that is nonviable, considering that we want child changes not to concern parents. Also, in the long run we want a pull model anyway: Here, we would have a bdrv_filename() function which returns a BDS's filename, freshly constructed. This patch is an intermediate step. It adds bdrv_refresh_filename() calls before every place a BDS.filename value is used. The only exceptions are protocol drivers that use their own filename, which clearly would not profit from refreshing that filename before. Also, bdrv_get_encrypted_filename() is removed along the way (as a user of BDS.filename), since it is completely unused. In turn, all of the calls to bdrv_refresh_filename() before this patch are removed, because we no longer have to call this function on graph changes. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-2-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:25 +01:00
Thomas Huth	83c68e149a	block/nvme: Remove QEMU_PACKED from naturally aligned NVMeRegs struct The QEMU_PACKED is causing a compiler warning/error with GCC 9: `CC block/nvme.o` `block/nvme.c: In function ‘nvme_create_queue_pair’:` `block/nvme.c:209:22: error: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Werror=address-of-packed-member]` `209 | q->sq.doorbell = &s->regs->doorbells[idx * 2 * s->doorbell_scale];` All members of the struct are naturally aligned, so there should not be the need for QEMU_PACKED here, and the following QEMU_BUILD_BUG_ON also ensures that there is no padding. Thus simply remove the QEMU_PACKED here. Buglink: https://bugs.launchpad.net/qemu/+bug/1817525 Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:09:48 +01:00
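The reasoning in this commit runs as follows: natural alignment means no padding, so the packed attribute buys nothing, and a build-time check guards the layout. A C11 sketch makes this concrete. `struct toy_regs` is a hypothetical layout, not the real NVMeRegs, and `_Static_assert` stands in for QEMU_BUILD_BUG_ON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block: every member sits at an offset that is a
 * multiple of its own size, so the compiler inserts no padding and
 * &regs->doorbells[i] is always a properly aligned uint32_t pointer. */
struct toy_regs {
    uint64_t cap;
    uint32_t vs;
    uint32_t intms;
    uint32_t doorbells[4];
};

/* Build-time layout checks: fail compilation if padding ever appears. */
_Static_assert(sizeof(struct toy_regs) == 8 + 4 + 4 + 4 * 4,
               "struct toy_regs must not contain padding");
_Static_assert(offsetof(struct toy_regs, doorbells) == 16,
               "doorbells must follow the fixed registers directly");

static uint32_t *doorbell_ptr(struct toy_regs *regs, size_t idx)
{
    return &regs->doorbells[idx];
}
```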
Alberto Garcia	c1c4399084	qcow2: Assert that L2 table offsets fit in the L1 table L1 table entries have a field to store the offset of an L2 table. The rest of the bits of the entry are currently reserved except for bit 63, which stores the COPIED flag. The offset is always taken from the entry using L1E_OFFSET_MASK to ensure that we only use the bits that belong to that field. While that mask is used every time we read from the L1 table, it is never used when we write to it. Due to the limits set elsewhere in the code, QEMU can never produce L2 table offsets that don't fit in that field, so any such offset when allocating an L2 table would indicate a bug in QEMU. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:05:23 +01:00
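The following sketches the write-side check using the L1 entry layout from the qcow2 specification: bits 9 to 55 hold the L2 table offset, and bit 63 is the COPIED flag. The `TOY_` macro names and helper functions are hypothetical. The mask value is derived from that bit layout rather than copied from QEMU's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 9-55: L2 table offset. Bit 63: COPIED flag. Others reserved. */
#define TOY_L1E_OFFSET_MASK 0x00fffffffffffe00ULL
#define TOY_OFLAG_COPIED    (1ULL << 63)

/* Build an L1 entry. The assertion fires if the offset has bits outside
 * the offset field, i.e. it is not 512-byte aligned or is >= 2^56;
 * the commit treats such an offset as a QEMU bug. */
static uint64_t make_l1_entry(uint64_t l2_offset, bool copied)
{
    assert((l2_offset & ~TOY_L1E_OFFSET_MASK) == 0);
    return l2_offset | (copied ? TOY_OFLAG_COPIED : 0);
}

/* Read side: mask out the flag and reserved bits. */
static uint64_t l1_entry_l2_offset(uint64_t entry)
{
    return entry & TOY_L1E_OFFSET_MASK;
}
```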
Kevin Wolf	28e0b2d2e1	nbd: Increase bs->in_flight during AioContext switch bdrv_drain() must not leave connection_co scheduled, so bs->in_flight needs to be increased while the coroutine is waiting to be scheduled in the new AioContext after nbd_client_attach_aio_context(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:03:19 +01:00
Kevin Wolf	d3bd5b9089	nbd: Use low-level QIOChannel API in nbd_read_eof() Instead of using the convenience wrapper qio_channel_read_all_eof(), use the lower level QIOChannel API. This means duplicating some code, but we'll need this because this coroutine yield is special: We want it to be interruptible so that nbd_client_attach_aio_context() can correctly reenter the coroutine. This moves the bdrv_dec/inc_in_flight() pair into nbd_read_eof(), so that connection_co will always sit in this exact qio_channel_yield() call when bdrv_drain() returns. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-02-25 15:03:19 +01:00
Kevin Wolf	5ad81b4946	nbd: Restrict connection_co reentrance nbd_client_attach_aio_context() schedules connection_co in the new AioContext and this way reenters it in any arbitrary place that has yielded. We can restrict this a bit to the function call where the coroutine actually sits waiting when it's idle. This doesn't solve any bug yet, but it shows where in the code we need to support this random reentrance and where we don't have to care. Add FIXME comments for the existing bugs that the rest of this series will fix. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-02-25 15:03:19 +01:00
Kevin Wolf	c90e2a9cfd	block-backend: Make blk_inc/dec_in_flight public For some users of BlockBackends, just increasing the in_flight counter is easier than implementing separate handlers in BlockDevOps. Make the helper functions for this public. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:03:19 +01:00
Alberto Garcia	2468eed3be	commit: Replace commit_top_bs on failure after deleting the block job If there's an error in commit_start() then the block job must be deleted before replacing commit_top_bs, otherwise it will fail because of lack of permissions. This has been the case since the permission system was introduced in `8dfba27977`. Fortunately, this bug does not seem to be reproducible at the moment without changing the code. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:03:19 +01:00
Daniel Henrique Barboza	161e612d20	qcow2-snapshot: remove redundant find_snapshot_by_id_and_name call In qcow2_snapshot_create there is the following code block: `/* Generate an ID */ find_new_snapshot_id(bs, sn_info->id_str, sizeof(sn_info->id_str)); /* Check that the ID is unique */ if (find_snapshot_by_id_and_name(bs, sn_info->id_str, NULL) >= 0) { return -EEXIST; }` find_new_snapshot_id cycles through all snapshots, parsing each id_str as an unsigned long int, computing the maximum value id_max over all existing id_strs and writing id_max + 1 into the id_str buffer: `for(i = 0; i < s->nb_snapshots; i++) { sn = s->snapshots + i; id = strtoul(sn->id_str, NULL, 10); if (id > id_max) id_max = id; } snprintf(id_str, id_str_size, "%lu", id_max + 1);` Here, sn_info->id_str will have the unique value id_max + 1. Right after that, find_snapshot_by_id_and_name is called with id = sn_info->id_str and name = NULL. This will cause the function to execute the following: `} else if (id) { for (i = 0; i < s->nb_snapshots; i++) { if (!strcmp(s->snapshots[i].id_str, id)) { return i; } } }` In short, we're searching the existing snapshots to see if sn_info->id_str matches any existing id, right after the previous line set sn_info->id_str to a value that is already unique. The first code block goes way back to commit `585f8587ad`, a 2006 commit from Fabrice Bellard that simply says "new qcow2 disk image format". No more info is provided about this logic in any subsequent commits that moved this code block around. I can't speak to the original design, but the current logic is redundant. bdrv_snapshot_create is called with the AioContext lock held, which prevents any concurrent call from accidentally creating a new snapshot between the find_new_snapshot_id and find_snapshot_by_id_and_name calls. As a result, we cycle through the snapshots twice for no good reason. This patch eliminates the redundancy by removing the 'id is unique' check that calls find_snapshot_by_id_and_name. 
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:03:19 +01:00
Daniel Henrique Barboza	8c04093c8c	block/snapshot: remove bdrv_snapshot_delete_by_id_or_name After the previous patch, the only remaining caller of this function is in qemu-img.c. qemu-img uses it inside the 'img_snapshot' function to delete snapshots in the SNAPSHOT_DELETE case, based on a "snapshot_name" string that refers to the tag, not the ID, of the QEMUSnapshotInfo struct. This can be verified by checking the SNAPSHOT_CREATE case that comes shortly before SNAPSHOT_DELETE. In that case, the same "snapshot_name" variable is copied into the 'name' field of the QEMUSnapshotInfo struct sn: `pstrcpy(sn.name, sizeof(sn.name), snapshot_name);` Based on that, it is unlikely that "snapshot_name" contains an "id" in SNAPSHOT_DELETE. This patch changes SNAPSHOT_DELETE to use snapshot_find() and snapshot_delete() instead of bdrv_snapshot_delete_by_id_or_name. After that, there are no instances of bdrv_snapshot_delete_by_id_or_name left in the code, so it is safe to remove it entirely. Suggested-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:03:18 +01:00
Daniel Henrique Barboza	6ca080453e	block/snapshot.c: eliminate use of ID input in snapshot operations At this moment, QEMU attempts to create/load/delete snapshots by using either an ID (id_str) or a name. The problem is that the code isn't consistent about whether the entered argument is an ID or a name, causing unexpected behavior. For example, when creating snapshots via savevm <arg>, "arg" is treated as both name and id_str. In a guest without snapshots, create a single snapshot via savevm: `(qemu) savevm 0` followed by `(qemu) info snapshots` prints "List of snapshots present on all disks:" with a single row: ID `--`, TAG `0`, VM SIZE `741M`, DATE `2018-07-31 13:39:56`, VM CLOCK `00:41:25.313`. A snapshot with name "0" is created. The ID is hidden from the user, but it is a non-zero integer that starts at "1". Thus, this snapshot has id_str=1, TAG="0". Creating a second snapshot with arg = 1 deletes the first one: `(qemu) savevm 1` followed by `(qemu) info snapshots` again lists a single row: ID `--`, TAG `1`, VM SIZE `741M`, DATE `2018-07-31 13:42:14`, VM CLOCK `00:41:55.252`. What happened? - when creating the second snapshot, a check is done inside bdrv_all_delete_snapshot to delete any existing snapshots that match a string argument. Here, the code calls bdrv_all_delete_snapshot("1", ...); - bdrv_all_delete_snapshot calls bdrv_snapshot_find(..., "1") for each BlockDriverState of the guest. And this is where things go wrong: bdrv_snapshot_find searches by both id_str and name. It finds that there is a snapshot with id_str = 1, stores a reference to the snapshot in the sn_info pointer and then reports that a match was found; - since a match was found, bdrv_snapshot_delete_by_id_or_name() is called. This function ignores the pointer written by bdrv_snapshot_find. Instead, it deletes the snapshot by calling bdrv_snapshot_delete() first with id_str = 1. If that fails, it calls it again with name = 1. 
- after all that, QEMU creates the new snapshot, which has id_str = 1 and name = 1. The user is left wondering what happened to the first snapshot created. Similar bugs can be triggered when using loadvm and delvm. Before considering discarding the use of ID input in these operations, I searched the code for the implications. My findings are: - the RBD and Sheepdog drivers don't care. Both use the 'name' field as key in their logic, making id_str = name when appropriate. replay-snapshot.c does not make any special use of id_str; - qcow2 uses id_str as a unique identifier, but it is automatically calculated and not influenced by user input. Other than that, no operations distinguish snapshots by id_str alone; - in blockdev.c, the delete operation uses a match of both id_str AND name. Given that id_str is either a copy of 'name' or auto-generated, we're fine here. This motivates no longer accepting ID as valid user input in HMP commands: sticking with 'name' input only is more consistent. To accomplish that, the following changes were made in this patch: - bdrv_snapshot_find() no longer matches on id_str, only on 'name'. The function is called in save_snapshot(), load_snapshot(), bdrv_all_delete_snapshot() and bdrv_all_find_snapshot(). This change makes the search function more predictable and does not change the behavior of any underlying code that uses these affected functions, which are related to HMP (which is fine) and the main loop inside vl.c (which doesn't care about it anyway); - bdrv_all_delete_snapshot() no longer calls bdrv_snapshot_delete_by_id_or_name. Instead, it uses the pointer returned by bdrv_snapshot_find to erase the snapshot with the exact match of id_str and name. This function is called in save_snapshot and hmp_delvm, so this change produces the intended effect; - documentation changes to reflect the new behavior. 
I consider this to be an API fix instead of an API change - the user was already creating snapshots using 'name', but now he/she will also enjoy a consistent behavior. Ideally we would get rid of the id_str field entirely, but this would have repercussions on existing snapshots. Another day perhaps. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:03:18 +01:00
Vladimir Sementsov-Ogievskiy	199d95b043	block/vmdk: use qemu_iovec_init_buf Use new qemu_iovec_init_buf() instead of qemu_iovec_init_external( ... , 1), which simplifies the code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190218140926.333779-12-vsementsov@virtuozzo.com Message-Id: <20190218140926.333779-12-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy	342544f98b	block/qed: use qemu_iovec_init_buf Use new qemu_iovec_init_buf() instead of qemu_iovec_init_external( ... , 1), which simplifies the code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190218140926.333779-11-vsementsov@virtuozzo.com Message-Id: <20190218140926.333779-11-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy	c793d4ff20	block/qcow2: use qemu_iovec_init_buf Use new qemu_iovec_init_buf() instead of qemu_iovec_init_external( ... , 1), which simplifies the code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190218140926.333779-10-vsementsov@virtuozzo.com Message-Id: <20190218140926.333779-10-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy	30d780f8fe	block/qcow: use qemu_iovec_init_buf Use new qemu_iovec_init_buf() instead of qemu_iovec_init_external( ... , 1), which simplifies the code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190218140926.333779-9-vsementsov@virtuozzo.com Message-Id: <20190218140926.333779-9-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-02-22 09:42:13 +00:00


4184 Commits