qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Philippe Mathieu-Daudé	9406e0d97e	block/nvme: Drop NVMeRegs structure, directly use NvmeBar NVMeRegs only contains NvmeBar. Simplify the code by using NvmeBar directly. This triggers a checkpatch.pl error: ERROR: Use of volatile is usually wrong, please add a comment #30: FILE: block/nvme.c:691: + volatile NvmeBar *regs; This is a false positive as in our case we are using I/O registers, so the 'volatile' use is justified. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-5-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	37d7a45abd	block/nvme: Reduce I/O registers scope We only access the I/O register in nvme_init(). Remove the reference in BDRVNVMeState and reduce its scope. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-4-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	f68453237b	block/nvme: Map doorbells pages write-only Per the datasheet sections 3.1.13/3.1.14: "The host should not read the doorbell registers." As we don't need read access, map the doorbells with write-only permission. We keep a reference to this mapped address in the BDRVNVMeState structure. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-3-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	b02c01a513	util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar() Pages are currently mapped READ/WRITE. To be able to use different protections, add a new argument to qemu_vfio_pci_map_bar(). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-2-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Alberto Garcia	c508c73dca	qcow2: Use L1E_SIZE in qcow2_write_l1_entry() We overlooked these in `02b1ecfa10` Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200928162333.14998-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	30dbc81d31	block/export: Move writable to BlockExportOptions The 'writable' option is a basic option that will probably be applicable to most if not all export types that we will implement. Move it from NBD to the generic BlockExport layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-26-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	8cade320c8	block/export: Add query-block-exports This adds a simple QMP command to query the list of block exports. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-25-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	331170e073	block/export: Create BlockBackend in blk_exp_add() Every export type will need a BlockBackend, so creating it centrally in blk_exp_add() instead of the .create driver callback avoids duplication. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-24-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	37a4f70cea	block/export: Move blk to BlockExport Every block export has a BlockBackend representing the disk that is exported. It should live in BlockExport therefore. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-23-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	1a9f7a804f	block/export: Add BLOCK_EXPORT_DELETED event Clients may want to know when an export has finally disappeard (block-export-del returns earlier than that in the general case), so add a QAPI event for it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200924152717.287415-22-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	3c3bc462ad	block/export: Add block-export-del Implement a new QMP command block-export-del and make nbd-server-remove a wrapper around it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-21-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	3859ad36f0	block/export: Move strong user reference to block_exports The reference owned by the user/monitor that is created when adding the export and dropped when removing it was tied to the 'exports' list in nbd/server.c. Every block export will have a user reference, so move it to the block export level and tie it to the 'block_exports' list in block/export/export.c instead. This is necessary for introducing a QMP command for removing exports. Note that exports are present in block_exports even after the user has requested shutdown. This is different from NBD's exports where exports are immediately removed on a shutdown request, even if they are still in the process of shutting down. In order to avoid that the user still interacts with an export that is shutting down (and possibly removes it a second time), we need to remember if the user actually still owns it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-20-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	d53be9ce55	block/export: Add 'id' option to block-export-add We'll need an id to identify block exports in monitor commands. This adds one. Note that this is different from the 'name' option in the NBD server, which is the externally visible export name. While block export ids need to be unique in the whole process, export names must be unique only for the same server. Different export types or (potentially in the future) multiple NBD servers can have the same export name externally, but still need different block export ids internally. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-19-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	bc4ee65b8c	block/export: Add blk_exp_close_all(_type) This adds a function to shut down all block exports, and another one to shut down the block exports of a single type. The latter is used for now when stopping the NBD server. As soon as we implement support for multiple NBD servers, we'll need a per-server list of exports and it will be replaced by a function using that. As a side effect, the BlockExport layer has a list tracking all existing exports now. closed_exports loses its only user and can go away. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-18-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	a6ff798966	block/export: Allocate BlockExport in blk_exp_add() Instead of letting the driver allocate and return the BlockExport object, allocate it already in blk_exp_add() and pass it. This allows us to initialise the generic part before calling into the driver so that the driver can just use these values instead of having to parse the options a second time. For symmetry, move freeing the BlockExport to blk_exp_unref(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-17-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	b6076afcab	block/export: Add node-name to BlockExportOptions Every block export needs a block node to export, so add a 'node-name' option to BlockExportOptions and remove the replaced option 'device' from BlockExportOptionsNbd. To maintain compatibility in nbd-server-add, BlockExportOptionsNbd needs to be wrapped by a new type NbdServerAddOptions that adds 'device' back because nbd-server-add doesn't use the BlockExportOptions base type at all (so even without changing it to a 'node-name' option in block-export-add, this compatibility code would be necessary). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-16-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	8612c68673	block/export: Move AioContext from NBDExport to BlockExport Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-15-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	c69de1bef5	block/export: Move refcount from NBDExport to BlockExport Having a refcount makes sense for all types of block exports. It is also a prerequisite for keeping a list of all exports at the BlockExport level. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-14-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	1c8222b014	nbd: Add max-connections to nbd-server-start This is a QMP equivalent of qemu-nbd's --shared option, limiting the maximum number of clients that can attach at the same time. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-9-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	9b562c646b	block/export: Remove magic from block-export-add nbd-server-add tries to be convenient and adds two questionable features that we don't want to share in block-export-add, even for NBD exports: 1. When requesting a writable export of a read-only device, the export is silently downgraded to read-only. This should be an error in the context of block-export-add. 2. When using a BlockBackend name, unplugging the device from the guest will automatically stop the NBD server, too. This may sometimes be what you want, but it could also be very surprising. Let's keep things explicit with block-export-add. If the user wants to stop the export, they should tell us so. Move these things into the nbd-server-add QMP command handler so that they apply only there. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-8-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	56ee86261e	block/export: Add BlockExport infrastructure and block-export-add We want to have a common set of commands for all types of block exports. Currently, this is only NBD, but we're going to add more types. This patch adds the basic BlockExport and BlockExportDriver structs and a QMP command block-export-add that creates a new export based on the given BlockExportOptions. qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-5-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	143ea7670c	qapi: Rename BlockExport to BlockExportOptions The name BlockExport will be used for the struct containing the runtime state of block exports, so change the name of export creation options. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-4-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	5daa6bfd8e	qapi: Create block-export module Move all block export related types and commands from block-core to the new QAPI module block-export. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-3-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé	74f2e02766	block/sheepdog: Replace magic val by NANOSECONDS_PER_SECOND definition Use self-explicit NANOSECONDS_PER_SECOND definition instead of magic value. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200921110145.520944-1-philmd@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé	f68c01470b	qapi: Restrict query-uuid command to machine code Only qemu-system-FOO and qemu-storage-daemon provide QMP monitors, therefore such declarations and definitions are irrelevant for user-mode emulation. Restricting the query-uuid command to machine.json pulls less QAPI-generated code into user-mode. Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200913195348.1064154-6-philmd@redhat.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-09-29 15:41:35 +02:00
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int ' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic$64$\?_[a-z0-9_]\+' include/qemu/atomic.h \| \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	2020-09-23 16:07:44 +01:00
Daniel P. Berrangé	b18a24a9f8	block/file: switch to use qemu_open/qemu_create for improved errors Currently at startup if using cache=none on a filesystem lacking O_DIRECT such as tmpfs, at startup QEMU prints qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: file system may not support O_DIRECT qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Could not open '/tmp/foo.img': Invalid argument while at QMP level the hint is missing, so QEMU reports just "error": { "class": "GenericError", "desc": "Could not open '/tmp/foo.img': Invalid argument" } which is close to useless for the end user trying to figure out what they did wrong. With this change at startup QEMU prints qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT while at the QMP level QEMU reports a massively more informative "error": { "class": "GenericError", "desc": "Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT" } Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-16 10:33:48 +01:00
Daniel P. Berrangé	448058aa99	util: rename qemu_open() to qemu_open_old() We want to introduce a new version of qemu_open() that uses an Error object for reporting problems and make this it the preferred interface. Rename the existing method to release the namespace for the new impl. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-16 10:33:48 +01:00
Stefano Garzarella	7bae7c805d	block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[] Commit `19ae9ae014` ("block/rbd: Add support for ceph namespaces") introduced namespace support for RBD, but we forgot to add the new 'namespace' options to qemu_rbd_strong_runtime_opts[]. The 'namespace' is used to identify the image, so it is a strong option since it can changes the data of a BDS. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1821528 Fixes: `19ae9ae014` ("block/rbd: Add support for ceph namespaces") Cc: Florian Florensa <fflorensa@online.net> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20200914190553.74871-1-sgarzare@redhat.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:31:10 +02:00
Alberto Garcia	bfd0989acf	qcow2: Convert qcow2_alloc_cluster_offset() into qcow2_alloc_host_offset() qcow2_alloc_cluster_offset() takes an (unaligned) guest offset and returns the (aligned) offset of the corresponding cluster in the qcow2 image. In practice none of the callers need to know where the cluster starts so this patch makes the function calculate and return the final host offset directly. The function is also renamed accordingly. See `388e581615` for a similar change to qcow2_get_cluster_offset(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <9bfef50ec9200d752413be4fc2aeb22a28378817.1599833007.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:31:10 +02:00
Alberto Garcia	8e958260c5	qcow2: Make preallocate_co() resize the image to the correct size This function preallocates metadata structures and then extends the image to its new size, but that new size calculation is wrong because it doesn't take into account that the host_offset variable is always cluster-aligned. This problem can be reproduced with preallocation=metadata when the original size is not cluster-aligned but the new size is. In this case the final image size will be shorter than expected. qemu-img create -f qcow2 img.qcow2 31k qemu-img resize --preallocation=metadata img.qcow2 128k Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <adeb8b059917b141d5f5b3bd2a016262d3052c79.1599833007.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> [mreitz: Mark compat=0.10 unsupported for iotest 125] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:30:36 +02:00
John Snow	c1dadda02c	block/qcow: remove runtime opts Introduced by `d85f4222b4`, These were seemingly never used at all. Signed-off-by: John Snow <jsnow@redhat.com> Message-Id: <20200806211345.2925343-3-jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
John Snow	30b70f070f	block/rbd: remove runtime_opts This saw its last use in `4bfb274165`. Signed-off-by: John Snow <jsnow@redhat.com> Message-Id: <20200806211345.2925343-2-jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	580384d637	qcow2: Return the original error code in qcow2_co_pwrite_zeroes() This function checks the current status of a (sub)cluster in order to see if an unaligned 'write zeroes' request can be done efficiently by simply updating the L2 metadata and without having to write actual zeroes to disk. If the situation does not allow using the fast path then the function returns -ENOTSUP and the caller falls back to writing zeroes. If can happen however that the aforementioned check returns an actual error code so in this case we should pass it to the caller. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200909123739.719-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	3fec237fca	qcow2: Make qcow2_free_any_clusters() free only one cluster This function takes an L2 entry and a number of clusters to free. Although in principle it can free any type of cluster (using the L2 entry to determine its type) in practice the API is broken because compressed clusters have a variable size and there is no way to free more than one without having the L2 entry of each one of them. The good news all callers are passing nb_clusters=1 so we can simply get rid of that parameter. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <77cea0f4616f921d37e971b3c5b18a2faa24b173.1599573989.git.berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	1a52b73dba	qcow2: Handle QCowL2Meta on error in preallocate_co() If qcow2_alloc_cluster_offset() or qcow2_alloc_cluster_link_l2() fail then this function simply returns the error code, potentially leaking the QCowL2Meta structure and leaving stale items in s->cluster_allocs. A second problem is that this function calls qcow2_free_any_clusters() on failure but passing a host cluster offset instead of an L2 entry. Luckily for normal uncompressed clusters a raw offset also works like a valid L2 entry so it works just the same, but we should be using qcow2_free_clusters() instead. This patch fixes both problems by using qcow2_handle_l2meta(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <cd3a6b9abd43f9c0b60be413d760f0cacc67eb66.1599573989.git.berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Swapnil Ingle	83a6a90009	block/vhdx: Support vhdx image only with 512 bytes logical sector size block/vhdx uses qemu block layer where sector size is always 512 bytes. This may have issues with 4K logical sector sized vhdx image. For e.g qemu-img convert on such images fails with following assert: $qemu-img convert -f vhdx -O raw 4KTest1.vhdx test.raw qemu-img: util/iov.c:388: qiov_slice: Assertion `offset + len <= qiov->size' failed. Aborted This patch adds an check to return ENOTSUP for vhdx images which have logical sector size other than 512 bytes. Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com> Message-Id: <1596794594-44531-1-git-send-email-swapnil.ingle@nutanix.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	2b60c5b996	qcow2: Rewrite the documentation of qcow2_alloc_cluster_offset() The current text corresponds to an earlier, simpler version of this function and it does not explain how it works now. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <bb5bd06f07c5a05b0818611de0d06ec5b66c8df3.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	f7bd5bba1b	qcow2: Don't check nb_clusters when removing l2meta from the list In the past, when a new cluster was allocated the l2meta structure was a variable in the stack so it was necessary to have a way to tell whether it had been initialized and contained valid data or not. The nb_clusters field was used for this purpose. Since commit `f50f88b9fe` this is no longer the case, l2meta (nowadays a pointer to a list) is only allocated when needed and nb_clusters is guaranteed to be > 0 so this check is unnecessary. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <ab0b67c29c7ba26e598db35f12aa5ab5982539c1.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	184581fa4d	qcow2: Fix removal of list members from BDRVQcow2State.cluster_allocs When a write request needs to allocate new clusters (or change the L2 bitmap of existing ones) a QCowL2Meta structure is created so the L2 metadata can be later updated and any copy-on-write can be performed if necessary. A write request can span a region consisting of an arbitrary combination of previously unallocated and allocated clusters, and if the unallocated ones can be put contiguous to the existing ones then QEMU will do so in order to minimize the number of write operations. In practice this means that a write request has not just one but a number of QCowL2Meta structures. All of them are added to the cluster_allocs list that is stored in BDRVQcow2State and is used to detect overlapping requests. After the write request finishes all its associated QCowL2Meta are removed from that list. calculate_l2_meta() takes care of creating and putting those structures in the list, and qcow2_handle_l2meta() takes care of removing them. The problem is that the error path in handle_alloc() also tries to remove an item in that list, a remnant from the time when this was handled there (that code would not even be correct anymore because it only removes one struct and not all the ones from the same write request). This can trigger a double removal of the same item from the list, causing a crash. This is not easy to reproduce in practice because it requires that do_alloc_cluster_offset() fails after a successful previous allocation during the same write request, but it can be reproduced with the included test case. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <3440a1c4d53c4fe48312b478c96accb338cbef7c.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	02b1ecfa10	qcow2: Use macros for the L1, refcount and bitmap table entry sizes This patch replaces instances of sizeof(uint64_t) in the qcow2 driver with macros that indicate what those sizes are actually referring to. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200828110828.13833-1-berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:12 +02:00
Lukas Straub	5eb9a3c7b0	block/quorum.c: stable children names If we remove the child with the highest index from the quorum, decrement s->next_child_index. This way we get stable children names as long as we only remove the last child. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Fixes: https://bugs.launchpad.net/bugs/1881231 Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <5d5f930424c1c770754041aa8ad6421dc4e2b58e.1596536719.git.lukasstraub2@web.de> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:12 +02:00
Peter Maydell	2499453eb1	Block layer patches: - qemu-img create: Fail gracefully when backing file is an empty string - Fixes related to filter block nodes ("Deal with filters" series) - block/nvme: Various cleanups required to use multiple queues - block/nvme: Use NvmeBar structure from "block/nvme.h" - file-win32: Fix "locking" option - iotests: Allow running from different directory -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl9Z7bcRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9ZS6w//bos+A0RfRRF0YFWkIBLQWxqzKcGvMJ8W XWv3mFzd47UaDgRYwVnCC3CR6bLYEINISngZ3geA4jI1+w7AtYKDOO0HN32dUg+D ZrNMn02701CA6qkmpxJ+yjsrl9ltR3jYe0me4Wr39Pvdexa2pl/e+M4Vas6FhkYL ghAwNThypscGCrFjAlz3ru2Sc/K+sPWrGoqkzr+SWvsm9wy4vb8aLxr8Yy50x/zc CqALS9SQ/YA93BCVi9CzPkVyV3ioA0kg/y38WvLtAQ9GZ3m/ekMro3WvdYsRsFCN LGXsuwFig+U7Kd7lJrCS9TLnlTJstNGqPq9jEoV5cThPvGknFfMvVOzRmmP7tzqT YRcPRy39z44OoLKa3kyg3aF38BTxt+9gPqBnivKMr9j9EecMvPsXXHRvF+lP+LsP j753Ih561hX6FurcjX8pc9GOM2cQA0GjlyL77UTTAmLZyFXP/8e55oQbBuYTylc/ Xlvmc/T+yEGiEGTnK+FxgDAiUaxbCCM9cDVStJjTvsIq43dwXb48g1onDsGZ5eDf j9lmAD6TJxHNOB5ErNsDPODf4/D1wJ9t9WVF8UZp9ArfPHRdxMzT7Q4LvetaDmVl +hQC9cgTq8Qd8LwSqbKEYua4L6iGbmLAT7/N6htq5L1eVLg76/tLg/tKSwh/vKAY yzPmyHaVK84= =gaaW -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - qemu-img create: Fail gracefully when backing file is an empty string - Fixes related to filter block nodes ("Deal with filters" series) - block/nvme: Various cleanups required to use multiple queues - block/nvme: Use NvmeBar structure from "block/nvme.h" - file-win32: Fix "locking" option - iotests: Allow running from different directory # gpg: Signature made Thu 10 Sep 2020 10:11:19 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (65 commits) block/qcow2-cluster: Add missing "fallthrough" annotation block/nvme: Pair doorbell registers block/nvme: Use generic NvmeBar structure block/nvme: Group controller registers in NVMeRegs structure file-win32: Fix "locking" option iotests: Allow running from different directory iotests: Test committing to overridden backing iotests: Add test for commit in sub directory iotests: Add filter mirror test cases iotests: Add filter commit test cases iotests: Let complete_and_wait() work with commit iotests: Test that qcow2's data-file is flushed block: Leave BDS.backing_{file,format} constant block: Inline bdrv_co_block_status_from_*() blockdev: Fix active commit choice block: Drop backing_bs() qemu-img: Use child access functions nbd: Use CAF when looking for dirty bitmap commit: Deal with filters backup: Deal with filters ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-11 14:47:49 +01:00
Thomas Huth	b9be6faed1	block/qcow2-cluster: Add missing "fallthrough" annotation When compiling with -Werror=implicit-fallthrough, the compiler currently complains: ../../devel/qemu/block/qcow2-cluster.c: In function ‘cluster_needs_new_alloc’: ../../devel/qemu/block/qcow2-cluster.c:1320:12: error: this statement may fall through [-Werror=implicit-fallthrough=] if (l2_entry & QCOW_OFLAG_COPIED) { ^ ../../devel/qemu/block/qcow2-cluster.c:1323:5: note: here case QCOW2_CLUSTER_UNALLOCATED: ^~~~ It's quite obvious that the fallthrough is intended here, so let's add a comment to silence the compiler warning. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20200908070028.193298-1-thuth@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	e5ff22ba9f	block/nvme: Pair doorbell registers For each queue doorbell registers are paired as: - Submission Queue Tail Doorbell - Completion Queue Head Doorbell Reflect that in the NVMeRegs structure, and adapt nvme_create_queue_pair() accordingly. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-4-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	c7100f0a0b	block/nvme: Use generic NvmeBar structure Commit `f3c507adcd` ("NVMe: Initial commit for new storage interface") introduced the NvmeBar structure. Unfortunately in commit `bdd6a90a9e` ("block: Add VFIO based NVMe driver") we duplicated it. Apparently in commit `a3d9a352d4` ("block: Move NVMe constants to a separate header") we tried to unify headers but forgot to remove the structure declared in the block/nvme.c source file. Do it now, and remove the structure size check which is redundant with the header check added in commit `74e18435c0` ("hw/block/nvme: Align I/O BAR to 4 KiB"). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-3-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	0ea32f34ce	block/nvme: Group controller registers in NVMeRegs structure We want to use the NvmeBar structure from "block/nvme.h" in the next commit. As a preliminary step, group all the NVMe controller registers in the 'ctrl' field, keeping the doorbells registers out of it. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-2-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:12 +02:00
Kevin Wolf	3b079ac0ff	file-win32: Fix "locking" option The intended behaviour was that locking=off/auto work and have no effect (to remain compatible with file-posix), whereas locking=on would return an error. Unfortunately, the code forgot to remove "locking" from the options QDict, so any attempt to use the option would fail. Replace the option parsing code for "locking" with something that is part of the raw_runtime_opts QemuOptsList (so it is properly removed from the QDict) and looks more like file-posix. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200907092739.9988-1-kwolf@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:12 +02:00
Markus Armbruster	b15e402fc8	trace-events: Fix attribution of trace points to source Some trace points are attributed to the wrong source file. Happens when we neglect to update trace-events for code motion, or add events in the wrong place, or misspell the file name. Clean up with help of scripts/cleanup-trace-events.pl. Funnies requiring manual post-processing: * accel/tcg/cputlb.c trace points are in trace-events. * block.c and blockdev.c trace points are in block/trace-events. * hw/block/nvme.c uses the preprocessor to hide its trace point use from cleanup-trace-events.pl. * hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to guard debug code. * include/hw/xen/xen_common.h trace points are in hw/xen/trace-events. * linux-user/trace-events abbreviates a tedious list of filenames to /signal.c. net/colo-compare and net/filter-rewriter.c use pseudo trace points colo_compare_miscompare and colo_filter_rewriter_debug to guard debug code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200806141334.3646302-5-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-09-09 17:17:58 +01:00
Markus Armbruster	6ec9379870	trace-events: Delete unused trace points Tracked down with the help of scripts/cleanup-trace-events.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20200806141334.3646302-4-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-09-09 17:17:02 +01:00
Max Reitz	0b877d09df	block: Leave BDS.backing_{file,format} constant Parts of the block layer treat BDS.backing_file as if it were whatever the image header says (i.e., if it is a relative path, it is relative to the overlay), other parts treat it like a cache for bs->backing->bs->filename (relative paths are relative to the CWD). Considering bs->backing->bs->filename exists, let us make it mean the former. Among other things, this now allows the user to specify a base when using qemu-img to commit an image file in a directory that is not the CWD (assuming, everything uses relative filenames). Before this patch: $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2 $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2 $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' After this patch: $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2 Image committed. $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2 Image committed. With this change, bdrv_find_backing_image() must look at whether the user has overridden a BDS's backing file. If so, it can no longer use bs->backing_file, but must instead compare the given filename against the backing node's filename directly. Note that this changes the QAPI output for a node's backing_file. We had very inconsistent output there (sometimes what the image header said, sometimes the actual filename of the backing image). This inconsistent output was effectively useless, so we have to decide one way or the other. Considering that bs->backing_file usually at runtime contained the path to the image relative to qemu's CWD (or absolute), this patch changes QAPI's backing_file to always report the bs->backing->bs->filename from now on. If you want to receive the image header information, you have to refer to full-backing-filename. This necessitates a change to iotest 228. The interesting information it really wanted is the image header, and it can get that now, but it has to use full-backing-filename instead of backing_file. Because of this patch's changes to bs->backing_file's behavior, we also need some reference output changes. Along with the changes to bs->backing_file, stop updating BDS.backing_format in bdrv_backing_attach() as well. This way, ImageInfo's backing-filename and backing-filename-format fields will represent what the image header says and nothing else. iotest 245 changes in behavior: With the backing node no longer overriding the parent node's backing_file string, you can now omit the @backing option when reopening a node with neither a default nor a current backing file even if it used to have a backing node at some point. 273 also changes: The base image is opened without a format layer, so ImageInfo.backing-filename-format used to report "file" for the base image's overlay after blockdev-snapshot. However, the image header never says "file" anywhere, so it now reports $IMGFMT. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	549ec0d978	block: Inline bdrv_co_block_status_from_*() With bdrv_filter_bs(), we can easily handle this default filter behavior in bdrv_co_block_status(). blkdebug wants to have an additional assertion, so it keeps its own implementation, except bdrv_co_block_status_from_file() needs to be inlined there. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	9a71b9de3f	commit: Deal with filters This includes some permission limiting (for example, we only need to take the RESIZE permission if the base is smaller than the top). Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	2b088c60bb	backup: Deal with filters Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	3f072a7fb7	mirror: Deal with filters This includes some permission limiting (for example, we only need to take the RESIZE permission for active commits where the base is smaller than the top). base_overlay is introduced so we can query bdrv_is_allocated_above() on it - we cannot do that with base itself, because a filter's block_status is the same as its child node, so if there are filters on base, bdrv_is_allocated_above() on base would return information including base. Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to "target_backing_bs", because that is what it really refers to. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c6f6d8462c	block-copy: Use CAF to find sync=top base Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	0a7585dbba	block: Use child access functions for QAPI queries query-block, query-named-block-nodes, and query-blockstats now return any filtered child under "backing", not just bs->backing or COW children. This is so that filters do not interrupt the reported backing chain. This changes the output for iotest 184, as the throttled node now appears as a backing child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	3f26191c73	block: Report data child for query-blockstats It makes no sense to report the block stats of a purely metadata-storing child in query-blockstats. So if the primary child does not have any data, try to find a unique data-storing child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	07cd7b659a	block/null: Implement bdrv_get_allocated_file_size It is trivial, so we might as well do it. Remove _filter_actual_image_size from iotest 184, so we get to see the result in its reference output. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c8af87573f	block/snapshot: Fix fallback If the top node's driver does not provide snapshot functionality and we want to fall back to a node down the chain, we need to snapshot all non-COW children. For simplicity's sake, just do not fall back if there is more than one such child. Furthermore, we really only can fall back to bs->file and bs->backing, because bdrv_snapshot_goto() has to modify the child link (notably, set it to NULL). Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c4db2e25df	block: Use CAF in bdrv_co_rw_vmstate() If a node whose driver does not provide VM state functions has a metadata child, the VM state should probably go there; if it is a filter, the VM state should probably go there. It follows that we should generally go down to the primary child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	66b129ac5e	block: Iterate over children in refresh_limits Instead of looking at just bs->file and bs->backing, we should look at all children that could end up receiving forwarded requests. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	fb787f02a6	vmdk: Drop vmdk_co_flush() Before HEAD^, we needed this because bdrv_co_flush() by itself would only flush bs->file. With HEAD^, bdrv_co_flush() will flush all children on which a WRITE or WRITE_UNCHANGED permission has been taken. Thus, vmdk no longer needs to do it itself. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	883833e29c	block: Flush all children in generic code If the driver does not support .bdrv_co_flush() so bdrv_co_flush() itself has to flush the children of the given node, it should not flush just bs->file->bs, but in fact all children that might have been written to (judging from the permissions taken on them). This is a bug fix for qcow2 images with an external data file, as they so far did not flush that data_file node. In any case, the BLKDBG_EVENT() should be emitted on the primary child, because that is where a blkdebug node would be if there is any. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	23b93525a2	block: Use bdrv_cow_child() in bdrv_co_truncate() The condition modified here is not about potentially filtered children, but only about COW sources (i.e. traditional backing files). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	67acfd2188	stream: Deal with filters Because of the (not so recent anymore) changes that make the stream job independent of the base node and instead track the node above it, we have to split that "bottom" node into two cases: The bottom COW node, and the node directly above the base node (which may be an R/W filter or the bottom COW node). Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	cb8503159a	block: Use CAFs in block status functions Use the child access functions in the block status inquiry functions as appropriate. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	93393e698c	block: Use bdrv_filter_(bs\|child) where obvious Places that use patterns like if (bs->drv->is_filter && bs->file) { ... something about bs->file->bs ... } should be BlockDriverState *filtered = bdrv_filter_bs(bs); if (filtered) { ... something about @filtered ... } instead. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	4935e8be22	copy-on-read: Support compressed writes Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	e7e754aec3	throttle: Support compressed writes Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	8b8277cdb0	block: Drop bdrv_is_encrypted() The original purpose of bdrv_is_encrypted() was to inquire whether a BDS can be used without the user entering a password or not. It has not been used for that purpose for quite some time. Actually, it is not even fit for that purpose, because to answer that question, it would have recursively query all of the given node's children. So now we have to decide in which direction we want to fix bdrv_is_encrypted(): Recursively query all children, or drop it and just use bs->encrypted to get the current node's status? Nowadays, its only purpose is to report through bdrv_query_image_info() whether the given image is encrypted or not. For this purpose, it is probably more interesting to see whether a given node itself is encrypted or not (otherwise, a management application cannot discern for certain which nodes are really encrypted and which just have encrypted children). Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	b111b3fcde	block/nvme: Use an array of EventNotifier In preparation of using multiple IRQ (thus multiple eventfds) make BDRVNVMeState::irq_notifier an array (for now of a single element, the admin queue notifier). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-16-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	7a1fb2ef40	block/nvme: Extract nvme_poll_queue() As we want to do per-queue polling, extract the nvme_poll_queue() method which operates on a single queue. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-15-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	0a28b02ef9	block/nvme: Simplify nvme_create_queue_pair() arguments nvme_create_queue_pair() doesn't require BlockDriverState anymore. Replace it by BDRVNVMeState and AioContext to simplify. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-14-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	073a06978c	block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE BDRV_POLL_WHILE() is defined as: #define BDRV_POLL_WHILE(bs, cond) ({ \ BlockDriverState *bs_ = (bs); \ AIO_WAIT_WHILE(bdrv_get_aio_context(bs_), \ cond); }) As we will remove the BlockDriverState use in the next commit, start by using the exploded version of BDRV_POLL_WHILE(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-13-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	3a6d34d066	block/nvme: Simplify nvme_init_queue() arguments nvme_init_queue() doesn't require BlockDriverState anymore. Replace it by BDRVNVMeState to simplify. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-12-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	38e1f8186f	block/nvme: Replace qemu_try_blockalign(bs) by qemu_try_memalign(pg_sz) qemu_try_blockalign() is a generic API that call back to the block driver to return its page alignment. As we call from within the very same driver, we already know to page alignment stored in our state. Remove indirections and use the value from BDRVNVMeState. This change is required to later remove the BlockDriverState argument, to make nvme_init_queue() per hardware, and not per block driver. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-11-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	2ed846930d	block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset In the next commit we'll get rid of qemu_try_blockalign(). To ease review, first replace qemu_try_blockalign0() by explicit calls to qemu_try_blockalign() and memset(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-10-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	7d3b214ae4	block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures We allocate an unique chunk of memory then use it for two different structures. By using an union, we make it clear the data is overlapping (and we can remove the casts). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-9-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:24:53 +02:00
Philippe Mathieu-Daudé	4d98093937	block/nvme: Rename local variable We are going to modify the code in the next commit. Renaming the 'resp' variable to 'id' first makes the next commit easier to review. No logical changes. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-8-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	c8edbfb2cc	block/nvme: Use common error path in nvme_add_io_queue() Rearrange nvme_add_io_queue() by using a common error path. This will be proven useful in few commits where we add IRQ notification to the IO queues. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-7-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	bf6ce5ec6d	block/nvme: Improve error message when IO queue creation failed Do not use the same error message for different failures. Display a different error whether it is the CQ or the SQ. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-6-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	73159e52e6	block/nvme: Define INDEX macros to ease code review Use definitions instead of '0' or '1' indexes. Also this will be useful when using multi-queues later. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-5-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	0ea45f76eb	block/nvme: Let nvme_create_queue_pair() fail gracefully As nvme_create_queue_pair() is allowed to fail, replace the alloc() calls by try_alloc() to avoid aborting QEMU. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-4-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	e266f52cfb	block/nvme: Avoid further processing if trace event not enabled Avoid further processing if TRACE_NVME_SUBMIT_COMMAND_RAW is not enabled. This is an untested intend of performance optimization. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-3-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	e4f310fe7f	block/nvme: Replace magic value by SCALE_MS definition Use self-explicit SCALE_MS definition instead of magic value. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-2-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Peter Maydell	df8176274a	nbd patches for 2020-09-02 - fix a few iotests affected by earlier nbd changes - avoid blocking qemu by nbd client in connect() - build qemu-nbd for mingw -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl9QFB8ACgkQp6FrSiUn Q2pTwggArPN7dRwm1KD9jL8X6PV01uhLuLRFzrrofFX22Blroj5XPbR24BdKeN8V uDSFGzzoz2Vx3vKPHyZdx7bbeEL0pF9dzjZX6JwzT0McbVuge5aG2zC/ARdfc4PN 1Yf2FB/nY8Xt5G12usu3FIz7JBoQNm4mlPnqVqf7t0LQxgUFvO7F2LernyqEOYKS uSpXHNFqddZcax7etXeldJOlSGJBQTaCmplrJbw2ilVhLZJD+0OglY4SAsrrenN+ gb/KD4REjhtQsmoTaqthGCnGXipEoJYDfEOMhbkl0UK+9Mx0t+3cj3tflG0eGldM ERk1d4d7VSlyqIy7w43v3+IB8M99ag== =7Cn5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-09-02' into staging nbd patches for 2020-09-02 - fix a few iotests affected by earlier nbd changes - avoid blocking qemu by nbd client in connect() - build qemu-nbd for mingw # gpg: Signature made Wed 02 Sep 2020 22:52:31 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2020-09-02: nbd: disable signals and forking on Windows builds nbd: skip SIGTERM handler if NBD device support is not built block: add missing socket_init() calls to tools block/nbd: use non-blocking connect: fix vm hang on connect() iotests/259: Fix reference output iotests/059: Fix reference output Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-03 21:35:01 +01:00
Vladimir Sementsov-Ogievskiy	1dc4718d84	block/nbd: use non-blocking connect: fix vm hang on connect() This makes nbd's connection_co yield during reconnects, so that reconnect doesn't block the main thread. This is very important in case of an unavailable nbd server host: connect() call may take a long time, blocking the main thread (and due to reconnect, it will hang again and again with small gaps of working time during pauses between connection attempts). Realization notes: - We don't want to implement non-blocking connect() over non-blocking socket, because getaddrinfo() doesn't have portable non-blocking realization anyway, so let's just use a thread for both getaddrinfo() and connect(). - We can't use qio_channel_socket_connect_async (which behaves similarly and starts a thread to execute connect() call), as it's relying on someone iterating main loop (g_main_loop_run() or something like this), which is not always the case. - We can't use thread_pool_submit_co API, as thread pool waits for all threads to finish (but we don't want to wait for blocking reconnect attempt on shutdown. So, we just create the thread by hand. Some additional difficulties are: - We want our connect to avoid blocking drained sections and aio context switches. To achieve this, we make it possible to "cancel" synchronous wait for the connect (which is a coroutine yield actually), still, the thread continues in background, and if successful, its result may be reused on next reconnect attempt. - We don't want to wait for reconnect on shutdown, so there is CONNECT_THREAD_RUNNING_DETACHED thread state, which means that the block layer is no longer interested in a result, and thread should close new connected socket on finish and free the state. How to reproduce the bug, fixed with this commit: 1. Create an image on node1: qemu-img create -f qcow2 xx 100M 2. Start NBD server on node1: qemu-nbd xx 3. Start vm with second nbd disk on node2, like this: ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \ file=/work/images/cent7.qcow2 -drive file=nbd+tcp://192.168.100.2 \ -vnc :0 -qmp stdio -m 2G -enable-kvm -vga std 4. Access the vm through vnc (or some other way?), and check that NBD drive works: dd if=/dev/sdb of=/dev/null bs=1M count=10 - the command should succeed. 5. Now, let's trigger nbd-reconnect loop in Qemu process. For this: 5.1 Kill NBD server on node1 5.2 run "dd if=/dev/sdb of=/dev/null bs=1M count=10" in the guest again. The command should fail and a lot of error messages about failing disk may appear as well. Now NBD client driver in Qemu tries to reconnect. Still, VM works well. 6. Make node1 unavailable on NBD port, so connect() from node2 will last for a long time: On node1 (Note, that 10809 is just a default NBD port): sudo iptables -A INPUT -p tcp --dport 10809 -j DROP After some time the guest hangs, and you may check in gdb that Qemu hangs in connect() call, issued from the main thread. This is the BUG. 7. Don't forget to drop iptables rule from your node1: sudo iptables -D INPUT -p tcp --dport 10809 -j DROP Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200812145237.4396-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: minor wording and formatting tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-09-02 16:47:23 -05:00
Peter Maydell	e4d8b7c1a9	qemu-nvme -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE28EdLTc7SjdV9QLsYlFWYQpPbMAFAl9Pro4ACgkQYlFWYQpP bMC1CRAAkawTI4mMcOfI3smFoMeiY8kZJWXJUBXfHbMJ4asaoIjTkH/lXRXBw7KQ sH5tB9CuOums3VjagkZ0Sw6R/kP1LbyJTAwq/pwOXwYRDc/E3zQpMblkIHH1boIM Bxl814hw3hBqV+D0wgeKpl83lbiOpd10Cbpdb/xNKat6qVquLGurSGKgA7jNuF4s oPTPtfZpyH9LUr4DV+sL+fGX6vaCdSFZPZUhJqwFfx79+r3+YiHGLAE6fgsdGDJt 2RSSKMqBe2gg0BY5ToW9L55BsLnwMMrAZnGzEkeZvRKqm0JZBXQsERa61p4VEAJf uYkSEqOwsKjXQNTdDEekyH67AkgXaoqG0hiiOcgoLsla7C0zROtoKcfVM/+WC0LT T0/bfgubmoDV8kLzPuOV8xOGxjfbu4Qlxy1JsIC6BU4zBQvpDwOeTx3MUWaCUfvk YmDMEhZWGcZ3RBLrgQmzm4ZKMtGdYXnGQz5dwVkRRfghQs2fl5ZmUjGR7MqKe18n 4K0nzhPiXbOTlqvLVvzVlrBzdc8ECAs1kVoJF7C3LwRmXbT2N/fUhZP/nYpeM2Hj DQNmA8KpXMKae2+2iDnQNWbvdpz3SiHD6dK7A1bEsdoG0L60xfyeAF+JuPiESUnd OAhf+muxKiInv2k5GNh7mDZPWM6nDepf/PZP6ohc7dKxVam7N2M= =Y23H -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/nvme/tags/pull-nvme-20200902' into staging qemu-nvme # gpg: Signature made Wed 02 Sep 2020 15:39:10 BST # gpg: using RSA key DBC11D2D373B4A3755F502EC625156610A4F6CC0 # gpg: Good signature from "Keith Busch <kbusch@kernel.org>" [unknown] # gpg: aka "Keith Busch <keith.busch@gmail.com>" [unknown] # gpg: aka "Keith Busch <keith.busch@intel.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DBC1 1D2D 373B 4A37 55F5 02EC 6251 5661 0A4F 6CC0 * remotes/nvme/tags/pull-nvme-20200902: (39 commits) hw/block/nvme: remove explicit qsg/iov parameters hw/block/nvme: use preallocated qsg/iov in nvme_dma_prp hw/block/nvme: consolidate qsg/iov clearing hw/block/nvme: add ns/cmd references in NvmeRequest hw/block/nvme: be consistent about zeros vs zeroes hw/block/nvme: add check for mdts hw/block/nvme: refactor request bounds checking hw/block/nvme: verify validity of prp lists in the cmb hw/block/nvme: add request mapping helper hw/block/nvme: add tracing to nvme_map_prp hw/block/nvme: refactor dma read/write hw/block/nvme: destroy request iov before reuse hw/block/nvme: remove redundant has_sg member hw/block/nvme: replace dma_acct with blk_acct equivalent hw/block/nvme: add mapping helpers hw/block/nvme: memset preallocated requests structures hw/block/nvme: bump supported version to v1.3 hw/block/nvme: provide the mandatory subnqn field hw/block/nvme: enforce valid queue creation sequence hw/block/nvme: reject invalid nsid values in active namespace id list ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-02 21:20:20 +01:00
Klaus Jensen	69265150aa	hw/block/nvme: be consistent about zeros vs zeroes The NVM Express specification generally uses 'zeroes' and not 'zeros', so let us align with it. Cc: Fam Zheng <fam@euphon.net> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>	2020-09-02 08:48:50 +02:00
Klaus Jensen	c26f217370	hw/block/nvme: bump spec data structures to v1.3 Add missing fields in the Identify Controller and Identify Namespace data structures to bring them in line with NVMe v1.3. This also adds data structures and defines for SGL support which requires a couple of trivial changes to the nvme block driver as well. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Fam Zheng <fam@euphon.net> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Message-Id: <20200706061303.246057-2-its@irrelevant.dk>	2020-09-02 08:48:50 +02:00
Peter Maydell	887adde81d	meson fixes: * bump submodule to 0.55.1 * SDL, pixman and zlib fixes * firmwarepath fix * fix firmware builds meson related: * move install to Meson * move NSIS to Meson * do not make meson use cmake * add description to options -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl9OcpcUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroM9+Af+InfjEFtsoubgQA2L7B1sHeksINOI nMAw9plmJSX0Qabp0PJDrMcBiLZlbdFGCm/88heTyDGRtbhIXLUrE2J0dyV4R4nR OSyuTgDna75QLxy6k1dIh6qVAtcj2hg+CxaJrUf2Ix6A1d1PAfWoweNXSF1LBmJf pyQ39eZXStI7/bkmwgTY3qK1gwjEaskvf68fTp1hgTN0VHwUb23/nucPaizbVF+a A0nWprPQaTjWhAFQ2jJesBbhN6FtD3EnOZs56JOSQ9J7W6uDnw8a+tOdEDbSYKe2 DdEUEjcz2cqAMCiWrzOMgxB8T885H+8yE6jgEmPHWaUUORQADMcBDMihNA== =NGLI -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging meson fixes: * bump submodule to 0.55.1 * SDL, pixman and zlib fixes * firmwarepath fix * fix firmware builds meson related: * move install to Meson * move NSIS to Meson * do not make meson use cmake * add description to options # gpg: Signature made Tue 01 Sep 2020 17:11:03 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (26 commits) Makefile: Fix in-tree clean/distclean Makefile: Add back TAGS/ctags/cscope rules meson: add description to options build: fix recurse-all target meson: use pkg-config method to find dependencies configure: do not include ${prefix} in firmwarepath meson: add pixman dependency to UI modules meson: add pixman dependency to chardev/baum module meson: add NSIS building meson: use meson mandir instead of qemu_mandir meson: pass docdir option meson: use meson datadir instead of qemu_datadir meson: pass qemu_suffix option configure: build docdir like other suffixed directories configure: always /-seperate directory from qemu_suffix configure: rename confsuffix option meson: move zlib detection to meson build-sys: remove install target from Makefile meson: install $localstatedir/run for qga meson: install desktop file ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-01 22:50:23 +01:00
Liao Pingfang	f181ab4ba5	block/vmdk: Remove superfluous breaks Remove superfluous breaks, as there is a "return" before them. Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn> Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <1594631107-36574-1-git-send-email-wang.yi59@zte.com.cn> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-09-01 08:37:28 +02:00
Paolo Bonzini	a10c8516ed	block: always link with zlib The qcow2 driver needs the zlib dependency. While emulators provided it through the migration code, this is not true of the tools. Move the dependency from the qcow1 rule directly into block_ss so that it is included unconditionally. Fixes build with --disable-qcow1. Reported-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-01 01:51:51 -04:00
Eduardo Habkost	7c9dcd6cab	throttle-groups: Move ThrottleGroup typedef to header Move typedef closer to the type check macros, to make it easier to convert the code to OBJECT_DEFINE_TYPE() in the future. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Tested-By: Roman Bolshakov <r.bolshakov@yadro.com> Message-Id: <20200825192110.3528606-17-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-08-27 14:04:54 -04:00
Alberto Garcia	7bbb59202a	qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters This function is only used by qcow2_expand_zero_clusters() to downgrade a qcow2 image to a previous version. This would require transforming all extended L2 entries into normal L2 entries but this is not a simple task and there are no plans to implement this at the moment. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <15e65112b4144381b4d8c0bdf8fb76b0d813e3d1.1594396418.git.berto@igalia.com> [mreitz: Fixed comment style] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 10:20:15 +02:00
Alberto Garcia	2118771ddf	qcow2: Allow preallocation and backing files if extended_l2 is set Traditional qcow2 images don't allow preallocation if a backing file is set. This is because once a cluster is allocated there is no way to tell that its data should be read from the backing file. Extended L2 entries have individual allocation bits for each subcluster, and therefore it is perfectly possible to have an allocated cluster with all its subclusters unallocated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6d5b0f38e7dc5f2f31d8cab1cb92044e9909aece.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 09:20:04 +02:00
Alberto Garcia	7be2025258	qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Now that the implementation of subclusters is complete we can finally add the necessary options to create and read images with this feature, which we call "extended L2 entries". Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6476caaa73216bd05b7bb2d504a20415e1665176.1594396418.git.berto@igalia.com> [mreitz: %s/5\.1/5.2/; fixed 302's and 303's reference output] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 09:19:55 +02:00
Alberto Garcia	40dee94320	qcow2: Add prealloc field to QCowL2Meta This field allows us to indicate that the L2 metadata update does not come from a write request with actual data but from a preallocation request. For traditional images this does not make any difference, but for images with extended L2 entries this means that the clusters are allocated normally in the L2 table but individual subclusters are marked as unallocated. This will allow preallocating images that have a backing file. There is one special case: when we resize an existing image we can also request that the new clusters are preallocated. If the image already had a backing file then we have to hide any possible stale data and zero out the new clusters (see commit `955c7d6687` for more details). In this case the subclusters cannot be left as unallocated so the L2 bitmap must be updated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <960d4c444a4f5a870e2b47e5da322a73cd9a2f5a.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	0dd07b298f	qcow2: Add subcluster support to qcow2_measure() Extended L2 entries are bigger than normal L2 entries so this has an impact on the amount of metadata needed for a qcow2 file. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <7efae2efd5e36b42d2570743a12576d68ce53685.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a6841a2de6	qcow2: Add subcluster support to qcow2_co_pwrite_zeroes() This works now at the subcluster level and pwrite_zeroes_alignment is updated accordingly. qcow2_cluster_zeroize() is turned into qcow2_subcluster_zeroize() with the following changes: - The request can now be subcluster-aligned. - The cluster-aligned body of the request is still zeroized using zero_in_l2_slice() as before. - The subcluster-aligned head and tail of the request are zeroized with the new zero_l2_subclusters() function. There is just one thing to take into account for a possible future improvement: compressed clusters cannot be partially zeroized so zero_l2_subclusters() on the head or the tail can return -ENOTSUP. This makes the caller repeat the complete request and write actual zeroes to disk. This is sub-optimal because 1) if the head area was compressed we would still be able to use the fast path for the body and possibly the tail. 2) if the tail area was compressed we are writing zeroes to the head and the body areas, which are already zeroized. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <17e05e2ee7e12f10dcf012da81e83ebe27eb3bef.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	bf4a66eed4	qcow2: Add subcluster support to handle_alloc_space() The bdrv_co_pwrite_zeroes() call here fills complete clusters with zeroes, but it can happen that some subclusters are not part of the write request or the copy-on-write. This patch makes sure that only the affected subclusters are overwritten. A potential improvement would be to also fill with zeroes the other subclusters if we can guarantee that we are not overwriting existing data. However this would waste more disk space, so we should first evaluate if it's really worth doing. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <b3dc97e8e2240ddb5191a4f930e8fc9653f94621.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	ff4cdec7f6	qcow2: Clear the L2 bitmap when allocating a compressed cluster Compressed clusters always have the bitmap part of the extended L2 entry set to 0. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <04455b3de5dfeb9d1cfe1fc7b02d7060a6e09710.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	aca00cd971	qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() The L2 bitmap needs to be updated after each write to indicate what new subclusters are now allocated. This needs to happen even if the cluster was already allocated and the L2 entry was otherwise valid. In some cases however a write operation doesn't need change the L2 bitmap (because all affected subclusters were already allocated). This is detected in calculate_l2_meta(), and qcow2_alloc_cluster_link_l2() is never called in those cases. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <0875620d49f44320334b6a91c73b3f301f975f38.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	fc2e6528d5	qcow2: Add subcluster support to check_refcounts_l2() The offset field of an uncompressed cluster's L2 entry must be aligned to the cluster size, otherwise it is invalid. If the cluster has no data then it means that the offset points to a preallocation, so we can clear the offset field without affecting the guest-visible data. This is what 'qemu-img check' does when run in repair mode. On traditional qcow2 images this can only happen when QCOW_OFLAG_ZERO is set, and repairing such entries turns the clusters from ZERO_ALLOC into ZERO_PLAIN. Extended L2 entries have no ZERO_ALLOC clusters and no QCOW_OFLAG_ZERO but the idea is the same: if none of the subclusters are allocated then we can clear the offset field and leave the bitmap untouched. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <9f4ed1d0a34b0a545b032c31ecd8c14734065342.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a68cd70326	qcow2: Add subcluster support to discard_in_l2_slice() Two things need to be taken into account here: 1) With full_discard == true the L2 entry must be cleared completely. This also includes the L2 bitmap if the image has extended L2 entries. 2) With full_discard == false we have to make the discarded cluster read back as zeroes. With normal L2 entries this is done with the QCOW_OFLAG_ZERO bit, whereas with extended L2 entries this is done with the individual 'all zeroes' bits for each subcluster. Note however that QCOW_OFLAG_ZERO is not supported in v2 qcow2 images so, if there is a backing file, discard cannot guarantee that the image will read back as zeroes. If this is important for the caller it should forbid it as qcow2_co_pdiscard() does (see `80f5c01183` for more details). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <5ef8274e628aa3ab559bfac467abf488534f2b76.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	205fa50750	qcow2: Add subcluster support to zero_in_l2_slice() The QCOW_OFLAG_ZERO bit that indicates that a cluster reads as zeroes is only used in standard L2 entries. Extended L2 entries use individual 'all zeroes' bits for each subcluster. This must be taken into account when updating the L2 entry and also when deciding that an existing entry does not need to be updated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <b61d61606d8c9b367bd641ab37351ddb9172799a.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	3f9c6b3b1f	qcow2: Add subcluster support to qcow2_get_host_offset() The logic of this function remains pretty much the same, except that it uses count_contiguous_subclusters(), which combines the logic of count_contiguous_clusters() / count_contiguous_clusters_unallocated() and checks individual subclusters. qcow2_cluster_to_subcluster_type() is not necessary as a separate function anymore so it's inlined into its caller. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <d2193fd48653a350d80f0eca1c67b1d9053fb2f3.1594396418.git.berto@igalia.com> [mreitz: Initialize expected_type to anything] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	d53ec3d8d8	qcow2: Add subcluster support to calculate_l2_meta() If an image has subclusters then there are more copy-on-write scenarios that we need to consider. Let's say we have a write request from the middle of subcluster #3 until the end of the cluster: 1) If we are writing to a newly allocated cluster then we need copy-on-write. The previous contents of subclusters #0 to #3 must be copied to the new cluster. We can optimize this process by skipping all leading unallocated or zero subclusters (the status of those skipped subclusters will be reflected in the new L2 bitmap). 2) If we are overwriting an existing cluster: 2.1) If subcluster #3 is unallocated or has the all-zeroes bit set then we need copy-on-write (on subcluster #3 only). 2.2) If subcluster #3 was already allocated then there is no need for any copy-on-write. However we still need to update the L2 bitmap to reflect possible changes in the allocation status of subclusters #4 to #31. Because of this, this function checks if all the overwritten subclusters are already allocated and in this case it returns without creating a new QCowL2Meta structure. After all these changes l2meta_cow_start() and l2meta_cow_end() are not necessarily cluster-aligned anymore. We need to update the calculation of old_start and old_end in handle_dependencies() to guarantee that no two requests try to write on the same cluster. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <4292dd56e4446d386a2fe307311737a711c00708.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	97490a143e	qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC When dealing with subcluster types there is a new value called QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in QCow2ClusterType. This patch handles that value in all places where subcluster types are processed. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <bf09e2e2439a468a901bb96ace411eed9ee50295.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	10dabdc596	qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* In order to support extended L2 entries some functions of the qcow2 driver need to start dealing with subclusters instead of clusters. qcow2_get_host_offset() is modified to return the subcluster type instead of the cluster type, and all callers are updated to replace all values of QCow2ClusterType with their QCow2SubclusterType equivalents. This patch only changes the data types, there are no semantic changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <f6c29737c295f32cbee74c903c30b01820363b34.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	ca4a0bb81b	qcow2: Add cluster type parameter to qcow2_get_host_offset() This function returns an integer that can be either an error code or a cluster type (a value from the QCow2ClusterType enum). We are going to start using subcluster types instead of cluster types in some functions so it's better to use the exact data types instead of integers for clarity and in order to detect errors more easily. This patch makes qcow2_get_host_offset() return 0 on success and puts the returned cluster type in a separate parameter. There are no semantic changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <396b6eab1859a271551dcd7dcba77f8934aa3c3f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c94d037825	qcow2: Add qcow2_cluster_is_allocated() This helper function tells us if a cluster is allocated (that is, there is an associated host offset for it). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6d8771c5c79cbdc6c519875a5078e1cc85856d63.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	70d1cbae03	qcow2: Add qcow2_get_subcluster_range_type() There are situations in which we want to know how many contiguous subclusters of the same type there are in a given cluster. This can be done by simply iterating over the subclusters and repeatedly calling qcow2_get_subcluster_type() for each one of them. However once we determined the type of a subcluster we can check the rest efficiently by counting the number of adjacent ones (or zeroes) in the bitmap. This is what this function does. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <db917263d568ec6ffb4a41cac3c9100f96bf6c18.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	34905d8eb1	qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() This patch adds QCow2SubclusterType, which is the subcluster-level version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the the same meaning as their QCOW2_CLUSTER_* equivalents (when they exist). See below for details and caveats. In images without extended L2 entries clusters are treated as having exactly one subcluster so it is possible to replace one data type with the other while keeping the exact same semantics. With extended L2 entries there are new possible values, and every subcluster in the same cluster can obviously have a different QCow2SubclusterType so functions need to be adapted to work on the subcluster level. There are several things that have to be taken into account: a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is compressed. We do not support compression at the subcluster level. b) There are two different values for unallocated subclusters: QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC which means that the cluster is allocated but the subcluster is not. The latter can only happen in images with extended L2 entries. c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2 entry has a value that violates the specification. The caller is responsible for handling these situations. To prevent compatibility problems with images that have invalid values but are currently being read by QEMU without causing side effects, QCOW2_SUBCLUSTER_INVALID is only returned for images with extended L2 entries. qcow2_cluster_to_subcluster_type() is added as a separate function from qcow2_get_subcluster_type(), but this is only temporary and both will be merged in a subsequent patch. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <26ef38e270f25851c98b51278852b4c4a7f97e69.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	39a9f0a50e	qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Extended L2 entries are 128-bit wide: 64 bits for the entry itself and 64 bits for the subcluster allocation bitmap. In order to support them correctly get/set_l2_entry() need to be updated so they take the entry width into account in order to calculate the correct offset. This patch also adds the get/set_l2_bitmap() functions that are used to access the bitmaps. For convenience we allow calling get_l2_bitmap() on images without subclusters. In this case the returned value is always 0 and has no meaning. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6ee0f81ae3329c991de125618b3675e1e46acdbb.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c8fd8554d9	qcow2: Add l2_entry_size() qcow2 images with subclusters have 128-bit L2 entries. The first 64 bits contain the same information as traditional images and the last 64 bits form a bitmap with the status of each individual subcluster. Because of that we cannot assume that L2 entries are sizeof(uint64_t) anymore. This function returns the proper value for the image. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <d34d578bd0380e739e2dde3e8dd6187d3d249fa9.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	3e71981592	qcow2: Add offset_into_subcluster() and size_to_subclusters() Like offset_into_cluster() and size_to_clusters(), but for subclusters. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <3cc2390dcdef3d234d47c741b708bd8734490862.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a53e8b7202	qcow2: Add offset_to_sc_index() For a given offset, return the subcluster number within its cluster (i.e. with 32 subclusters per cluster it returns a number between 0 and 31). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <56e3e4ac0d827c6a2f5f259106c5ddb7c4ca2653.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	d0346b5591	qcow2: Add subcluster-related fields to BDRVQcow2State This patch adds the following new fields to BDRVQcow2State: - subclusters_per_cluster: Number of subclusters in a cluster - subcluster_size: The size of each subcluster, in bytes - subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size Images without subclusters are treated as if they had exactly one subcluster per cluster (i.e. subcluster_size = cluster_size). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <55bfeac86b092fa2c9d182a95cbeb479ff7eca4f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a3c7d91625	qcow2: Add dummy has_subclusters() function This function will be used by the qcow2 code to check if an image has subclusters or not. At the moment this simply returns false. Once all patches needed for subcluster support are ready then QEMU will be able to create and read images with subclusters and this function will return the actual value. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <905526221083581a1b7057bca1585487661c5c13.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	12c6aebedf	qcow2: Add get_l2_entry() and set_l2_entry() The size of an L2 entry is 64 bits, but if we want to have subclusters we need extended L2 entries. This means that we have to access L2 tables and slices differently depending on whether an image has extended L2 entries or not. This patch replaces all l2_slice[] accesses with calls to get_l2_entry() and set_l2_entry(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <9586363531fec125ba1386e561762d3e4224e9fc.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	57538c864f	qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() When writing to a qcow2 file there are two functions that take a virtual offset and return a host offset, possibly allocating new clusters if necessary: - handle_copied() looks for normal data clusters that are already allocated and have a reference count of 1. In those clusters we can simply write the data and there is no need to perform any copy-on-write. - handle_alloc() looks for clusters that do need copy-on-write, either because they haven't been allocated yet, because their reference count is != 1 or because they are ZERO_ALLOC clusters. The ZERO_ALLOC case is a bit special because those are clusters that are already allocated and they could perfectly be dealt with in handle_copied() (as long as copy-on-write is performed when required). In fact, there is extra code specifically for them in handle_alloc() that tries to reuse the existing allocation if possible and frees them otherwise. This patch changes the handling of ZERO_ALLOC clusters so the semantics of these two functions are now like this: - handle_copied() looks for clusters that are already allocated and which we can overwrite (NORMAL and ZERO_ALLOC clusters with a reference count of 1). - handle_alloc() looks for clusters for which we need a new allocation (all other cases). One important difference after this change is that clusters found in handle_copied() may now require copy-on-write, but this will be necessary anyway once we add support for subclusters. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <eb17fc938f6be7be2e8d8ff42763d2c19241f866.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c1587d877e	qcow2: Split cluster_needs_cow() out of count_cow_clusters() We are going to need it in other places. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <65e5d9627ca2ebe7e62deaeddf60949c33067d9d.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	8f91d6906c	qcow2: Add calculate_l2_meta() handle_alloc() creates a QCowL2Meta structure in order to update the image metadata and perform the necessary copy-on-write operations. This patch moves that code to a separate function so it can be used from other places. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <e5bc4a648dac31972bfa7a0e554be8064be78799.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	388e581615	qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset() qcow2_get_cluster_offset() takes an (unaligned) guest offset and returns the (aligned) offset of the corresponding cluster in the qcow2 image. In practice none of the callers need to know where the cluster starts so this patch makes the function calculate and return the final host offset directly. The function is also renamed accordingly. There is a pre-existing exception with compressed clusters: in this case the function returns the complete cluster descriptor (containing the offset and size of the compressed data). This does not change with this patch but it is now documented. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <ffae6cdc5ca8950e8280ac0f696dcc376cb07095.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	9c4269d54b	qcow2: Make Qcow2AioTask store the full host offset The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned host offset. In practice this is not very useful because all users() of this structure need the final host offset into the cluster, which they calculate using host_offset = file_cluster_offset + offset_into_cluster(s, offset) There is no reason why Qcow2AioTask cannot store host_offset directly and that is what this patch does. () compressed clusters are the exception: in this case what file_cluster_offset was storing was the full compressed cluster descriptor (offset + size). This does not change with this patch but it is documented now. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <07c4b15c644dcf06c9459f98846ac1c4ea96e26f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Marc-André Lureau	5e5733e599	meson: convert block Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:30:18 -04:00
Paolo Bonzini	243af0225a	trace: switch position of headers to what Meson requires Meson doesn't enjoy the same flexibility we have with Make in choosing the include path. In particular the tracing headers are using $(build_root)/$(<D). In order to keep the include directives unchanged, the simplest solution is to generate headers with patterns like "trace/trace-audio.h" and place forwarding headers in the source tree such that for example "audio/trace.h" includes "trace/trace-audio.h". This patch is too ugly to be applied to the Makefiles now. It's only a way to separate the changes to the tracing header files from the Meson rewrite of the tracing logic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:18:24 -04:00
Stefan Reiter	7661a886a1	block/block-copy: always align copied region to cluster size Since commit `42ac214406` (block/block-copy: refactor task creation) block_copy_task_create calculates the area to be copied via bdrv_dirty_bitmap_next_dirty_area, but that can return an unaligned byte count if the image's last cluster end is not aligned to the bitmap's granularity. Always ALIGN_UP the resulting bytes value to satisfy block_copy_do_copy, which requires the 'bytes' parameter to be aligned to cluster size. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Message-Id: <20200810095523.15071-1-s.reiter@proxmox.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-10 17:12:46 +02:00
Tuguoyi	348fcc4f7a	qcow2-cluster: Fix integer left shift error in qcow2_alloc_cluster_link_l2() When calculating the offset, the result of left shift operation will be promoted to type int64 automatically because the left operand of + operator is uint64_t. but the result after integer promotion may be produce an error value for us and trigger the following asserting error. For example, consider i=0x2000, cluster_bits=18, the result of left shift operation will be 0x80000000. Cause argument i is of signed integer type, the result is automatically promoted to 0xffffffff80000000 which is not we expected The way to trigger the assertion error: qemu-img create -f qcow2 -o preallocation=full,cluster_size=256k tmpdisk 10G This patch fix it by casting @i to uint64_t before doing left shift operation Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 81ba90fe0c014f269621c283269b42ad@h3c.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-08-05 14:56:11 +01:00
Max Reitz	fe16c7ddf8	qcow2: Release read-only bitmaps when inactivated During migration, we release all bitmaps after storing them on disk, as long as they are (1) stored on disk, (2) not read-only, and (3) consistent. (2) seems arbitrary, though. The reason we do not release them is because we do not write them, as there is no need to; and then we just forget about all bitmaps that we have not written to the file. However, read-only persistent bitmaps are still in the file and in sync with their in-memory representation, so we may as well release them just like any R/W bitmap that we have updated. It leads to actual problems, too: After migration, letting the source continue may result in an error if there were any bitmaps on read-only nodes (such as backing images), because those have not been released by bdrv_inactive_all(), but bdrv_invalidate_cache_all() attempts to reload them (which fails, because they are still present in memory). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200730120234.49288-2-mreitz@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-08-03 08:59:37 -05:00
Peter Maydell	5045be872d	nbd patches for 2020-07-28 - fix NBD handling of trim/zero requests larger than 2G - allow no-op resizes on NBD (in turn fixing qemu-img convert -c into NBD) - several deadlock fixes when using NBD reconnect -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl8gPV4ACgkQp6FrSiUn Q2ozdQgAiDHaHG2NX4jmduID7677/XhsLoVl1MV7UZnU+y9qQ2p+Mbsw1oMneu8P Dtfgx/mlWVGu68gn31f4xVq74VTZH6p3IGV7PMcYZ50xbESoFs6CYUwUWUp1GeC3 +kPOl0EpLvm1W/V93sKmg8FflGmNiJHNkfl/ddfk0gs6Z3EfjkmGJt7IP/pv1UCs 4icWvCJsqw2z8TnEwtTpMX5HZlWth1x37lUOShlPL5kA5hZqU+zYU/bYB5iKx+16 MebYg7C7CXYCCtH9cDH/swUWhOdQLkywA6yBAwc1zENsKy84aIAJIUls/Ji0q6CY A4s5c0FovLBuMDd9oLr0kJbkJQeVZA== =DD6l -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-07-28' into staging nbd patches for 2020-07-28 - fix NBD handling of trim/zero requests larger than 2G - allow no-op resizes on NBD (in turn fixing qemu-img convert -c into NBD) - several deadlock fixes when using NBD reconnect # gpg: Signature made Tue 28 Jul 2020 15:59:42 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2020-07-28: block/nbd: nbd_co_reconnect_loop(): don't sleep if drained block/nbd: on shutdown terminate connection attempt block/nbd: allow drain during reconnect attempt block/nbd: split nbd_establish_connection out of nbd_client_connect iotests: Test convert to qcow2 compressed to NBD iotests: Add more qemu_img helpers iotests: Make qemu_nbd_popen() a contextmanager block: nbd: Fix convert qcow2 compressed to nbd nbd: Fix large trim/zero requests Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-28 20:43:03 +01:00
Peter Maydell	0c4fa5bc1a	Block patches for 5.1.0: - Fix block I/O for split transfers - Fix iotest 197 for non-qcow2 formats -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl8gK/gSHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AR+kIALv+Z/A6SPpsAHjpyuRbluuhznfqPuiX mIVX0qNhsFBDAUVw1tOkMtfxOIvuaQW/QWzM0UPaHqB/I4ckzE6Dp98ys9uwHPdq ez23blWvBuB3P3y2ZBAYhhRlCqt3w4uI/lIJMu7VZBghXxj3fGcuTnLlWx8gb1IH 74MiBX8XPt532FiFTnpzxgns8NYkZY8mF6zduGqBPx6bPmdNdDfqAhL68Fv8uKJA k4dVH6ffPLZD+RrCz9GL5rsYQ6NR6tfyEoRMPqtJznhtzWwu5h5EF3p46VkcKheI k0axygEBAr9JbeCwbIK3a4hjQ7eaFQ6j9JR+lPZBRaDbLHv/xGNNuvw= =C4Lq -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-07-28' into staging Block patches for 5.1.0: - Fix block I/O for split transfers - Fix iotest 197 for non-qcow2 formats # gpg: Signature made Tue 28 Jul 2020 14:45:28 BST # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2020-07-28: iotests/197: Fix for non-qcow2 formats iotests/028: Add test for cross-base-EOF reads block: Fix bdrv_aligned_p*v() for qiov_offset != 0 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-28 18:00:21 +01:00
Vladimir Sementsov-Ogievskiy	12c75e20a2	block/nbd: nbd_co_reconnect_loop(): don't sleep if drained We try to go to wakeable sleep, so that, if drain begins it will break the sleep. But what if nbd_client_co_drain_begin() already called and s->drained is already true? We'll go to sleep, and drain will have to wait for the whole timeout. Let's improve it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200727184751.15704-5-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy	fbeb3e63b3	block/nbd: on shutdown terminate connection attempt On shutdown nbd driver may be in a connecting state. We should shutdown it as well, otherwise we may hang in nbd_teardown_connection, waiting for conneciton_co to finish in BDRV_POLL_WHILE(bs, s->connection_co) loop if remote server is down. How to reproduce the dead lock: 1. Create nbd-fault-injector.conf with the following contents: [inject-error "mega1"] event=data io=readwrite when=before 2. In one terminal run nbd-fault-injector in a loop, like this: n=1; while true; do echo $n; ((n++)); ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf; done 3. In another terminal run qemu-io in a loop, like this: n=1; while true; do echo $n; ((n++)); ./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000; done After some time, qemu-io will hang. Note, that this hang may be triggered by another bug, so the whole case is fixed only together with commit "block/nbd: allow drain during reconnect attempt". Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200727184751.15704-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy	dd1ec1a4af	block/nbd: allow drain during reconnect attempt It should be safe to reenter qio_channel_yield() on io/channel read/write path, so it's safe to reduce in_flight and allow attaching new aio context. And no problem to allow drain itself: connection attempt is not a guest request. Moreover, if remote server is down, we can hang in negotiation, blocking drain section and provoking a dead lock. How to reproduce the dead lock: 1. Create nbd-fault-injector.conf with the following contents: [inject-error "mega1"] event=data io=readwrite when=before 2. In one terminal run nbd-fault-injector in a loop, like this: n=1; while true; do echo $n; ((n++)); ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf; done 3. In another terminal run qemu-io in a loop, like this: n=1; while true; do echo $n; ((n++)); ./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000; done After some time, qemu-io will hang trying to drain, for example, like this: #3 aio_poll (ctx=0x55f006bdd890, blocking=true) at util/aio-posix.c:600 #4 bdrv_do_drained_begin (bs=0x55f006bea710, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:427 #5 bdrv_drained_begin (bs=0x55f006bea710) at block/io.c:433 #6 blk_drain (blk=0x55f006befc80) at block/block-backend.c:1710 #7 blk_unref (blk=0x55f006befc80) at block/block-backend.c:498 #8 bdrv_open_inherit (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x55f006be86d0, flags=24578, parent=0x0, child_class=0x0, child_role=0, errp=0x7fffba154620) at block.c:3491 #9 bdrv_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at block.c:3513 #10 blk_new_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at block/block-backend.c:421 And connection_co stack like this: #0 qemu_coroutine_switch (from_=0x55f006bf2650, to_=0x7fe96e07d918, action=COROUTINE_YIELD) at util/coroutine-ucontext.c:302 #1 qemu_coroutine_yield () at util/qemu-coroutine.c:193 #2 qio_channel_yield (ioc=0x55f006bb3c20, condition=G_IO_IN) at io/channel.c:472 #3 qio_channel_readv_all_eof (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0, niov=1, errp=0x7fe96d729eb0) at io/channel.c:110 #4 qio_channel_readv_all (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0, niov=1, errp=0x7fe96d729eb0) at io/channel.c:143 #5 qio_channel_read_all (ioc=0x55f006bb3c20, buf=0x7fe96d729d28 "\300.\366\004\360U", buflen=8, errp=0x7fe96d729eb0) at io/channel.c:247 #6 nbd_read (ioc=0x55f006bb3c20, buffer=0x7fe96d729d28, size=8, desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at /work/src/qemu/master/include/block/nbd.h:365 #7 nbd_read64 (ioc=0x55f006bb3c20, val=0x7fe96d729d28, desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at /work/src/qemu/master/include/block/nbd.h:391 #8 nbd_start_negotiate (aio_context=0x55f006bdd890, ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0, outioc=0x55f006bf19f8, structured_reply=true, zeroes=0x7fe96d729dca, errp=0x7fe96d729eb0) at nbd/client.c:904 #9 nbd_receive_negotiate (aio_context=0x55f006bdd890, ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0, outioc=0x55f006bf19f8, info=0x55f006bf1a00, errp=0x7fe96d729eb0) at nbd/client.c:1032 #10 nbd_client_connect (bs=0x55f006bea710, errp=0x7fe96d729eb0) at block/nbd.c:1460 #11 nbd_reconnect_attempt (s=0x55f006bf19f0) at block/nbd.c:287 #12 nbd_co_reconnect_loop (s=0x55f006bf19f0) at block/nbd.c:309 #13 nbd_connection_entry (opaque=0x55f006bf19f0) at block/nbd.c:360 #14 coroutine_trampoline (i0=113190480, i1=22000) at util/coroutine-ucontext.c:173 Note, that the hang may be triggered by another bug, so the whole case is fixed only together with commit "block/nbd: on shutdown terminate connection attempt". Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200727184751.15704-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy	fa35591b9c	block/nbd: split nbd_establish_connection out of nbd_client_connect We are going to implement non-blocking version of nbd_establish_connection, which for a while will be used only for nbd_reconnect_attempt, not for nbd_open, so we need to call it separately. Refactor nbd_reconnect_attempt in a way which makes next commit simpler. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200727184751.15704-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Nir Soffer	a2b333c018	block: nbd: Fix convert qcow2 compressed to nbd When converting to qcow2 compressed format, the last step is a special zero length compressed write, ending in a call to bdrv_co_truncate(). This call always fails for the nbd driver since it does not implement bdrv_co_truncate(). For block devices, which have the same limits, the call succeeds since the file driver implements bdrv_co_truncate(). If the caller asked to truncate to the same or smaller size with exact=false, the truncate succeeds. Implement the same logic for nbd. Example failing without this change: In one shell start qemu-nbd: $ truncate -s 1g test.tar $ qemu-nbd --socket=/tmp/nbd.sock --persistent --format=raw --offset 1536 test.tar In another shell convert an image to qcow2 compressed via NBD: $ echo "disk data" > disk.raw $ truncate -s 1g disk.raw $ qemu-img convert -f raw -O qcow2 -c disk1.raw nbd+unix:///?socket=/tmp/nbd.sock; echo $? 1 qemu-img failed, but the conversion was successful: $ qemu-img info nbd+unix:///?socket=/tmp/nbd.sock image: nbd+unix://?socket=/tmp/nbd.sock file format: qcow2 virtual size: 1 GiB (1073741824 bytes) ... $ qemu-img check nbd+unix:///?socket=/tmp/nbd.sock No errors were found on the image. 1/16384 = 0.01% allocated, 100.00% fragmented, 100.00% compressed clusters Image end offset: 393216 $ qemu-img compare disk.raw nbd+unix:///?socket=/tmp/nbd.sock Images are identical. Fixes: https://bugzilla.redhat.com/1860627 Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-Id: <20200727215846.395443-2-nsoffer@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: typo fixes] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:19 -05:00
Peter Maydell	2649915121	bitmaps patches for 2020-07-27 - Improve handling of various post-copy bitmap migration scenarios. A lost bitmap should merely mean that the next backup must be full rather than incremental, rather than abruptly breaking the entire guest migration. - Associated iotest improvements -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl8fPRkACgkQp6FrSiUn Q2qanQf/dRTrqZ7/hs8aENySf44o0dBzOLZr+FBcrqEj2sd0c6jPzV2X5CVtnA1v gBgKJJGLpti3mSeNQDbaXZIQrsesBAuxvJsc6vZ9npDCdMYnK/qPE3Zfw1bx12qR cb39ba28P4izgs216h92ZACtUewnvjkxyJgN7zfmCJdNcwZINMUItAS183tSbQjn n39Wb7a+umsRgV9HQv/6cXlQIPqFMyAOl5kkzV3evuw7EBoHFnNq4cjPrUnjkqiD xf2pcSomaedYd37SpvoH57JxfL3z/90OBcuXhFvbqFk4FgQ63rJ32nRve2ZbIDI0 XPbohnYjYoFv6Xs/jtTzctZCbZ+jTg== =1dmz -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2020-07-27' into staging bitmaps patches for 2020-07-27 - Improve handling of various post-copy bitmap migration scenarios. A lost bitmap should merely mean that the next backup must be full rather than incremental, rather than abruptly breaking the entire guest migration. - Associated iotest improvements # gpg: Signature made Mon 27 Jul 2020 21:46:17 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-bitmaps-2020-07-27: (24 commits) migration: Fix typos in bitmap migration comments iotests: Adjust which migration tests are quick qemu-iotests/199: add source-killed case to bitmaps postcopy qemu-iotests/199: add early shutdown case to bitmaps postcopy qemu-iotests/199: check persistent bitmaps qemu-iotests/199: prepare for new test-cases addition migration/savevm: don't worry if bitmap migration postcopy failed migration/block-dirty-bitmap: cancel migration on shutdown migration/block-dirty-bitmap: relax error handling in incoming part migration/block-dirty-bitmap: keep bitmap state for all bitmaps migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete migration/block-dirty-bitmap: rename finish_lock to just lock migration/block-dirty-bitmap: refactor state global variables migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup migration/block-dirty-bitmap: rename state structure types migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start qemu-iotests/199: increase postcopy period qemu-iotests/199: change discard patterns qemu-iotests/199: improve performance: set bitmap by discard ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-28 14:38:17 +01:00
Max Reitz	134b7dec6e	block: Fix bdrv_aligned_p*v() for qiov_offset != 0 Since these functions take a @qiov_offset, they must always take it into account when working with @qiov. There are a couple of places where they do not, but they should. Fixes: `65cd4424b9` ("block/io: bdrv_aligned_preadv: use and support qiov_offset") Fixes: `28c4da2869` ("block/io: bdrv_aligned_pwritev: use and support qiov_offset") Reported-by: Claudio Fontana <cfontana@suse.de> Reported-by: Bruce Rogers <brogers@suse.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200728120806.265916-2-mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Claudio Fontana <cfontana@suse.de> Tested-by: Bruce Rogers <brogers@suse.com>	2020-07-28 15:28:47 +02:00
Andrey Shinkevich	8098969cf2	qcow2: Fix capitalization of header extension constant. Make the capitalization of the hexadecimal numbers consistent for the QCOW2 header extension constants in docs/interop/qcow2.txt. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1594973699-781898-2-git-send-email-andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-27 15:39:58 -05:00
Max Reitz	984c367814	block/amend: Check whether the node exists We should check whether the user-specified node-name actually refers to a node. The simplest way to do that is to use bdrv_lookup_bs() instead of bdrv_find_node() (the former wraps the latter, and produces an error message if necessary). Reported-by: Coverity (CID 1430268) Fixes: `ced914d0ab` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200710095037.10885-1-mreitz@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>	2020-07-27 12:37:25 +02:00
Peter Maydell	0c1fd2f41f	Block layer patches: - file-posix: Handle `EINVAL` fallocate return value - qemu-img convert -n: Keep qcow2 v2 target sparse -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl8XDZgRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9bpFQ/9EnS4iV9w0KW/NuJ4FIVdBD/VZFzokDLi 1vXVVEjoxAxxiP8KlGM9HRi5NtvOMgzKhNGias0wOFiBorx8Ppfc+3sqwygc2dnw Vbl/od2D7xQZkddnp4Upo70m+eWRW6xaxX+lAcl6iS3gBPDwExLaYfBN8lFUyRrs T4C0miD+abEEyL3C5A4cEZJ7CIs0n7AqZkqgytWA7clwy79VgDSuMOgP6DOP1tGH 1uK4gMCB0xbn+PHk96lXPORcLwDBOP0PIluo/zBmffzsEZN1Lv5ddVmxMQWSivin UmAbpeEtSw9Py5lRVmLSBYvolVOUleE/Rlzad2iue2be5/G8VP8xiRYMp9mUVpLO +LPMUd9NRkPx7wjUJMPKF0G9FgVO7R0+9J6rC33aKBj2XAlxY6qQlqUN2Jo11/fK 2+9AkU7WVqx3vuW2Zz7wjq3Rjvpg/sK+V3P3Cm6HTwwwPbEwv8GcFe6eKdvJrZ9K hhwiFSUOd90OUAdKOQXKMFSZ/t1TrZhdX882Hvth11/AlQAUY4cxQbSKcc2nrvLu Axk0Va3haOD+ReRTs8W/iYNdrXGZmbr3MCkNiK3QSvnrdj602ompco7xyDTX1/qH 6Hu28q7jUG3p3cApLQIZVjmogfqcGU7SWIY4lp9HZqtGP0z+pmWg46UNqzlKLyv5 Y/fVHHshlRU= =QhOf -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - file-posix: Handle `EINVAL` fallocate return value - qemu-img convert -n: Keep qcow2 v2 target sparse # gpg: Signature made Tue 21 Jul 2020 16:45:28 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: iotests: Test sparseness for qemu-img convert -n qcow2: Implement v2 zero writes with discard if possible file-posix: Handle `EINVAL` fallocate return value Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-21 19:25:48 +01:00
Peter Maydell	b50dab9eca	QOM patches for 2020-07-21 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl8XDGsSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZTPTYP/11AHdREDvz+KDl7AXtndxE9wo6GPGUy GhMSXODgOm6a4/nk4xQQR7MU+57C2IMnYbAAdrNXA3YVHjUPMURVsAeTsl9ruchC 1JiNrtXKgaWv5P8WeNzuXcH9B7n3UZ/mV4FH8v0FhjZhsH6EHP0zOhgNScPyQurz AsmYN5hSoSz8jRFQ8xlNzBhNrYX5dG6fOs4M6yBRUgzaCuRcSn/+cgjngfAmcH4k F0KK+rBQA+KUAKYp0QUaUcbD5oj/6KaQMsfqsFROR8m+AAJrhkP2ffmcDCHfCW8A SQsp8k2SkdHXNgRoDJSea1TYMitCFrTBDv+MfBWOLH4ewQrGAzvsuG5o3FlnVeN7 CKHkkxqOEifJ/vnThVBTIsqxf8/HefDXi1B5BXSKYSvnPKnnh8HCZ7VxvAsNGZiR epr4gEGBCcb9/bXFUmVVPz6H+lUWORzhF4P0spNJwH5BT9FLybWgJt9o4KH+pxYA DL4GF5vOkBgIhgUR+vn535vik7M38u6gsB8m22s2FRkZmIxTpxp9eH2ehHGSoVYM Yl1kVzJmFMPakl0gG1dMmM4+DJGRTHLCfBFS4pzMs9DaCNHUF3CB8FjXQs3NIFCS XGnChbri/wF83DEueTIrUAqF2w0XgEy55aVBOZkmFT6DXPXbx+Y3Q/AomaktBLyv FFUe9SfMn63P =HTNj -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-qom-2020-07-21' into staging QOM patches for 2020-07-21 # gpg: Signature made Tue 21 Jul 2020 16:40:27 BST # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qom-2020-07-21: qom: Make info qom-tree sort children more efficiently qom: Document object_get_canonical_path() returns malloced string qom: Change object_get_canonical_path_component() not to malloc Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-21 18:31:52 +01:00
Kevin Wolf	61b3043965	qcow2: Implement v2 zero writes with discard if possible qcow2 version 2 images don't support the zero flag for clusters, so for write_zeroes requests, we return -ENOTSUP and get explicit zero buffer writes. If the image doesn't have a backing file, we can do better: Just discard the respective clusters. This is relevant for 'qemu-img convert -O qcow2 -n', where qemu-img has to assume that the existing target image may contain any data, so it has to write zeroes. Without this patch, this results in a fully allocated target image, even if the source image was empty. Reported-by: Nir Soffer <nsoffer@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200721135520.72355-2-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-21 16:28:57 +02:00
Antoine Damhet	bae127d4dc	file-posix: Handle `EINVAL` fallocate return value The `detect-zeroes=unmap` option may issue unaligned `FALLOC_FL_PUNCH_HOLE` requests, raw block devices can (and will) return `EINVAL`, qemu should then write the zeroes to the blockdev instead of issuing an `IO_ERROR`. The problem can be reprodced like this: $ qemu-io -c 'write -P 0 42 1234' --image-opts driver=host_device,filename=/dev/loop0,detect-zeroes=unmap write failed: Invalid argument Signed-off-by: Antoine Damhet <antoine.damhet@blade-group.com> Message-Id: <20200717135603.51180-1-antoine.damhet@blade-group.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-21 16:28:57 +02:00
Markus Armbruster	7a309cc95b	qom: Change object_get_canonical_path_component() not to malloc object_get_canonical_path_component() returns a malloced copy of a property name on success, null on failure. 19 of its 25 callers immediately free the returned copy. Change object_get_canonical_path_component() to return the property name directly. Since modifying the name would be wrong, adjust the return type to const char *. Drop the free from the 19 callers become simpler, add the g_strdup() to the other six. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200714160202.3121879-4-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com>	2020-07-21 16:23:43 +02:00
Stefan Hajnoczi	1d719ddc35	block: fix bdrv_aio_cancel() for ENOMEDIUM requests bdrv_aio_cancel() calls aio_poll() on the AioContext for the given I/O request until it has completed. ENOMEDIUM requests are special because there is no BlockDriverState when the drive has no medium! Define a .get_aio_context() function for BlkAioEmAIOCB requests so that bdrv_aio_cancel() can find the AioContext where the completion BH is pending. Without this function bdrv_aio_cancel() aborts on ENOMEDIUM requests! libFuzzer triggered the following assertion: cat << EOF \| qemu-system-i386 -M pc-q35-5.0 \ -nographic -monitor none -serial none \ -qtest stdio -trace ide\* outl 0xcf8 0x8000fa24 outl 0xcfc 0xe106c000 outl 0xcf8 0x8000fa04 outw 0xcfc 0x7 outl 0xcf8 0x8000fb20 write 0x0 0x3 0x2780e7 write 0xe106c22c 0xd 0x1130c218021130c218021130c2 write 0xe106c218 0x15 0x110010110010110010110010110010110010110010 EOF ide_exec_cmd IDE exec cmd: bus 0x56170a77a2b8; state 0x56170a77a340; cmd 0xe7 ide_reset IDEstate 0x56170a77a340 Aborted (core dumped) (gdb) bt #1 0x00007ffff4f93895 in abort () at /lib64/libc.so.6 #2 0x0000555555dc6c00 in bdrv_aio_cancel (acb=0x555556765550) at block/io.c:2745 #3 0x0000555555dac202 in blk_aio_cancel (acb=0x555556765550) at block/block-backend.c:1546 #4 0x0000555555b1bd74 in ide_reset (s=0x555557213340) at hw/ide/core.c:1318 #5 0x0000555555b1e3a1 in ide_bus_reset (bus=0x5555572132b8) at hw/ide/core.c:2422 #6 0x0000555555b2aa27 in ahci_reset_port (s=0x55555720eb50, port=2) at hw/ide/ahci.c:650 #7 0x0000555555b29fd7 in ahci_port_write (s=0x55555720eb50, port=2, offset=44, val=16) at hw/ide/ahci.c:360 #8 0x0000555555b2a564 in ahci_mem_write (opaque=0x55555720eb50, addr=556, val=16, size=1) at hw/ide/ahci.c:513 #9 0x000055555598415b in memory_region_write_accessor (mr=0x55555720eb80, addr=556, value=0x7fffffffb838, size=1, shift=0, mask=255, attrs=...) at softmmu/memory.c:483 Looking at bdrv_aio_cancel: 2728 /* async I/Os / 2729 2730 void bdrv_aio_cancel(BlockAIOCB acb) 2731 { 2732 qemu_aio_ref(acb); 2733 bdrv_aio_cancel_async(acb); 2734 while (acb->refcnt > 1) { 2735 if (acb->aiocb_info->get_aio_context) { 2736 aio_poll(acb->aiocb_info->get_aio_context(acb), true); 2737 } else if (acb->bs) { 2738 /* qemu_aio_ref and qemu_aio_unref are not thread-safe, so 2739 * assert that we're not using an I/O thread. Thread-safe 2740 * code should use bdrv_aio_cancel_async exclusively. 2741 */ 2742 assert(bdrv_get_aio_context(acb->bs) == qemu_get_aio_context()); 2743 aio_poll(bdrv_get_aio_context(acb->bs), true); 2744 } else { 2745 abort(); <=============== 2746 } 2747 } 2748 qemu_aio_unref(acb); 2749 } Fixes: `02c50efe08` ("block: Add bdrv_aio_cancel_async") Reported-by: Alexander Bulekov <alxndr@bu.edu> Buglink: https://bugs.launchpad.net/qemu/+bug/1878255 Originally-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200720100141.129739-1-stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-21 12:00:38 +02:00
Maxim Levitsky	662d0c5392	block/crypto: disallow write sharing by default My commit 'block/crypto: implement the encryption key management' accidently allowed raw luks images to be shared between different qemu processes without share-rw=on explicit override. Fix that. Fixes: `bbfdae91fb` ("block/crypto: implement the encryption key management") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1857490 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200719122059.59843-2-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-21 10:49:02 +02:00
Kevin Wolf	a8c5cf27c9	file-posix: Fix leaked fd in raw_open_common() error path Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200717105426.51134-4-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Kevin Wolf	bca5283bd4	file-posix: Fix check_hdev_writable() with auto-read-only For Linux block devices, being able to open the device read-write doesn't necessarily mean that the device is actually writable (one example is a read-only LV, as you get with lvchange -pr <device>). We have check_hdev_writable() to check this condition and fail opening the image read-write if it's not actually writable. However, this check doesn't take auto-read-only into account, but results in a hard failure instead of downgrading to read-only where possible. Fix this and do the writable check not based on BDRV_O_RDWR, but only when this actually results in opening the file read-write. A second check is inserted in raw_reconfigure_getfd() to have the same check when dynamic auto-read-only upgrades an image file from read-only to read-write. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200717105426.51134-3-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Kevin Wolf	20eaf1bf6e	file-posix: Move check_hdev_writable() up We'll need to call it in raw_open_common(), so move the function to avoid a forward declaration. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200717105426.51134-2-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Kevin Wolf	5edc85571e	file-posix: Allow byte-aligned O_DIRECT with NFS Since commit `a6b257a08e` ('file-posix: Handle undetectable alignment'), we assume that if we open a file with O_DIRECT and alignment probing returns 1, we just couldn't find out the real alignment requirement because some filesystems make the requirement only for allocated blocks. In this case, a safe default of 4k is used. This is too strict for NFS, which does actually allow byte-aligned requests even with O_DIRECT. Because we can't distinguish both cases with generic code, let's just look at the file system magic and disable s->needs_alignment for NFS. This way, O_DIRECT can still be used on NFS for images that are not aligned to 4k. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200716142601.111237-3-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Marc-André Lureau	a08464521c	Remove VXHS block device The vxhs code doesn't compile since v2.12.0. There's no point in fixing and then adding CI for a config that our users have demonstrated that they do not use; better to just remove it. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200711065926.2204721-1-marcandre.lureau@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Peter Maydell	d2628b1eb7	Block layer patches: - file-posix: Mitigate file fragmentation with extent size hints - Tighten qemu-img rules on missing backing format - qemu-img map: Don't limit block status request size - Fix crash with virtio-scsi and iothreads -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl8NsgMRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9Z0tA//eqauxD7cTEpwrtLNrRtpiBtMG64BBpxz QfkURzB38bMVahHlwq3Gt7Zcov8V4V7vxK66h688Z/fhw3vmqIeVe8+P6+Y5s9FL jil8lewHuLTa6xELeugoV7SZXH8AAh1W2fQmiR7EPiOmpSE0wf7C5IShVlX8A04E r0n09+61qGjRIe1hNTwTtldqQEfx6UGnxQWcQb81JUPA1lZhX3cnPg/j94Bofr+m v/DbVTfsmUtTMjc0PdU7n4DKTWu8OS5B/X0unF21rTtO//cYBrhAeY3ax2jbFBWi CIZK8HLI5m9/HFyltql1LOsd+B5TtfnXMfSdvDh2jaVUlto7wTeTnWU1fv4wxUB5 hk7XgJo/y203ebFNHpTmW8tvLfGTP8uqCVfOEFxzjy+JHGrarlbWkwL2LMOFFAZ2 s2WcwlfqiYGFTG4+OFdhPf9qPWKSqMr+jTdZJTse64/c6+YXWHk+pP9lfYEUOgSi OYwdQUY9uiZ1K13q5Tif2TbFvs+c118xdTgVhAV7VtfPnWc3c647dX7iaq8Szknc IT93670Iqf/PzEj+L7XUbbLIIsAcmxD0sr7QAQEt7bfiYIDRIQLiVPyzXplETFg2 SEkvtqBovm84ct7pWQzqA6lFvr3oIFDNquR40XFGozHNnlBeNi5s7pXQnqUBLElr wDDuEi+z5QM= =DB0q -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - file-posix: Mitigate file fragmentation with extent size hints - Tighten qemu-img rules on missing backing format - qemu-img map: Don't limit block status request size - Fix crash with virtio-scsi and iothreads # gpg: Signature made Tue 14 Jul 2020 14:24:19 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: block: Avoid stale pointer dereference in blk_get_aio_context() qemu-img: Deprecate use of -b without -F block: Add support to warn on backing file change without format iotests: Specify explicit backing format where sensible qcow2: Deprecate use of qemu-img amend to change backing file block: Error if backing file fails during creation without -u qcow: Tolerate backing_fmt= vmdk: Add trivial backing_fmt support sheepdog: Add trivial backing_fmt support block: Finish deprecation of 'qemu-img convert -n -o' qemu-img: Flush stdout before before potential stderr messages file-posix: Mitigate file fragmentation with extent size hints iotests/059: Filter out disk size with more standard filter qemu-img map: Don't limit block status request size iotests: Simplify _filter_img_create() a bit Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-14 19:39:52 +01:00
Greg Kurz	e6cada9231	block: Avoid stale pointer dereference in blk_get_aio_context() It is possible for blk_remove_bs() to race with blk_drain_all(), causing the latter to dereference a stale blk->root pointer: blk_remove_bs(blk) bdrv_root_unref_child(blk->root) child_bs = blk->root->bs bdrv_detach_child(blk->root) ... g_free(blk->root) <============== blk->root becomes stale bdrv_unref(child_bs) <============ yield at some point A blk_drain_all() can be triggered by some guest action in the meantime, eg. on POWER, SLOF might disable bus mastering on a virtio-scsi-pci device: virtio_write_config() virtio_pci_stop_ioeventfd() virtio_bus_stop_ioeventfd() virtio_scsi_dataplane_stop() blk_drain_all() blk_get_aio_context() bs = blk->root ? blk->root->bs : NULL ^^^^^^^^^ stale Then, depending on one's luck, QEMU either crashes with SEGV or hits the assertion in blk_get_aio_context(). blk->root is set by blk_insert_bs() which calls bdrv_root_attach_child() first. The blk_remove_bs() function should rollback the changes made by blk_insert_bs() in the opposite order (or it should be documented somewhere why this isn't the case). Clear blk->root before calling bdrv_root_unref_child() in blk_remove_bs(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159430264541.389456.11925072456012783045.stgit@bahia.lan> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:24:15 +02:00
Eric Blake	e54ee1b385	block: Add support to warn on backing file change without format For now, this is a mechanical addition; all callers pass false. But the next patch will use it to improve 'qemu-img rebase -u' when selecting a backing file with no format. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Message-Id: <20200706203954.341758-10-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	bc5ee6da71	qcow2: Deprecate use of qemu-img amend to change backing file The use of 'qemu-img amend' to change qcow2 backing files is not tested very well. In particular, our implementation has a bug where if a new backing file is provided without a format, then the prior format is blindly reused, even if this results in data corruption, but this is not caught by iotests. There are also situations where amending other options needs access to the original backing file (for example, on a downgrade to a v2 image, knowing whether a v3 zero cluster must be allocated or may be left unallocated depends on knowing whether the backing file already reads as zero), but the command line does not have a nice way to tell us both the backing file to use for opening the image as well as the backing file to install after the operation is complete. Even if we do allow changing the backing file, it is redundant with the existing ability to change backing files via 'qemu-img rebase -u'. It is time to deprecate this support (leaving the existing behavior intact, even if it is buggy), and at a point in the future, require the use of only 'qemu-img rebase' for adjusting backing chain relations, saving 'qemu-img amend' for changes unrelated to the backing chain. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-8-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	344acbd62f	qcow: Tolerate backing_fmt= qcow has no space in the metadata to store a backing format, and there are existing qcow images backed both by raw or by other formats (usually qcow) images, reliant on probing to tell the difference. On the bright side, because we probe every time, raw files are marked as probed and we thus forbid a commit action into the backing file where guest-controlled contents could change the result of the probe next time around (the iotest added here proves that). Still, allowing the user to specify the backing format during creation, even if we can't record it, is a good thing. This patch blindly allows any value that resolves to a known driver, even if the user's request is a mismatch from what probing finds; then the next patch will further enhance things to verify that the user's request matches what we actually probe. With this and the next patch in place, we will finally be ready to deprecate the creation of images where a backing format was not explicitly specified by the user. Note that this is only for QemuOpts usage; there is no change to the QAPI to allow a format through -blockdev. Add a new iotest 301 just for qcow, to demonstrate the latest behavior, and to make it easier to show the improvements made in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-6-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	d51a814cf4	vmdk: Add trivial backing_fmt support vmdk already requires that if backing_file is present, that it be another vmdk image (see vmdk_co_do_create). Meanwhile, we want to move towards always being explicit about the backing format for other drivers where it matters. So for convenience, make qemu-img create -F vmdk work, while rejecting all other explicit formats (note that this is only for QemuOpts usage; there is no change to the QAPI to allow a format through -blockdev). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-5-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	80fa43e7df	sheepdog: Add trivial backing_fmt support Sheepdog already requires that if backing_file is present, that it be another sheepdog image (see sd_co_create). Meanwhile, we want to move towards always being explicit about the backing format for other drivers where it matters. So for convenience, make qemu-img create -F sheepdog work, while rejecting all other explicit formats (note that this is only for QemuOpts usage; there is no change to the QAPI to allow a format through -blockdev). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-4-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Kevin Wolf	ffa244c84a	file-posix: Mitigate file fragmentation with extent size hints Especially when O_DIRECT is used with image files so that the page cache indirection can't cause a merge of allocating requests, the file will fragment on the file system layer, with a potentially very small fragment size (this depends on the requests the guest sent). On Linux, fragmentation can be reduced by setting an extent size hint when creating the file (at least on XFS, it can't be set any more after the first extent has been allocated), basically giving raw files a "cluster size" for allocation. This adds a create option to set the extent size hint, and changes the default from not setting a hint to setting it to 1 MB. The main reason why qcow2 defaults to smaller cluster sizes is that COW becomes more expensive, which is not an issue with raw files, so we can choose a larger size. The tradeoff here is only potentially wasted disk space. For qcow2 (or other image formats) over file-posix, the advantage should even be greater because they grow sequentially without leaving holes, so there won't be wasted space. Setting even larger extent size hints for such images may make sense. This can be done with the new option, but let's keep the default conservative for now. The effect is very visible with a test that intentionally creates a badly fragmented file with qemu-img bench (the time difference while creating the file is already remarkable) and then looks at the number of extents and the time a simple "qemu-img map" takes. Without an extent size hint: $ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=0 $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192) Run completed in 25.848 seconds. $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192) Run completed in 19.616 seconds. $ filefrag ~/tmp/test.raw /home/kwolf/tmp/test.raw: 2000000 extents found $ time ./qemu-img map ~/tmp/test.raw Offset Length Mapped to File 0 0x1e8480000 0 /home/kwolf/tmp/test.raw real 0m1,279s user 0m0,043s sys 0m1,226s With the new default extent size hint of 1 MB: $ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576 $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192) Run completed in 11.833 seconds. $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192) Run completed in 10.155 seconds. $ filefrag ~/tmp/test.raw /home/kwolf/tmp/test.raw: 178 extents found $ time ./qemu-img map ~/tmp/test.raw Offset Length Mapped to File 0 0x1e8480000 0 /home/kwolf/tmp/test.raw real 0m0,061s user 0m0,040s sys 0m0,014s Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200707142329.48303-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	00d69986da	nbd: Avoid off-by-one in long export name truncation When snprintf returns the same value as the buffer size, the final byte was truncated to ensure a NUL terminator. Fortunately, such long export names are unusual enough, with no real impact other than what is displayed to the user. Fixes: `5c86bdf120` Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200622210355.414941-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-07-13 09:01:01 -05:00
Xie Yongji	c58daf76a6	iscsi: return -EIO when sense fields are meaningless When an I/O request failed, now we only return correct value on scsi check condition. We should also have a default errno such as -EIO in other case. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20200701105444.3226-2-xieyongji@bytedance.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 18:02:23 -04:00
Xie Yongji	dd3b00202a	iscsi: handle check condition status in retry loop The handling of check condition was incorrect because we would only do it after retries exceed maximum. Fixes: `8c460269aa` ("iscsi: base all handling of check condition on scsi_sense_to_errno") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20200701105444.3226-1-xieyongji@bytedance.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 18:02:23 -04:00
Vladimir Sementsov-Ogievskiy	795d946d07	nbd: Use ERRP_GUARD() If we want to check error after errp-function call, we need to introduce local_err and then propagate it to errp. Instead, use the ERRP_GUARD() macro, benefits are: 1. No need of explicit error_propagate call 2. No need of explicit local_err variable: use errp directly 3. ERRP_GUARD() leaves errp as is if it's not NULL or &error_fatal, this means that we don't break error_abort (we'll abort on error_set, not on error_propagate) If we want to add some info to errp (by error_prepend() or error_append_hint()), we must use the ERRP_GUARD() macro. Otherwise, this info will not be added when errp == &error_fatal (the program will exit prior to the error_append_hint() or error_prepend() call). Fix several such cases, e.g. in nbd_read(). This commit is generated by command sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \ MAINTAINERS \| \ xargs git ls-files \| grep '\.[hc]$' \| \ xargs spatch \ --sp-file scripts/coccinelle/errp-guard.cocci \ --macro-file scripts/cocci-macro-file.h \ --in-place --no-show-diff --max-width 80 Reported-by: Kevin Wolf <kwolf@redhat.com> Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200707165037.1026246-8-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and auto-propagated-errp.cocci to errp-guard.cocci. Commit message tweaked again.]	2020-07-10 15:18:09 +02:00
Markus Armbruster	386f6c07d2	error: Avoid error_propagate() after migrate_add_blocker() When migrate_add_blocker(blocker, &errp) is followed by error_propagate(errp, err), we can often just as well do migrate_add_blocker(..., errp). Do that with this Coccinelle script: @@ expression blocker, err, errp; expression ret; @@ - ret = migrate_add_blocker(blocker, &err); - if (err) { + ret = migrate_add_blocker(blocker, errp); + if (ret < 0) { ... when != err; - error_propagate(errp, err); ... } @@ expression blocker, err, errp; @@ - migrate_add_blocker(blocker, &err); - if (err) { + if (migrate_add_blocker(blocker, errp) < 0) { ... when != err; - error_propagate(errp, err); ... } Double-check @err is not used afterwards. Dereferencing it would be use after free, but checking whether it's null would be legitimate. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-43-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	b11a093c60	qapi: Smooth another visitor error checking pattern Convert visit_type_FOO(v, ..., &ptr, &err); ... if (err) { ... } to visit_type_FOO(v, ..., &ptr, errp); ... if (!ptr) { ... } for functions that set @ptr to non-null / null on success / error. Eliminate error_propagate() that are now unnecessary. Delete @err that are now unused. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-40-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	4bc6d7ee0e	block/parallels: Simplify parallels_open() after previous commit Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-39-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	a5f9b9df25	error: Reduce unnecessary error propagation When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away, even when we need to keep error_propagate() for other error paths. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-38-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	992861fb1e	error: Eliminate error_propagate() manually When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. The previous two commits did that for sufficiently simple cases with Coccinelle. Do it for several more manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-37-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	af175e85f9	error: Eliminate error_propagate() with Coccinelle, part 2 When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. The previous commit did that with a Coccinelle script I consider fairly trustworthy. This commit uses the same script with the matching of return taken out, i.e. we convert if (!foo(..., &err)) { ... error_propagate(errp, err); ... } to if (!foo(..., errp)) { ... ... } This is unsound: @err could still be read between afterwards. I don't know how to express "no read of @err without an intervening write" in Coccinelle. Instead, I manually double-checked for uses of @err. Suboptimal line breaks tweaked manually. qdev_realize() simplified further to placate scripts/checkpatch.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-36-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	668f62ec62	error: Eliminate error_propagate() with Coccinelle, part 1 When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. Convert if (!foo(..., &err)) { ... error_propagate(errp, err); ... return ... } to if (!foo(..., errp)) { ... ... return ... } where nothing else needs @err. Coccinelle script: @rule1 forall@ identifier fun, err, errp, lbl; expression list args, args2; binary operator op; constant c1, c2; symbol false; @@ if ( ( - fun(args, &err, args2) + fun(args, errp, args2) \| - !fun(args, &err, args2) + !fun(args, errp, args2) \| - fun(args, &err, args2) op c1 + fun(args, errp, args2) op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; \| return c2; \| return false; ) } @rule2 forall@ identifier fun, err, errp, lbl; expression list args, args2; expression var; binary operator op; constant c1, c2; symbol false; @@ - var = fun(args, &err, args2); + var = fun(args, errp, args2); ... when != err if ( ( var \| !var \| var op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; \| return c2; \| return false; \| return var; ) } @depends on rule1 \|\| rule2@ identifier err; @@ - Error *err = NULL; ... when != err Not exactly elegant, I'm afraid. The "when != lbl:" is necessary to avoid transforming if (fun(args, &err)) { goto out } ... out: error_propagate(errp, err); even though other paths to label out still need the error_propagate(). For an actual example, see sclp_realize(). Without the "when strict", Coccinelle transforms vfio_msix_setup(), incorrectly. I don't know what exactly "when strict" does, only that it helps here. The match of return is narrower than what I want, but I can't figure out how to express "return where the operand doesn't use @err". For an example where it's too narrow, see vfio_intx_enable(). Silently fails to convert hw/arm/armsse.c, because Coccinelle gets confused by ARMSSE being used both as typedef and function-like macro there. Converted manually. Line breaks tidied up manually. One nested declaration of @local_err deleted manually. Preexisting unwanted blank line dropped in hw/riscv/sifive_e.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-35-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	dcfe480544	error: Avoid unnecessary error_propagate() after error_setg() Replace error_setg(&err, ...); error_propagate(errp, err); by error_setg(errp, ...); Related pattern: if (...) { error_setg(&err, ...); goto out; } ... out: error_propagate(errp, err); return; When all paths to label out are that way, replace by if (...) { error_setg(errp, ...); return; } and delete the label along with the error_propagate(). When we have at most one other path that actually needs to propagate, and maybe one at the end that where propagation is unnecessary, e.g. foo(..., &err); if (err) { goto out; } ... bar(..., &err); out: error_propagate(errp, err); return; move the error_propagate() to where it's needed, like if (...) { foo(..., &err); error_propagate(errp, err); return; } ... bar(..., errp); return; and transform the error_setg() as above. In some places, the transformation results in obviously unnecessary error_propagate(). The next few commits will eliminate them. Bonus: the elimination of gotos will make later patches in this series easier to review. Candidates for conversion tracked down with this Coccinelle script: @@ identifier err, errp; expression list args; @@ - error_setg(&err, args); + error_setg(errp, args); ... when != err error_propagate(errp, err); Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-34-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	14217038bc	qapi: Use returned bool to check for failure, manual part The previous commit used Coccinelle to convert from checking the Error object to checking the return value. Convert a few more manually. Also tweak control flow in places to conform to the conventional "if error bail out" pattern. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-20-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	62a35aaa31	qapi: Use returned bool to check for failure, Coccinelle part The previous commit enables conversion of visit_foo(..., &err); if (err) { ... } to if (!visit_foo(..., errp)) { ... } for visitor functions that now return true / false on success / error. Coccinelle script: @@ identifier fun =~ "check_list\|input_type_enum\|lv_start_struct\|lv_type_bool\|lv_type_int64\|lv_type_str\|lv_type_uint64\|output_type_enum\|parse_type_bool\|parse_type_int64\|parse_type_null\|parse_type_number\|parse_type_size\|parse_type_str\|parse_type_uint64\|print_type_bool\|print_type_int64\|print_type_null\|print_type_number\|print_type_size\|print_type_str\|print_type_uint64\|qapi_clone_start_alternate\|qapi_clone_start_list\|qapi_clone_start_struct\|qapi_clone_type_bool\|qapi_clone_type_int64\|qapi_clone_type_null\|qapi_clone_type_number\|qapi_clone_type_str\|qapi_clone_type_uint64\|qapi_dealloc_start_list\|qapi_dealloc_start_struct\|qapi_dealloc_type_anything\|qapi_dealloc_type_bool\|qapi_dealloc_type_int64\|qapi_dealloc_type_null\|qapi_dealloc_type_number\|qapi_dealloc_type_str\|qapi_dealloc_type_uint64\|qobject_input_check_list\|qobject_input_check_struct\|qobject_input_start_alternate\|qobject_input_start_list\|qobject_input_start_struct\|qobject_input_type_any\|qobject_input_type_bool\|qobject_input_type_bool_keyval\|qobject_input_type_int64\|qobject_input_type_int64_keyval\|qobject_input_type_null\|qobject_input_type_number\|qobject_input_type_number_keyval\|qobject_input_type_size_keyval\|qobject_input_type_str\|qobject_input_type_str_keyval\|qobject_input_type_uint64\|qobject_input_type_uint64_keyval\|qobject_output_start_list\|qobject_output_start_struct\|qobject_output_type_any\|qobject_output_type_bool\|qobject_output_type_int64\|qobject_output_type_null\|qobject_output_type_number\|qobject_output_type_str\|qobject_output_type_uint64\|start_list\|visit_check_list\|visit_check_struct\|visit_start_alternate\|visit_start_list\|visit_start_struct\|visit_type_."; expression list args; typedef Error; Error err; @@ - fun(args, &err); - if (err) + if (!fun(args, &err)) { ... } A few line breaks tidied up manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-19-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	235e59cf03	qemu-option: Use returned bool to check for failure The previous commit enables conversion of foo(..., &err); if (err) { ... } to if (!foo(..., &err)) { ... } for QemuOpts functions that now return true / false on success / error. Coccinelle script: @@ identifier fun = { opts_do_parse, parse_option_bool, parse_option_number, parse_option_size, qemu_opt_parse, qemu_opt_rename, qemu_opt_set, qemu_opt_set_bool, qemu_opt_set_number, qemu_opts_absorb_qdict, qemu_opts_do_parse, qemu_opts_from_qdict_entry, qemu_opts_set, qemu_opts_validate }; expression list args, args2; typedef Error; Error *err; @@ - fun(args, &err, args2); - if (err) + if (!fun(args, &err, args2)) { ... } A few line breaks tidied up manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-15-armbru@redhat.com> [Conflict with commit `0b6786a9c1` "block/amend: refactor qcow2 amend options" resolved by rerunning Coccinelle on master's version]	2020-07-10 15:17:35 +02:00
Markus Armbruster	c6ecec43b2	qemu-option: Check return value instead of @err where convenient Convert uses like opts = qemu_opts_create(..., &err); if (err) { ... } to opts = qemu_opts_create(..., errp); if (!opts) { ... } Eliminate error_propagate() that are now unnecessary. Delete @err that are now unused. Note that we can't drop parallels_open()'s error_propagate() here. We continue to execute it even in the converted case. It's a no-op then: local_err is null. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20200707160613.848843-8-armbru@redhat.com>	2020-07-10 15:01:06 +02:00
Eric Blake	365fed5111	qed: Simplify backing reads The other four drivers that support backing files (qcow, qcow2, parallels, vmdk) all rely on the block layer to populate zeroes when reading beyond EOF of a short backing file. We can simplify the qed code by doing likewise. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200528094405.145708-11-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	a2adbbf603	block: drop unallocated_blocks_are_zero Currently this field only set by qed and qcow2. But in fact, all backing-supporting formats (parallels, qcow, qcow2, qed, vmdk) share these semantics: on unallocated blocks, if there is no backing file they just memset the buffer with zeroes. So, document this behavior for .supports_backing and drop .unallocated_blocks_are_zero Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-10-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	cdf9ebf18f	block/vhdx: drop unallocated_blocks_are_zero vhdx doesn't have .bdrv_co_block_status handler, so DATA\|ALLOCATED is always assumed for it in bdrv_co_block_status(). unallocated_blocks_are_zero is useless (it doesn't affect the only user of the field: bdrv_co_block_status()), drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-9-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	ac9185603e	block/file-posix: drop unallocated_blocks_are_zero raw_co_block_status() in block/file-posix.c never returns 0, so unallocated_blocks_are_zero is useless (it doesn't affect the only user of the field: bdrv_co_block_status()). Drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	32d293c8c6	block/iscsi: drop unallocated_blocks_are_zero We set bdi->unallocated_blocks_are_zero = iscsilun->lbprz, but iscsi_co_block_status doesn't return 0 in case of iscsilun->lbprz, it returns ZERO when appropriate. So actually unallocated_blocks_are_zero is useless (it doesn't affect the only user of the field: bdrv_co_block_status()). Drop it now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	74036395ea	block/crypto: drop unallocated_blocks_are_zero It's false by default, no needs to set it. We are going to drop this variable at all, so drop it now here, it doesn't hurt. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	2c060c0f50	block/vpc: return ZERO block-status when appropriate In case when get_image_offset() returns -1, we do zero out the corresponding chunk of qiov. So, this should be reported as ZERO. Note that this changes visible output of "qemu-img map --output=json" and "qemu-io -c map" commands. For qemu-img map, the change is obvious: we just mark as zero what is really zero. For qemu-io it's less obvious: what was unallocated now is allocated. There is an inconsistency in understanding of unallocated regions in Qemu: backing-supporting format-drivers return 0 block-status to report go-to-backing logic for this area. Some protocol-drivers (iscsi) return 0 to report fs-unallocated-non-zero status (i.e., don't occupy space on disk, read result is undefined). BDRV_BLOCK_ALLOCATED is defined as something more close to go-to-backing logic. Still it is calculated as ZERO \| DATA, so 0 from iscsi is treated as unallocated. It doesn't influence backing-chain behavior, as iscsi can't have backing file. But it does influence "qemu-io -c map". We should solve this inconsistency at some future point. Now, let's just make backing-not-supporting format drivers (vdi in the previous patch and vpc now) to behave more like backing-supporting drivers and not report 0 block-status. More over, returning ZERO status is absolutely valid thing, and again, corresponds to how the other format-drivers (backing-supporting) work. After block-status update, it never reports 0, so setting unallocated_blocks_are_zero doesn't make sense (as the only user of it is bdrv_co_block_status and it checks unallocated_blocks_are_zero only for unallocated areas). Drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-5-vsementsov@virtuozzo.com> [mreitz: qemu-io -c map as used by iotest 146 now reports everything as allocated; in order to make the test do something useful, we use qemu-img map --output=json now] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:32:38 +02:00
Vladimir Sementsov-Ogievskiy	2ea0332f42	block/vdi: return ZERO block-status when appropriate In case of !VDI_IS_ALLOCATED[], we do zero out the corresponding chunk of qiov. So, this should be reported as ZERO. Note that this changes visible output of "qemu-img map --output=json" and "qemu-io -c map" commands. For qemu-img map, the change is obvious: we just mark as zero what is really zero. For qemu-io it's less obvious: what was unallocated now is allocated. There is an inconsistency in understanding of unallocated regions in Qemu: backing-supporting format-drivers return 0 block-status to report go-to-backing logic for this area. Some protocol-drivers (iscsi) return 0 to report fs-unallocated-non-zero status (i.e., don't occupy space on disk, read result is undefined). BDRV_BLOCK_ALLOCATED is defined as something more close to go-to-backing logic. Still it is calculated as ZERO \| DATA, so 0 from iscsi is treated as unallocated. It doesn't influence backing-chain behavior, as iscsi can't have backing file. But it does influence "qemu-io -c map". We should solve this inconsistency at some future point. Now, let's just make backing-not-supporting format drivers (vdi at this patch and vpc with the following) to behave more like backing-supporting drivers and not report 0 block-status. More over, returning ZERO status is absolutely valid thing, and again, corresponds to how the other format-drivers (backing-supporting) work. After block-status update, it never reports 0, so setting unallocated_blocks_are_zero doesn't make sense (as the only user of it is bdrv_co_block_status and it checks unallocated_blocks_are_zero only for unallocated areas). Drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Vladimir Sementsov-Ogievskiy	7b1efe996c	block: inline bdrv_unallocated_blocks_are_zero() The function has only one user: bdrv_co_block_status(). Inline it to simplify reviewing of the following patches, which will finally drop unallocated_blocks_are_zero field too. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	8ea1613d91	block/qcow2: implement blockdev-amend Currently the implementation only supports amending the encryption options, unlike the qemu-img version Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-14-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	30da9dd88a	block/crypto: implement blockdev-amend Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-13-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	ced914d0ab	block/core: add generic infrastructure for x-blockdev-amend qmp command blockdev-amend will be used similiar to blockdev-create to allow on the fly changes of the structure of the format based block devices. Current plan is to first support encryption keyslot management for luks based formats (raw and embedded in qcow2) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200608094030.670121-12-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	90766d9db9	block/qcow2: extend qemu-img amend interface with crypto options Now that we have all the infrastructure in place, wire it in the qcow2 driver and expose this to the user. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-9-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	bbfdae91fb	block/crypto: implement the encryption key management This implements the encryption key management using the generic code in qcrypto layer and exposes it to the user via qemu-img This code adds another 'write_func' because the initialization write_func works directly on the underlying file, and amend works on instance of luks device. This commit also adds a 'hack/workaround' I and Kevin Wolf (thanks) made to make the driver both support write sharing (to avoid breaking the users), and be safe against concurrent metadata update (the keyslots) Eventually the write sharing for luks driver will be deprecated and removed together with this hack. The hack is that we ask (as a format driver) for BLK_PERM_CONSISTENT_READ and then when we want to update the keys, we unshare that permission. So if someone else has the image open, even readonly, encryption key update will fail gracefully. Also thanks to Daniel Berrange for the idea of unsharing read, rather that write permission which allows to avoid cases when the other user had opened the image read-only. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-8-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	e0d0ddc591	block/crypto: rename two functions rename the write_func to create_write_func, and init_func to create_init_func. This is preparation for other write_func that will be used to update the encryption keys. No functional changes Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200608094030.670121-7-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	0b6786a9c1	block/amend: refactor qcow2 amend options Some qcow2 create options can't be used for amend. Remove them from the qcow2 create options and add generic logic to detect such options in qemu-img Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [mreitz: Dropped some iotests reference output hunks that became unnecessary thanks to "iotests: Make _filter_img_create more active"] Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200625125548.870061-12-mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	df373fb0a3	block/amend: separate amend and create options for qemu-img Some options are only useful for creation (or hard to be amended, like cluster size for qcow2), while some other options are only useful for amend, like upcoming keyslot management options for luks Since currently only qcow2 supports amend, move all its options to a common macro and then include it in each action option list. In future it might be useful to remove some options which are not supported anyway from amend list, which currently cause an error message if amended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-5-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	a3579bfa0a	block/amend: add 'force' option 'force' option will be used for some unsafe amend operations. This includes things like erasing last keyslot in luks based formats (which destroys the data, unless the master key is backed up by external means), but that _might_ be desired result. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-4-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	43cbd06df2	qcrypto/core: add generic infrastructure for crypto options amendment This will be used first to implement luks keyslot management. block_crypto_amend_opts_init will be used to convert qemu-img cmdline to QCryptoBlockAmendOptions Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200608094030.670121-2-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Alberto Garcia	a5675f3901	qcow2: Fix preallocation on images with unaligned sizes When resizing an image with qcow2_co_truncate() using the falloc or full preallocation modes the code assumes that both the old and new sizes are cluster-aligned. There are two problems with this: 1) The calculation of how many clusters are involved does not always get the right result. Example: creating a 60KB image and resizing it (with preallocation=full) to 80KB won't allocate the second cluster. 2) No copy-on-write is performed, so in the previous example if there is a backing file then the first 60KB of the first cluster won't be filled with data from the backing file. This patch fixes both issues. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200617140036.20311-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:33:06 +02:00
Vladimir Sementsov-Ogievskiy	e8de7ba9ea	block/block-copy: block_copy_dirty_clusters: fix failure check ret may be > 0 on success path at this point. Fix assertion, which may crash currently. Fixes: `4ce5dd3e9b` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200526181347.489557-1-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:33:06 +02:00
Kevin Wolf	3dfa23b9ef	vvfat: Fix array_remove_slice() array_remove_slice() calls array_roll() with array->next - 1 as the destination index. This is only correct for count == 1, otherwise we're writing past the end of the array. array->next - count would be correct. However, this is the only place ever calling array_roll(), so this rather complicated operation isn't even necessary. Fix the problem and simplify the code by replacing it with a single memmove() call. array_roll() can now be removed. Reported-by: Nathan Huckleberry <nhuck15@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200623175534.38286-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-03 09:37:03 +02:00
Kevin Wolf	c79e243ed6	vvfat: Check that updated filenames are valid FAT allows only a restricted set of characters in file names, and for some of the illegal characters, it's actually important that we catch them: If filenames can contain '/', the guest can construct filenames containing "../" and escape from the assigned vvfat directory. The same problem could arise if ".." was ever accepted as a literal filename. Fix this by adding a check that all filenames are valid in check_directory_consistency(). Reported-by: Nathan Huckleberry <nhuck15@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200623175534.38286-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-03 09:37:03 +02:00
Stefan Hajnoczi	7838c67f22	block/nvme: support nested aio_poll() QEMU block drivers are supposed to support aio_poll() from I/O completion callback functions. This means completion processing must be re-entrant. The standard approach is to schedule a BH during completion processing and cancel it at the end of processing. If aio_poll() is invoked by a callback function then the BH will run. The BH continues the suspended completion processing. All of this means that request A's cb() can synchronously wait for request B to complete. Previously the nvme block driver would hang because it didn't process completions from nested aio_poll(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-8-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	b75fd5f554	block/nvme: keep BDRVNVMeState pointer in NVMeQueuePair Passing around both BDRVNVMeState and NVMeQueuePair is unwieldy. Reduce the number of function arguments by keeping the BDRVNVMeState pointer in NVMeQueuePair. This will come in handly when a BH is introduced in a later patch and only one argument can be passed to it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	a5db74f324	block/nvme: clarify that free_req_queue is protected by q->lock Existing users access free_req_queue under q->lock. Document this. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	1086e95da1	block/nvme: switch to a NVMeRequest freelist There are three issues with the current NVMeRequest->busy field: 1. The busy field is accidentally accessed outside q->lock when request submission fails. 2. Waiters on free_req_queue are not woken when a request is returned early due to submission failure. 2. Finding a free request involves scanning all requests. This makes request submission O(n^2). Switch to an O(1) freelist that is always accessed under the lock. Also differentiate between NVME_QUEUE_SIZE, the actual SQ/CQ size, and NVME_NUM_REQS, the number of usable requests. This makes the code simpler than using NVME_QUEUE_SIZE everywhere and having to keep in mind that one slot is reserved. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	04b3fb39c8	block/nvme: don't access CQE after moving cq.head Do not access a CQE after incrementing q->cq.head and releasing q->lock. It is unlikely that this causes problems in practice but it's a latent bug. The reason why it should be safe at the moment is that completion processing is not re-entrant and the CQ doorbell isn't written until the end of nvme_process_completion(). Make this change now because QEMU expects completion processing to be re-entrant and later patches will do that. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	d38253cf8b	block/nvme: drop tautologous assertion nvme_process_completion() explicitly checks cid so the assertion that follows is always true: if (cid == 0 \|\| cid > NVME_QUEUE_SIZE) { ... continue; } assert(cid <= NVME_QUEUE_SIZE); Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	2446e0e2e9	block/nvme: poll queues without q->lock A lot of CPU time is spent simply locking/unlocking q->lock during polling. Check for completion outside the lock to make q->lock disappear from the profile. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Eric Blake	f17d684770	qcow2: Tweak comments on qcow2_get_persistent_dirty_bitmap_size For now, we don't have persistent bitmaps in any other formats, but that might not be true in the future. Make it obvious that our incoming parameter is not necessarily a qcow2 image, and therefore is limited to just the bdrv_dirty_bitmap_* API calls (rather than probing into qcow2 internals). Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200608190821.3293867-1-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-06-17 14:53:39 +02:00
Eric Blake	e37adbebd1	block: Refactor subdirectory recursion during make Rather than listing block/monitor from the top-level Makefile.objs, we should instead list monitor from block/Makefile.objs. Suggested-by: Kevin Wolf <kwolf@redhat.com> Fixes: `bb4e58c613` Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200608173339.3244211-1-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-06-17 14:53:39 +02:00
Eric Blake	5c86bdf120	block: Call attention to truncation of long NBD exports Commit `93676c88` relaxed our NBD client code to request export names up to the NBD protocol maximum of 4096 bytes without NUL terminator, even though the block layer can't store anything longer than 4096 bytes including NUL terminator for display to the user. Since this means there are some export names where we have to truncate things, we can at least try to make the truncation a bit more obvious for the user. Note that in spite of the truncated display name, we can still communicate with an NBD server using such a long export name; this was deemed nicer than refusing to even connect to such a server (since the server may not be under our control, and since determining our actual length limits gets tricky when nbd://host:port/export and nbd+unix:///export?socket=/path are themselves variable-length expansions beyond the export name but count towards the block layer name length). Reported-by: Xueqiang Wei <xuwei@redhat.com> Fixes: https://bugzilla.redhat.com/1843684 Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200610163741.3745251-3-eblake@redhat.com>	2020-06-10 12:58:59 -05:00
Vladimir Sementsov-Ogievskiy	7d2410cea1	block: Factor out bdrv_run_co() We have a few bdrv_*() functions that can either spawn a new coroutine and wait for it with BDRV_POLL_WHILE() or use a fastpath if they are alreeady running in a coroutine. All of them duplicate basically the same code. Factor the common code into a new function bdrv_run_co(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20200520144901.16589-1-vsementsov@virtuozzo.com [Factor out bdrv_run_co_entry too] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-05 09:54:48 +01:00
Stefano Garzarella	769335ecb1	io_uring: use io_uring_cq_ready() to check for ready cqes In qemu_luring_poll_cb() we are not using the cqe peeked from the CQ ring. We are using io_uring_peek_cqe() only to see if there are cqes ready, so we can replace it with io_uring_cq_ready(). Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20200519134942.118178-1-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-05 09:54:48 +01:00
Stefano Garzarella	b4e44c9944	io_uring: retry io_uring_submit() if it fails with errno=EINTR As recently documented [1], io_uring_enter(2) syscall can return an error (errno=EINTR) if the operation was interrupted by a delivery of a signal before it could complete. This should happen when IORING_ENTER_GETEVENTS flag is used, for example during io_uring_submit_and_wait() or during io_uring_submit() when IORING_SETUP_IOPOLL is enabled. We shouldn't have this problem for now, but it's better to prevent it. [1] `344355ec66` Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20200519133041.112138-1-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-05 09:54:48 +01:00
Eric Blake	5d72c68b49	qcow2: Expose bitmaps' size during measure It's useful to know how much space can be occupied by qcow2 persistent bitmaps, even though such metadata is unrelated to the guest-visible data. Report this value as an additional QMP field, present when measuring an existing image and output format that both support bitmaps. Update iotest 178 and 190 to updated output, as well as new coverage in 190 demonstrating non-zero values made possible with the recently-added qemu-img bitmap command (see `3b51ab4b`). The new 'bitmaps size:' field is displayed automatically as part of 'qemu-img measure' any time it is present in QMP (that is, any time both the source image being measured and destination format support bitmaps, even if the measurement is 0 because there are no bitmaps present). If the field is absent, it means that no bitmaps can be copied (source, destination, or both lack bitmaps, including when measuring based on size rather than on a source image). This behavior is compatible with an upcoming patch adding 'qemu-img convert --bitmaps': that command will fail in the same situations where this patch omits the field. The addition of a new field demonstrates why we should always zero-initialize qapi C structs; while the qcow2 driver still fully populates all fields, the raw and crypto drivers had to be tweaked to avoid uninitialized data. Consideration was also given towards having a 'qemu-img measure --bitmaps' which errors out when bitmaps are not possible, and otherwise sums the bitmaps into the existing allocation totals rather than displaying as a separate field, as a potential convenience factor. But this was ultimately decided to be more complexity than necessary when the QMP interface was sufficient enough with bitmaps remaining a separate field. See also: https://bugzilla.redhat.com/1779904 Reported-by: Nir Soffer <nsoffer@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200521192137.1120211-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-05-28 13:16:16 -05:00
Vladimir Sementsov-Ogievskiy	7ae89a0de9	block/dirty-bitmap: add bdrv_has_named_bitmaps helper To be used for bitmap migration in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200521220648.3255-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-05-28 13:15:22 -05:00
Eric Blake	bb4e58c613	blockdev: Split off basic bitmap operations for qemu-img Upcoming patches want to add some basic bitmap manipulation abilities to qemu-img. But blockdev.o is too heavyweight to link into qemu-img (among other things, it would drag in block jobs and transaction support - qemu-img does offline manipulation, where atomicity is less important because there are no concurrent modifications to compete with), so it's time to split off the bare bones of what we will need into a new file block/monitor/bitmap-qmp-cmds.o. This is sufficient to expose 6 QMP commands for use by qemu-img (add, remove, clear, enable, disable, merge), as well as move the three helper functions touched in the previous patch. Regarding MAINTAINERS, the new file is automatically part of block core, but also makes sense as related to other dirty bitmap files. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513011648.166876-6-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-05-19 10:32:14 -05:00
Eric Blake	ef893b5c84	block: Make it easier to learn which BDS support bitmaps Upcoming patches will enhance bitmap support in qemu-img, but in doing so, it turns out to be nice to suppress output when persistent bitmaps make no sense (such as on a qcow2 v2 image). Add a hook to make this easier to query. This patch adds a new callback .bdrv_supports_persistent_dirty_bitmap, rather than trying to shoehorn the answer in via existing callbacks. In particular, while it might have been possible to overload .bdrv_co_can_store_new_dirty_bitmap to special-case a NULL input to answer whether any persistent bitmaps are supported, that is at odds with whether a particular bitmap can be stored (for example, even on an image that supports persistent bitmaps but has currently filled up the maximum number of bitmaps, attempts to store another one should fail); and the new functionality doesn't require coroutine safety. Similarly, we could have added one more piece of information to .bdrv_get_info, but then again, most callers to that function tend to already discard extraneous information, and making it a catch-all rather than a series of dedicated scalar queries hasn't really simplified life. In the future, when we improve the ability to look up bitmaps through a filter, we will probably also want to teach the block layer to automatically let filters pass this request on through. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513011648.166876-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-05-19 10:32:14 -05:00
Philippe Mathieu-Daudé	d7eca54222	block/block-copy: Simplify block_copy_do_copy() block_copy_do_copy() is static, only used in block_copy_task_entry with the error_is_read argument set. No need to check for it, simplify. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200507121129.29760-3-philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Philippe Mathieu-Daudé	c78dd00e35	block/block-copy: Fix uninitialized variable in block_copy_task_entry Fix when building with -Os: CC block/block-copy.o block/block-copy.c: In function ‘block_copy_task_entry’: block/block-copy.c:428:38: error: ‘error_is_read’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 428 \| t->call_state->error_is_read = error_is_read; \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~ Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200507121129.29760-2-philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	e5d8a40685	block: Drop @child_class from bdrv_child_perm() Implementations should decide the necessary permissions based on @role. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-35-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	1f38f04eac	block: Pass BdrvChildRole in remaining cases These calls have no real use for the child role yet, but it will not harm to give one. Notably, the bdrv_root_attach_child() call in blockjob.c is left unmodified because there is not much the generic BlockJob object wants from its children. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-34-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	69dca43d6b	block: Use bdrv_default_perms() bdrv_default_perms() can decide which permission profile to use based on the BdrvChildRole, so block drivers do not need to select it explicitly. The blkverify driver now no longer shares the WRITE permission for the image to verify. We thus have to adjust two places in test-block-iothread not to take it. (Note that in theory, blkverify should behave like quorum in this regard and share neither WRITE nor RESIZE for both of its children. In practice, it does not really matter, because blkverify is used only for debugging, so we might as well keep its permissions rather liberal.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-30-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	58944401d6	block: Use child_of_bds in remaining places Replace child_file by child_of_bds in all remaining places (excluding tests). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-28-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	b3af2af43b	block: Make filter drivers use child_of_bds Note that some filters have secondary children, namely blkverify (the image to be verified) and blklogwrites (the log). This patch does not touch those children. Note that for blkverify, the filtered child should not be format-probed. While there is nothing enforcing this here, in practice, it will not be: blkverify implements .bdrv_file_open. The block layer ensures (and in fact, asserts) that BDRV_O_PROTOCOL is set for every BDS whose driver implements .bdrv_file_open. This flag will then be bequeathed to blkverify's children, and they will thus (by default) not be probed either. ("By default" refers to the fact that blkverify's other child (the non-filtered one) will have BDRV_O_PROTOCOL force-unset, because that is what happens for all non-filtered children of non-format drivers.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-27-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	8b1869daad	block: Make format drivers use child_of_bds Commonly, they need to pass the BDRV_CHILD_IMAGE set as the BdrvChildRole; but there are exceptions for drivers with external data files (qcow2 and vmdk). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-26-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	25191e5ff0	block: Make backing files child_of_bds children Make all parents of backing files pass the appropriate BdrvChildRole. By doing so, we can switch their BdrvChildClass over to the generic child_of_bds, which will do the right thing when given a correct BdrvChildRole. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-24-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	36ee58d13b	block: Switch child_format users to child_of_bds Both users (quorum and blkverify) use child_format for not-really-filtered children, so the appropriate BdrvChildRole in both cases is DATA. (Note that this will cause bdrv_inherited_options() to force-allow format probing.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-22-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	500e243420	raw-format: Split raw_read_options() Split raw_read_options() into one function that actually just reads the options, and another that applies them. This will allow us to detect whether the user has specified any options before attaching the file child (so we can decide on its role based on the options). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-21-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	3cdc69d31b	block: Pass parent_is_format to .inherit_options() We plan to unify the generic .inherit_options() functions. The resulting common function will need to decide whether to force-enable format probing, force-disable it, or leave it as-is. To make this decision, it will need to know whether the parent node is a format node or not (because we never want format probing if the parent is a format node already (except for the backing chain)). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-9-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	272c02eaef	block: Pass BdrvChildRole to .inherit_options() For now, all callers (effectively) pass 0 and no callee evaluates thie value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-8-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	bf8e925eb5	block: Pass BdrvChildRole to bdrv_child_perm() For now, all callers pass 0 and no callee evaluates this value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	258b776515	block: Add BdrvChildRole to BdrvChild For now, it is always set to 0. Later patches in this series will ensure that all callers pass an appropriate combination of flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	bd86fb990c	block: Rename BdrvChildRole to BdrvChildClass This structure nearly only contains parent callbacks for child state changes. It cannot really reflect a child's role, because different roles may overlap (as we will see when real roles are introduced), and because parents can have custom callbacks even when the child fulfills a standard role. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	d67066d8bc	block: Add BlockDriver.is_format We want to unify child_format and child_file at some point. One of the important things that set format drivers apart from other drivers is that they do not expect other format nodes under them (except in the backing chain), i.e. we must not probe formats inside of formats. That means we need something on which to distinguish format drivers from others, and hence this flag. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	6540fd153c	block: Mark commit, mirror, blkreplay as filters The commit, mirror, and blkreplay block nodes are filters, so they should be marked as such. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	f844ec01b3	block: Use bdrv_make_empty() where possible Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200429141126.85159-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Kevin Wolf	6ecbc6c526	replication: Avoid blk_make_empty() on read-only child This is just a bandaid to keep tests/test-replication working after bdrv_make_empty() starts to assert that we're not trying to call it on a read-only child. For the real solution in the future, replication should not steal the BdrvChild from its backing file (this is never correct to do!), but instead have its own child node references, with the appropriate permissions. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	2d97fde439	block: Use blk_make_empty() after commits bdrv_commit() already has a BlockBackend pointing to the BDS that we want to empty, it just has the wrong permissions. qemu-img commit has no BlockBackend pointing to the old backing file yet, but introducing one is simple. After this commit, bdrv_make_empty() is the only remaining caller of BlockDriver.bdrv_make_empty(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429141126.85159-5-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [kwolf: Fixed up reference output for 098] Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	2b7bbdbdef	block: Add blk_make_empty() Two callers of BlockDriver.bdrv_make_empty() remain that should not call this method directly. Both do not have access to a BdrvChild, but they can use a BlockBackend, so we add this function that lets them use it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429141126.85159-4-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Lukas Straub	e140f4b7b8	block/replication.c: Avoid cancelling the job twice If qemu in colo secondary mode is stopped, it crashes because s->backup_job is canceled twice: First with job_cancel_sync_all() in qemu_cleanup() and then in replication_stop(). Fix this by assigning NULL to s->backup_job when the job completes so replication_stop() and replication_do_checkpoint() won't touch the job. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Message-Id: <20200511090801.7ed5d8f3@luklap> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Kevin Wolf	e83dd6808c	mirror: Make sure that source and target size match If the target is shorter than the source, mirror would copy data until it reaches the end of the target and then fail with an I/O error when trying to write past the end. If the target is longer than the source, the mirror job would complete successfully, but the target wouldn't actually be an accurate copy of the source image (it would contain some additional garbage at the end). Fix this by checking that both images have the same size when the job starts. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200511135825.219437-4-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:24 +02:00
Markus Armbruster	d2623129a7	qom: Drop parameter @errp of object_property_add() & friends The only way object_property_add() can fail is when a property with the same name already exists. Since our property names are all hardcoded, failure is a programming error, and the appropriate way to handle it is passing &error_abort. Same for its variants, except for object_property_add_child(), which additionally fails when the child already has a parent. Parentage is also under program control, so this is a programming error, too. We have a bit over 500 callers. Almost half of them pass &error_abort, slightly fewer ignore errors, one test case handles errors, and the remaining few callers pass them to their own callers. The previous few commits demonstrated once again that ignoring programming errors is a bad idea. Of the few ones that pass on errors, several violate the Error API. The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. ich9_pm_add_properties(), sparc32_ledma_realize(), sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize() are wrong that way. When the one appropriate choice of argument is &error_abort, letting users pick the argument is a bad idea. Drop parameter @errp and assert the preconditions instead. There's one exception to "duplicate property name is a programming error": the way object_property_add() implements the magic (and undocumented) "automatic arrayification". Don't drop @errp there. Instead, rename object_property_add() to object_property_try_add(), and add the obvious wrapper object_property_add(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-15-armbru@redhat.com> [Two semantic rebase conflicts resolved]	2020-05-15 07:07:58 +02:00
Vladimir Sementsov-Ogievskiy	fc9aefc8c0	block/block-copy: fix use-after-free of task pointer Obviously, we should g_free the task after trace point and offset update. Reported-by: Coverity (CID 1428756) Fixes: `4ce5dd3e9b` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200507183800.22626-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-13 14:20:31 +02:00
Denis Plotnikov	d298ac10ad	qcow2: add zstd cluster compression zstd significantly reduces cluster compression time. It provides better compression performance maintaining the same level of the compression ratio in comparison with zlib, which, at the moment, is the only compression method available. The performance test results: Test compresses and decompresses qemu qcow2 image with just installed rhel-7.6 guest. Image cluster size: 64K. Image on disk size: 2.2G The test was conducted with brd disk to reduce the influence of disk subsystem to the test results. The results is given in seconds. compress cmd: time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib\|zstd] src.img [zlib\|zstd]_compressed.img decompress cmd time ./qemu-img convert -O qcow2 [zlib\|zstd]_compressed.img uncompressed.img compression decompression zlib zstd zlib zstd ------------------------------------------------------------ real 65.5 16.3 (-75 %) 1.9 1.6 (-16 %) user 65.0 15.8 5.3 2.5 sys 3.3 0.2 2.0 2.0 Both ZLIB and ZSTD gave the same compression ratio: 1.57 compressed image size in both cases: 1.4G Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> QAPI part: Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200507082521.29210-4-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-13 14:20:31 +02:00
Denis Plotnikov	25dd077d1d	qcow2: rework the cluster compression routine The patch enables processing the image compression type defined for the image and chooses an appropriate method for image clusters (de)compression. Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200507082521.29210-3-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-13 14:20:31 +02:00
Denis Plotnikov	572ad9783f	qcow2: introduce compression type feature The patch adds some preparation parts for incompatible compression type feature to qcow2 allowing the use different compression methods for image clusters (de)compressing. It is implied that the compression type is set on the image creation and can be changed only later by image conversion, thus compression type defines the only compression algorithm used for the image, and thus, for all image clusters. The goal of the feature is to add support of other compression methods to qcow2. For example, ZSTD which is more effective on compression than ZLIB. The default compression is ZLIB. Images created with ZLIB compression type are backward compatible with older qemu versions. Adding of the compression type breaks a number of tests because now the compression type is reported on image creation and there are some changes in the qcow2 header in size and offsets. The tests are fixed in the following ways: * filter out compression_type for many tests * fix header size, feature table size and backing file offset affected tests: 031, 036, 061, 080 header_size +=8: 1 byte compression type 7 bytes padding feature_table += 48: incompatible feature compression type backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change) * add "compression type" for test output matching when it isn't filtered affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206, 242, 255, 274, 280 Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> QAPI part: Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200507082521.29210-2-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-13 14:20:31 +02:00
Eric Blake	47e0b38a13	block: Drop unused .bdrv_has_zero_init_truncate Now that there are no clients of bdrv_has_zero_init_truncate, none of the drivers need to worry about providing it. What's more, this eliminates a source of some confusion: a literal reading of the documentation as written in `ceaca56f` and implemented in commit `1dcaf527` claims that a driver which returns 0 for bdrv_has_zero_init_truncate() must not return 1 for bdrv_has_zero_init(); this condition was violated for parallels, qcow, and sometimes for vdi, although in practice it did not matter since those drivers also lacked .bdrv_co_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-10-eblake@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	dbc636e791	vhdx: Rework truncation logic The vhdx driver uses truncation for image growth, with a special case for blocks that already read as zero but which are only being partially written. But with a bit of rearranging, it's just as easy to defer the decision on whether truncation resulted in zeroes to the actual allocation attempt, reducing the number of places that still use bdrv_has_zero_init_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-9-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	bda4cdcbb9	parallels: Rework truncation logic The parallels driver tries to use truncation for image growth, but can only do so when reads are guaranteed as zero. Now that we have a way to request zero contents from truncation, we can defer the decision to actual allocation attempts rather than up front, reducing the number of places that still use bdrv_has_zero_init_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-8-eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	be9c9404db	ssh: Support BDRV_REQ_ZERO_WRITE for truncate Our .bdrv_has_zero_init_truncate can detect when the remote side always zero fills; we can reuse that same knowledge to implement BDRV_REQ_ZERO_WRITE by ignoring it when the server gives it to us for free. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-7-eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	fec00559e7	sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate Our .bdrv_has_zero_init_truncate always returns 1 because sheepdog always 0-fills; we can use that same knowledge to implement BDRV_REQ_ZERO_WRITE by ignoring it. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-6-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	2f98910d5b	rbd: Support BDRV_REQ_ZERO_WRITE for truncate Our .bdrv_has_zero_init_truncate always returns 1 because rbd always 0-fills; we can use that same knowledge to implement BDRV_REQ_ZERO_WRITE by ignoring it. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-5-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	8f23aaf5d6	nfs: Support BDRV_REQ_ZERO_WRITE for truncate Our .bdrv_has_zero_init_truncate returns 1 if we detect that the OS always 0-fills; we can use that same knowledge to implement BDRV_REQ_ZERO_WRITE by ignoring it when the OS gives it to us for free. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-4-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	8e51979504	file-win32: Support BDRV_REQ_ZERO_WRITE for truncate When using bdrv_file, .bdrv_has_zero_init_truncate always returns 1; therefore, we can behave just like file-posix, and always implement BDRV_REQ_ZERO_WRITE by ignoring it since the OS gives it to us for free (note that file-posix.c had to use an 'if' because it shared code between regular files and block devices, but in file-win32.c, bdrv_host_device uses a separate .bdrv_file_open). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-3-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	5e09bcee5b	gluster: Drop useless has_zero_init callback block.c already defaults to 0 if we don't provide a callback; there's no need to write a callback that always fails. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200428202905.770727-2-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Max Reitz	4b96fa3846	qcow2: Fix preallocation on block devices Calling bdrv_getlength() to get the pre-truncate file size will not really work on block devices, because they have always the same length, and trying to write beyond it will fail with a rather cryptic error message. Instead, we should use qcow2_get_last_cluster() and bdrv_getlength() only as a fallback. Before this patch: $ truncate -s 1G test.img $ sudo losetup -f --show test.img /dev/loop0 $ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=full lazy_refcounts=off refcount_bits=16 qemu-img: /dev/loop0: Could not resize image: Failed to resize refcount structures: No space left on device With this patch: $ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=full lazy_refcounts=off refcount_bits=16 qemu-img: /dev/loop0: Could not resize image: Failed to resize underlying file: Preallocation mode 'full' unsupported for this non-regular file So as you can see, it still fails, but now the problem is missing support on the block device level, so we at least get a better error message. Note that we cannot preallocate block devices on truncate by design, because we do not know what area to preallocate. Their length is always the same, the truncate operation does not change it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200505141801.1096763-1-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	958a04bd32	backup: Make sure that source and target size match Since the introduction of a backup filter node in commit `00e30f05d`, the backup block job crashes when the target image is smaller than the source image because it will try to write after the end of the target node without having BLK_PERM_RESIZE. (Previously, the BlockBackend layer would have caught this and errored out gracefully.) We can fix this and even do better than the old behaviour: Check that source and target have the same image size at the start of the block job and unshare BLK_PERM_RESIZE. (This permission was already unshared before the same commit `00e30f05d`, but the BlockBackend that was used to make the restriction was removed without a replacement.) This will immediately error out when starting the job instead of only when writing to a block that doesn't exist in the target. Longer target than source would technically work because we would never write to blocks that don't exist, but semantically these are invalid, too, because a backup is supposed to create a copy, not just an image that starts with a copy. Fixes: `00e30f05de` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1778593 Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430142755.315494-4-kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	58226634c4	backup: Improve error for bdrv_getlength() failure bdrv_get_device_name() will be an empty string with modern management tools that don't use -drive. Use bdrv_get_device_or_node_name() instead so that the node name is used if the BlockBackend is anonymous. While at it, start with upper case to make the message consistent with the rest of the function. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200430142755.315494-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	2758be056b	vmdk: Flush only once in vmdk_L2update() If we have a backup L2 table, we currently flush once after writing to the active L2 table and again after writing to the backup table. A single flush is enough and makes things a little less slow. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-6-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	78cae78dbc	vmdk: Don't update L2 table for zero write on zero cluster If a cluster is already zeroed, we don't have to call vmdk_L2update(), which is rather slow because it flushes the image file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-5-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	4823cde58e	vmdk: Fix partial overwrite of zero cluster When overwriting a zero cluster, we must not perform copy-on-write from the backing file, but from a zeroed buffer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-4-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	2821c1cc0f	vmdk: Fix zero cluster allocation m_data must contain valid data even for zero clusters when no cluster was allocated in the image file. Without this, zero writes segfault with images that have zeroed_grain=on. For zero writes, we don't want to allocate a cluster in the image file even in compressed files. Fixes: `524089bce4` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	4dc20e6465	vmdk: Rename VmdkMetaData.valid to new_allocation m_data is used for zero clusters even though valid == 0. It really only means that a new cluster was allocated in the image file. Rename it to reflect this. While at it, change it from int to bool, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Alberto Garcia	e4d7019e1a	qcow2: Avoid integer wraparound in qcow2_co_truncate() After commit `f01643fb8b` when an image is extended and BDRV_REQ_ZERO_WRITE is set then the new clusters are zeroized. The code however does not detect correctly situations when the old and the new end of the image are within the same cluster. The problem can be reproduced with these steps: qemu-img create -f qcow2 backing.qcow2 1M qemu-img create -f qcow2 -F qcow2 -b backing.qcow2 top.qcow2 qemu-img resize --shrink top.qcow2 520k qemu-img resize top.qcow2 567k In the last step offset - zero_start causes an integer wraparound. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200504155217.10325-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Maxim Levitsky	3d1900a471	block: luks: better error message when creating too large files Currently if you attampt to create too large file with luks you get the following error message: Formatting 'test.luks', fmt=luks size=17592186044416 key-secret=sec0 qemu-img: test.luks: Could not resize file: File too large While for raw format the error message is qemu-img: test.img: The image size is too large for file format 'raw' The reason for this is that qemu-img checks for errono of the failure, and presents the later error when it is -EFBIG However crypto generic code 'swallows' the errno and replaces it with -EIO. As an attempt to make it better, we can make luks driver, detect -EFBIG and in this case present a better error message, which is what this patch does The new error message is: qemu-img: error creating test.luks: The requested file size is too large Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534898 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-05-07 12:52:33 +01:00
Peter Maydell	ea1329bb3a	Block patches: - Asynchronous copying for block-copy (i.e., the backup job) - Allow resizing of qcow2 images when they have internal snapshots - iotests: Logging improvements for Python tests - iotest 153 fix, and block comment cleanups -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl6xYpoSHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AF7IH/j1wr6m8OrZtdAoebVqFQn2buydT+kGP DP4IVfJ4YoTwmZCBxoR5ZuH6kWTWgUDz1w5x7U5A70tVLqK+RwCnAxrlz19s6rVP ACp1d6GO7iLAEH58KRsvSvy7OTvpKzEXP8tS8kDsK58xl8m65vSBLIt9xUpZMql2 VAiftyu7MYGpObEoy2SnTuhM5H9pPcNiuwATGLNrJqdmi+bW8sIKiV3+yND7s1cz lWqVHaWq0XUfBaSG4sx6BqBeBl+cchv0XyRrwNJefY3lVLxIiizwxmb7UYBRSBrY XIKGOl1qZNE0AETuF2wh7XzTJuHe7Asash5dIIip0sqtbiToVlSxQwY= =vtg/ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-05-05' into staging Block patches: - Asynchronous copying for block-copy (i.e., the backup job) - Allow resizing of qcow2 images when they have internal snapshots - iotests: Logging improvements for Python tests - iotest 153 fix, and block comment cleanups # gpg: Signature made Tue 05 May 2020 13:56:58 BST # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2020-05-05: (24 commits) block/block-copy: use aio-task-pool API block/block-copy: refactor task creation block/block-copy: add state pointer to BlockCopyTask block/block-copy: alloc task on each iteration block/block-copy: rename in-flight requests to tasks Fix iotest 153 block: Comment cleanups qcow2: Tweak comment about bitmaps vs. resize qcow2: Allow resize of images with internal snapshots block: Add blk_new_with_bs() helper iotests: use python logging for iotests.log() iotests: Mark verify functions as private iotest 258: use script_main iotests: add script_initialize iotests: add hmp helper with logging iotests: limit line length to 79 chars iotests: touch up log function signature iotests: drop pre-Python 3.4 compatibility code iotests: alphabetize standard imports iotests: add pylintrc file ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-05-05 16:46:37 +01:00
Peter Maydell	f19d118bed	nbd patches for 2020-05-04 - reduce client-side fragmentation of NBD trim and status requests - fix iotest 41 when run in deep tree - fix socket activation in qemu-nbd -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl6whTUACgkQp6FrSiUn Q2r4nAf7BtGSFMkUu6nWYeq+Ggg+Xwmz2FLAzWTK/rccGDC44c9ETzOIbWEddo6X FHpU07VXdLW1h2M7ox8lQVo0DZEFxTRBYTPtUtjB7izfkAs4CkYeElJsZAPAZKgU GsKqa3RM6uXubsQaXXXjMFCGlYgqi1dVkmkgtPebt7evSe0ATlTfYfd0y9gb5f9C cbHD3CVcGKQe4ZtNcSBpTzOvXJSrBZznyCyhBO2qmVXTynt/5Ygog+Ulq3DHZsPX UkRkTPohKA0BhXuS7wD49danlzCLiTlvswr62fAncM1+AJTbmIa+apy3SwiOkwMh Aawq5vDtaFV+HEBKbMC0QRhgtoEe1w== =ExlI -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-05-04' into staging nbd patches for 2020-05-04 - reduce client-side fragmentation of NBD trim and status requests - fix iotest 41 when run in deep tree - fix socket activation in qemu-nbd # gpg: Signature made Mon 04 May 2020 22:12:21 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2020-05-04: block/nbd-client: drop max_block restriction from discard block/nbd-client: drop max_block restriction from block_status iotests/041: Fix NBD socket path tools: Fix use of fcntl(F_SETFD) during socket activation Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-05-05 15:47:44 +01:00
Peter Maydell	a2261b2754	trivial patches (20200504) Silent static analyzer warning Remove dead assignments Support -chardev serial on macOS Update MAINTAINERS Some cosmetic changes -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl6wOI4SHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748p7UQAIFSNN0FrDV+K7i8qqq0X+JrS+dNOHNm DSpOf8IaGm/BezzL6XirXBVpFxg9iB5DQVLsjP1kUggO7rbBO0blx5H5eOPhnXZj xg60kLN16ty7NZ/WPS1G9jF4nDsjz0ZUtCXb0OXsuGJIOrsmN2r/lxdJwcjHZaqJ RzbcCSFXlvL0g7mOakJinMJH5r/nWCiUoEYsikhP10DcvuSBoCnjr+LYV6Ef02G0 Y5lgKN2G0EAMgWTJaL3gIF27zS8QLDNll+eO+PIU5K4yo75/wRCKr4e3PpErZlf6 B+hCAAPnXCpDKw+8sK2z+9OZXUGe1hQ8LHNgNNM921C66f+vLLXpIDTAECihM4K4 0wThYlFDwT4j+PMHFNlzIobGMtb33ui8m40lepMt/YOVFqY4tr8u3MLhHkVDo2+8 sNuOOWLXAoFOYyRqgTeVJvZvMUFQqtDiftghw1BR55TyIpDWjvLYRqae5CI+MGXs 6YylZVHGzVjMVptxvivvIQ735Nq8LaKq7N8Cb7uvcbRaCki39BsxXVPZx4p6NdwN dMndUOz/y75dNlRMDjK8l/oRFPJa/p1Yz8mZhl0uVOO6JeJhBwYmk+WkQ7g/GHZb Rx15HnVWRu6C/Icbw4kqZYyqrgl5lykS8aAWURePdpjzKY77rY1H71FesMhjifRN ZGgfUdWI88M4 =ibgH -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.1-pull-request' into staging trivial patches (20200504) Silent static analyzer warning Remove dead assignments Support -chardev serial on macOS Update MAINTAINERS Some cosmetic changes # gpg: Signature made Mon 04 May 2020 16:45:18 BST # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-5.1-pull-request: hw/timer/pxa2xx_timer: Add assertion to silent static analyzer warning hw/timer/stm32f2xx_timer: Remove dead assignment hw/gpio/aspeed_gpio: Remove dead assignment hw/isa/i82378: Remove dead assignment hw/ide/sii3112: Remove dead assignment hw/input/adb-kbd: Remove dead assignment hw/i2c/pm_smbus: Remove dead assignment blockdev: Remove dead assignment block: Avoid dead assignment Compress lines for immediate return chardev: Add macOS to list of OSes that support -chardev serial MAINTAINERS: Update Keith Busch's email address elf_ops: Don't try to g_mapped_file_unref(NULL) hw/mem/pc-dimm: Fix line over 80 characters warning hw/mem/pc-dimm: Print slot number on error at pc_dimm_pre_plug() MAINTAINERS: Mark the LatticeMico32 target as orphan timer/exynos4210_mct: Remove redundant statement in exynos4210_mct_write() display/blizzard: use extract16() for fix clang analyzer warning in blizzard_draw_line16_32() scsi/esp-pci: add g_assert() for fix clang analyzer warning in esp_pci_io_write() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-05-05 14:03:28 +01:00
Vladimir Sementsov-Ogievskiy	4ce5dd3e9b	block/block-copy: use aio-task-pool API Run block_copy iterations in parallel in aio tasks. Changes: - BlockCopyTask becomes aio task structure. Add zeroes field to pass it to block_copy_do_copy - add call state - it's a state of one call of block_copy(), shared between parallel tasks. For now used only to keep information about first error: is it read or not. - convert block_copy_dirty_clusters to aio-task loop. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200429130847.28124-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy	42ac214406	block/block-copy: refactor task creation Instead of just relying on the comment "Called only on full-dirty region" in block_copy_task_create() let's move initial dirty area search directly to block_copy_task_create(). Let's also use effective bdrv_dirty_bitmap_next_dirty_area instead of looping through all non-dirty clusters. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429130847.28124-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy	1348a65774	block/block-copy: add state pointer to BlockCopyTask We are going to use aio-task-pool API, so we'll need state pointer in BlockCopyTask anyway. Add it now and use where possible. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429130847.28124-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy	f13e60a973	block/block-copy: alloc task on each iteration We are going to use aio-task-pool API, so tasks will be handled in parallel. We need therefore separate allocated task on each iteration. Introduce this logic now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429130847.28124-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy	e9407785cc	block/block-copy: rename in-flight requests to tasks We are going to use aio-task-pool API and extend in-flight request structure to be a successor of AioTask, so rename things appropriately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429130847.28124-2-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 14:03:28 +02:00
Eric Blake	f464906951	block: Comment cleanups It's been a while since we got rid of the sector-based bdrv_read and bdrv_write (commit `2e11d756`); let's finish the job on a few remaining comments. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428213807.776655-1-eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 13:17:36 +02:00
Eric Blake	ee1244a2e9	qcow2: Tweak comment about bitmaps vs. resize Our comment did not actually match the code. Rewrite the comment to be less sensitive to any future changes to qcow2-bitmap.c that might implement scenarios that we currently reject. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-4-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 13:17:36 +02:00
Eric Blake	7fa140abf6	qcow2: Allow resize of images with internal snapshots We originally refused to allow resize of images with internal snapshots because the v2 image format did not require the tracking of snapshot size, making it impossible to safely revert to a snapshot with a different size than the current view of the image. But the snapshot size tracking was rectified in v3, and our recent fixes to qemu-img amend (see `0a85af35`) guarantee that we always have a valid snapshot size. Thus, we no longer need to artificially limit image resizes, but it does become one more thing that would prevent a downgrade back to v2. And now that we support different-sized snapshots, it's also easy to fix reverting to a snapshot to apply the new size. Upgrade iotest 61 to cover this (we previously had NO coverage of refusal to resize while snapshots exist). Note that the amend process can fail but still have effects: in particular, since we break things into upgrade, resize, downgrade, a failure during resize does not roll back changes made during upgrade, nor does failure in downgrade roll back a resize. But this situation is pre-existing even without this patch; and without journaling, the best we could do is minimize the chance of partial failure by collecting all changes prior to doing any writes - which adds a lot of complexity but could still fail with EIO. On the other hand, we are careful that even if we have partial modification but then fail, the image is left viable (that is, we are careful to sequence things so that after each successful cluster write, there may be transient leaked clusters but no corrupt metadata). And complicating the code to make it more transaction-like is not worth the effort: a user can always request multiple 'qemu-img amend' changing one thing each, if they need finer-grained control over detecting the first failure than what they get by letting qemu decide how to sequence multiple changes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-3-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 13:17:36 +02:00
Eric Blake	a3aeeab557	block: Add blk_new_with_bs() helper There are several callers that need to create a new block backend from an existing BDS; make the task slightly easier with a common helper routine. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200424190903.522087-2-eblake@redhat.com> [mreitz: Set @ret only in error paths, see https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg01216.html] Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-2-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 13:17:36 +02:00
Vladimir Sementsov-Ogievskiy	714eb0dbc5	block/nbd-client: drop max_block restriction from discard The NBD spec was updated (see nbd.git commit 9f30fedb) so that max_block doesn't relate to NBD_CMD_TRIM. So, drop the restriction. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200401150112.9557-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: tweak commit message to call out NBD commit] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-05-04 15:16:46 -05:00
Vladimir Sementsov-Ogievskiy	6bf792b464	block/nbd-client: drop max_block restriction from block_status The NBD spec was updated (see nbd.git commit 9f30fedb) so that max_block doesn't relate to NBD_CMD_BLOCK_STATUS. So, drop the restriction. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200401150112.9557-2-vsementsov@virtuozzo.com> [eblake: tweak commit message to call out NBD commit] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-05-04 15:13:14 -05:00
Daniel Brodsky	6e8a355de6	lockable: replaced locks with lock guard macros where appropriate - ran regexp "qemu_mutex_lock$.$.\n.*if" to find targets - replaced result with QEMU_LOCK_GUARD if all unlocks at function end - replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end Signed-off-by: Daniel Brodsky <dnbrdsky@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-id: 20200404042108.389635-3-dnbrdsky@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-05-04 16:07:43 +01:00
Simran Singhal	b3ac2b94cd	Compress lines for immediate return Compress two lines into a single line if immediate return statement is found. It also remove variables progress, val, data, ret and sock as they are no longer needed. Remove space between function "mixer_load" and '(' to fix the checkpatch.pl error:- ERROR: space prohibited between function name and open parenthesis '(' Done using following coccinelle script: @@ local idexpression ret; expression e; @@ -ret = +return e; -return ret; Signed-off-by: Simran Singhal <singhalsimran0@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200401165314.GA3213@simran-Inspiron-5558> [lv: in handle_aiocb_write_zeroes_unmap() move "int ret" inside the #ifdef] Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-05-04 14:43:22 +02:00
Kevin Wolf	eb8a0cf3ba	qcow2: Forward ZERO_WRITE flag for full preallocation The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the image is possibly preallocated and then the zero flag is added to all clusters. This means that a copy-on-write operation may be needed when writing to these clusters, despite having used preallocation, negating one of the major benefits of preallocation. Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver, and if the protocol driver can ensure that the new area reads as zeros, we can skip setting the zero flag in the qcow2 layer. Unfortunately, the same approach doesn't work for metadata preallocation, so we'll still set the zero flag there. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424142701.67053-1-kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	955c7d6687	block: truncate: Don't make backing file data visible When extending the size of an image that has a backing file larger than its old size, make sure that the backing file data doesn't become visible in the guest, but the added area is properly zeroed out. Consider the following scenario where the overlay is shorter than its backing file: base.qcow2: AAAAAAAA overlay.qcow2: BBBB When resizing (extending) overlay.qcow2, the new blocks should not stay unallocated and make the additional As from base.qcow2 visible like before this patch, but zeros should be read. A similar case happens with the various variants of a commit job when an intermediate file is short (- for unallocated): base.qcow2: A-A-AAAA mid.qcow2: BB-B top.qcow2: C--C--C- After commit top.qcow2 to mid.qcow2, the following happens: mid.qcow2: CB-C00C0 (correct result) mid.qcow2: CB-C--C- (before this fix) Without the fix, blocks that previously read as zeros on top.qcow2 suddenly turn into A. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200424125448.63318-8-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	2f0c6e7a65	file-posix: Support BDRV_REQ_ZERO_WRITE for truncate For regular files, we always get BDRV_REQ_ZERO_WRITE behaviour from the OS, so we can advertise the flag and just ignore it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-7-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	1ddaabaecb	raw-format: Support BDRV_REQ_ZERO_WRITE for truncate The raw format driver can simply forward the flag and let its bs->file child take care of actually providing the zeros. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200424125448.63318-6-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	f01643fb8b	qcow2: Support BDRV_REQ_ZERO_WRITE for truncate If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't undo any previous preallocation, but just adds the zero flag to all relevant L2 entries. If an external data file is in use, a write_zeroes request to the data file is made instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200424125448.63318-5-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	8c6242b6f3	block-backend: Add flags to blk_truncate() Now that node level interface bdrv_truncate() supports passing request flags to the block driver, expose this on the BlockBackend level, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	7b8e485742	block: Add flags to bdrv(_co)_truncate() Now that block drivers can support flags for .bdrv_co_truncate, expose the parameter in the node level interfaces bdrv_co_truncate() and bdrv_truncate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	92b92799dc	block: Add flags to BlockDriver.bdrv_co_truncate() This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate() driver callbacks, and a supported_truncate_flags field in BlockDriverState that allows drivers to advertise support for request flags in the context of truncate. For now, we always pass 0 and no drivers declare support for any flag. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Markus Armbruster	1f5842487a	qapi: Only input visitors can actually fail The previous few commits have made this more obvious, and removed the one exception. Time to clarify the documentation, and drop dead error checking. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200424084338.26803-13-armbru@redhat.com>	2020-04-30 07:26:40 +02:00
Markus Armbruster	77ed971b9d	block/file-posix: Fix check_cache_dropped() error handling The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. check_cache_dropped() calls error_setg() in a loop. It fails to break the loop in one instance. If a subsequent iteration error_setg()s again, it trips error_setv()'s assertion. Fix it to break the loop. Fixes: `31be8a2a97` Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200422130719.28225-3-armbru@redhat.com>	2020-04-29 08:01:52 +02:00
Philippe Mathieu-Daudé	78ee6bd048	various: Remove suspicious '\' character outside of #define in C code Fixes the following coccinelle warnings: $ spatch --sp-file --verbose-parsing ... \ scripts/coccinelle/remove_local_err.cocci ... SUSPICIOUS: a \ character appears outside of a #define at ./target/ppc/translate_init.inc.c:5213 SUSPICIOUS: a \ character appears outside of a #define at ./target/ppc/translate_init.inc.c:5261 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:166 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:167 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:169 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:170 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:171 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:172 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:173 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5787 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5789 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5800 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5801 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5802 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5804 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5805 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5806 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:6329 SUSPICIOUS: a \ character appears outside of a #define at ./hw/sd/sdhci.c:1133 SUSPICIOUS: a \ character appears outside of a #define at ./hw/scsi/scsi-disk.c:3081 SUSPICIOUS: a \ character appears outside of a #define at ./hw/net/virtio-net.c:1529 SUSPICIOUS: a \ character appears outside of a #define at ./hw/riscv/sifive_u.c:468 SUSPICIOUS: a \ character appears outside of a #define at ./dump/dump.c:1895 SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2209 SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2215 SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2221 SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2222 SUSPICIOUS: a \ character appears outside of a #define at ./block/replication.c:172 SUSPICIOUS: a \ character appears outside of a #define at ./block/replication.c:173 Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20200412223619.11284-2-f4bug@amsat.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-04-29 08:01:51 +02:00
Chen Qun	ff0507c239	block/iscsi:fix heap-buffer-overflow in iscsi_aio_ioctl_cb There is an overflow, the source 'datain.data[2]' is 100 bytes, but the 'ss' is 252 bytes.This may cause a security issue because we can access a lot of unrelated memory data. The len for sbp copy data should take the minimum of mx_sb_len and sb_len_wr, not the maximum. If we use iscsi device for VM backend storage, ASAN show stack: READ of size 252 at 0xfffd149dcfc4 thread T0 #0 0xaaad433d0d34 in __asan_memcpy (aarch64-softmmu/qemu-system-aarch64+0x2cb0d34) #1 0xaaad45f9d6d0 in iscsi_aio_ioctl_cb /qemu/block/iscsi.c:996:9 #2 0xfffd1af0e2dc (/usr/lib64/iscsi/libiscsi.so.8+0xe2dc) #3 0xfffd1af0d174 (/usr/lib64/iscsi/libiscsi.so.8+0xd174) #4 0xfffd1af19fac (/usr/lib64/iscsi/libiscsi.so.8+0x19fac) #5 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5 #6 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9 #7 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20 #8 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520 #9 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5 #10 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4) #11 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9 #12 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242 #13 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518 #14 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9 #15 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5 #16 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c) #17 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740) 0xfffd149dcfc4 is located 0 bytes to the right of 100-byte region [0xfffd149dcf60,0xfffd149dcfc4) allocated by thread T0 here: #0 0xaaad433d1e70 in __interceptor_malloc (aarch64-softmmu/qemu-system-aarch64+0x2cb1e70) #1 0xfffd1af0e254 (/usr/lib64/iscsi/libiscsi.so.8+0xe254) #2 0xfffd1af0d174 (/usr/lib64/iscsi/libiscsi.so.8+0xd174) #3 0xfffd1af19fac (/usr/lib64/iscsi/libiscsi.so.8+0x19fac) #4 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5 #5 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9 #6 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20 #7 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520 #8 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5 #9 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4) #10 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9 #11 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242 #12 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518 #13 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9 #14 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5 #15 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c) #16 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740) Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200418062602.10776-1-kuhn.chenqun@huawei.com Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-04-20 11:31:46 +01:00
Peter Maydell	2f37b0222c	Block layer patches: - Fix crashes and hangs related to iothreads, bdrv_drain and block jobs: - Fix some AIO context locking in jobs - Fix blk->in_flight during blk_wait_while_drained() - vpc: Don't round up already aligned BAT sizes -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJejI1UAAoJEH8JsnLIjy/WQhEQAIF2zkkdbT77VTMLvFmPU36J hPR2RSzd5egNtfn3zm7vGZJ3FEZb5SEAV5R9CnwVAJeQnPUll7bjkPsoynrEUFU2 PSoFe8lKuYwoAOOSjqakASpKx/Xs3sctKc4fgRWr5/dbWFCC77Q+qxHiKO4KYfeh g00jsjeCLMVLop3r9uMDovlA81mJUIP5axSjSH7akgzLC+ZCfH2RFFu62D7wYv9w UgQM/NZt2YuLFm+BeQLB4K2BK+PZBCN1trPj+h11ecUwm9b1XZZfO/7R0T8Wrljc 49fd5Z/GombCnyxuLCr6QrhLZcr8FffLJXQgkdHTkWUKQXqZUfmqLjVIAbli0ZiC hP8jE5EgdAXpBrGeM2oH+dk0iQSBGTIMaGlhDwxO32xOAQ8TKpRiMZMfwGHqB+vn m/EPoHLHL6vQGxISfGjj4k3fnSP74nvRrS656MhLuG03SkPZacyTZQpTkErg2imW AeU6g4afvvLtJBoF/YYA5Qhff1Ux5eh9jactIW1DRf/Q4tTc4ioTU25560Le4eGZ kex/AwIcf9P47eTUCP8L6iNKz8RU7bV4g9vl9zz7fQm3i9GEhly88XvOppnXRUvT XdkfdlSmUZ9vue6rAfgsL5fQIHtsGRfH90nT11/IW1X4baOImtcQBWg3xZdR4zPS W2H0J01PlSKE8l/OWQlo =oODf -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - Fix crashes and hangs related to iothreads, bdrv_drain and block jobs: - Fix some AIO context locking in jobs - Fix blk->in_flight during blk_wait_while_drained() - vpc: Don't round up already aligned BAT sizes # gpg: Signature made Tue 07 Apr 2020 15:25:24 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: vpc: Don't round up already aligned BAT sizes block: Fix blk->in_flight during blk_wait_while_drained() block: Increase BB.in_flight for coroutine and sync interfaces block-backend: Reorder flush/pdiscard function definitions backup: don't acquire aio_context in backup_clean replication: assert we own context before job_cancel_sync job: take each job's lock individually in job_txn_apply Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-04-07 19:12:45 +01:00
Kevin Wolf	3f6de653b9	vpc: Don't round up already aligned BAT sizes As reported on Launchpad, Azure apparently doesn't accept images for upload that are not both aligned to 1 MB blocks and have a BAT size that matches the image size exactly. As far as I can tell, there is no real reason why we create a BAT that is one entry longer than necessary for aligned image sizes, so change that. (Even though the condition is only mentioned as "should" in the spec and previous products accepted larger BATs - but we'll try to maintain compatibility with as many of Microsoft's ever-changing interpretations of the VHD spec as possible.) Fixes: https://bugs.launchpad.net/bugs/1870098 Reported-by: Tobias Witek Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200402093603.2369-1-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 15:42:08 +02:00
Kevin Wolf	7f16476fab	block: Fix blk->in_flight during blk_wait_while_drained() Waiting in blk_wait_while_drained() while blk->in_flight is increased for the current request is wrong because it will cause the drain operation to deadlock. This patch makes sure that blk_wait_while_drained() is called with blk->in_flight increased exactly once for the current request, and that it temporarily decreases the counter while it waits. Fixes: `cf3129323f` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200407121259.21350-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 15:40:57 +02:00
Kevin Wolf	fbb92b6798	block: Increase BB.in_flight for coroutine and sync interfaces External callers of blk_co_() and of the synchronous blk_() functions don't currently increase the BlockBackend.in_flight counter, but calls from blk_aio_() do, so there is an inconsistency whether the counter has been increased or not. This patch moves the actual operations to static functions that can later know they will always be called with in_flight increased exactly once, even for external callers using the blk_co_() coroutine interfaces. If the public blk_co_*() interface is unused, remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200407121259.21350-3-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 15:40:41 +02:00
Kevin Wolf	564806c529	block-backend: Reorder flush/pdiscard function definitions Move all variants of the flush/pdiscard functions to a single place and put the blk_co_*() version first because it is called by all other variants (and will become static in the next patch). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200407121259.21350-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 15:40:28 +02:00
Stefan Reiter	eca0f3524a	backup: don't acquire aio_context in backup_clean All code-paths leading to backup_clean (via job_clean) have the job's context already acquired. The job's context is guaranteed to be the same as the one used by backup_top via backup_job_create. Since the previous logic effectively acquired the lock twice, this broke cleanup of backups for disks using IO threads, since the BDRV_POLL_WHILE in bdrv_backup_top_drop -> bdrv_do_drained_begin would only release the lock once, thus deadlocking with the IO thread. This is a partial revert of `0abf258171`. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200407115651.69472-4-s.reiter@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 14:34:47 +02:00
Stefan Reiter	08558e3325	replication: assert we own context before job_cancel_sync job_cancel_sync requires the job's lock to be held, all other callers already do this (replication_stop, drive_backup_abort, blockdev_backup_abort, job_cancel_sync_all, cancel_common). In this case we're in a BlockDriver handler, so we already have a lock, just assert that it is the same as the one used for the commit_job. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Message-Id: <20200407115651.69472-3-s.reiter@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 14:34:47 +02:00
Alberto Garcia	fb43d2d46e	qcow2: Check request size in qcow2_co_pwritev_compressed_part() When issuing a compressed write request the number of bytes must be a multiple of the cluster size or reach the end of the last cluster. With the current code such requests are allowed and we hit an assertion: $ qemu-img create -f qcow2 img.qcow2 1M $ qemu-io -c 'write -c 0 32k' img.qcow2 qemu-io: block/qcow2.c:4257: qcow2_co_pwritev_compressed_task: Assertion `bytes == s->cluster_size \|\| (bytes < s->cluster_size && (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS))' failed. Aborted This patch fixes a regression introduced in `0d483dce38` Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200406143401.26854-1-berto@igalia.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-04-07 13:51:09 +02:00
Alberto Garcia	80f5c01183	qcow2: Forbid discard in qcow2 v2 images with backing files A discard request deallocates the selected clusters so they read back as zeroes. This is done by clearing the cluster offset field and setting QCOW_OFLAG_ZERO in the L2 entry. This flag is however only supported when qcow_version >= 3. In older images the cluster is simply deallocated, exposing any possible stale data from the backing file. Since discard is an advisory operation it's safer to simply forbid it in this scenario. Note that we are adding this check to qcow2_co_pdiscard() and not to qcow2_cluster_discard() or discard_in_l2_slice() because the last two are also used by qcow2_snapshot_create() to discard the clusters used by the VM state. In this case there's no risk of exposing stale data to the guest and we really want that the clusters are always discarded. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200331114345.29993-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-04-07 13:51:09 +02:00
Kevin Wolf	df74b1d3df	qcow2: Remove unused fields from BDRVQcow2State These fields were already removed in commit `c3c10f72`, but then commit `b58deb34` revived them probably due to bad merge conflict resolution. They are still unused, so remove them again. Fixes: `b58deb344d` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200326170757.12344-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-27 14:47:23 +01:00
Kevin Wolf	ce8cabbd17	mirror: Wait only for in-flight operations mirror_wait_for_free_in_flight_slot() just picks a random operation to wait for. However, a MirrorOp is already in s->ops_in_flight when mirror_co_read() waits for free slots, so if not enough slots are immediately available, an operation can end up waiting for itself, or two or more operations can wait for each other to complete, which results in a hang. Fix this by adding a flag to MirrorOp that tells us if the request is already in flight (and therefore occupies slots that it will later free), and picking only such operations for waiting. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200326153628.4869-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-27 14:47:23 +01:00
Kevin Wolf	9178f4fe5f	Revert "mirror: Don't let an operation wait for itself" This reverts commit `7e6c4ff792`. The fix was incomplete as it only protected against requests waiting for themselves, but not against requests waiting for each other. We need a different solution. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200326153628.4869-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-27 14:47:23 +01:00
Chen Qun	34afc5c298	block/iscsi:use the flags in iscsi_open() prevent Clang warning Clang static code analyzer show warning: block/iscsi.c:1920:9: warning: Value stored to 'flags' is never read flags &= ~BDRV_O_RDWR; ^ ~~~~~~~~~~~~ In iscsi_allocmap_init() only checks BDRV_O_NOCACHE, which is the same in both of flags and bs->open_flags. We can use the flags instead bs->open_flags to prevent Clang warning. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200311032927.35092-1-kuhn.chenqun@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-27 14:47:23 +01:00
Eric Blake	ed04991063	sheepdog: Consistently set bdrv_has_zero_init_truncate block_int.h claims that .bdrv_has_zero_init must return 0 if .bdrv_has_zero_init_truncate does likewise; but this is violated if only the former callback is provided if .bdrv_co_truncate also exists. When adding the latter callback, it was mistakenly added to only one of the three possible sheepdog instantiations. Fixes: `1dcaf527` Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200324174233.1622067-5-eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Eric Blake	e7be13ad3f	qcow2: Avoid feature name extension on small cluster size As the feature name table can be quite large (over 9k if all 64 bits of all three feature fields have names; a mere 8 features leaves only 8 bytes for a backing file name in a 512-byte cluster), it is unwise to emit this optional header in images with small cluster sizes. Update iotest 036 to skip running on small cluster sizes; meanwhile, note that iotest 061 never passed on alternative cluster sizes (however, I limited this patch to tests with output affected by adding feature names, rather than auditing for other tests that are not robust to alternative cluster sizes). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200324174233.1622067-4-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Eric Blake	bb40ebce2c	qcow2: List autoclear bit names in header The feature table is supposed to advertise the name of all feature bits that we support; however, we forgot to update the table for autoclear bits. While at it, move the table to read-only memory in code, and tweak the qcow2 spec to name the second autoclear bit. Update iotests that are affected by the longer header length. Fixes: `88ddffae` Fixes: `93c24936` Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200324174233.1622067-3-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Eric Blake	a951a631b9	qcow2: Comment typo fixes Various trivial typos noticed while working on this file. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200324174233.1622067-2-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Maxim Levitsky	5a5e7f8cd8	block: trickle down the fallback image creation function use to the block drivers Instead of checking the .bdrv_co_create_opts to see if we need the fallback, just implement the .bdrv_co_create_opts in the drivers that need it. This way we don't break various places that need to know if the underlying protocol/format really supports image creation, and this way we still allow some drivers to not support image creation. Fixes: `fd17146cd9` Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1816007 Note that technically this driver reverts the image creation fallback for the vxhs driver since I don't have a means to test it, and IMHO it is better to leave it not supported as it was prior to generic image creation patches. Also drop iscsi_create_opts which was left accidentally. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-3-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> [mreitz: Fixed alignment, and moved bdrv_co_create_opts_simple() and bdrv_create_opts_simple from block.h into block_int.h] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Maxim Levitsky	b92902dfea	block: pass BlockDriver reference to the .bdrv_co_create This will allow the reuse of a single generic .bdrv_co_create implementation for several drivers. No functional changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Vladimir Sementsov-Ogievskiy	66c8672d24	block/mirror: fix use after free of local_err local_err is used again in mirror_exit_common() after bdrv_set_backing_hd(), so we must zero it. Otherwise try to set non-NULL local_err will crash. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200324153630.11882-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:32 +01:00
Vladimir Sementsov-Ogievskiy	808cf3cb6a	block/qcow2: zero data_file child after free data_file being NULL doesn't seem to be a correct state, but it's better than dead pointer and simpler to debug. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200316060631.30052-3-vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-24 11:41:46 +01:00
Eric Blake	71eaec2e8c	block: Avoid memleak on qcow2 image info failure If we fail to get bitmap info, we must not leak the encryption info. Fixes: `b8968c875f` Fixes: Coverity CID 1421894 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200320183620.1112123-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Tested-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-24 11:41:46 +01:00
Peter Maydell	e6d567db23	Pull request -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAl5yg0AACgkQfe+BBqr8 OQ7uWBAArnxhM0kej5NAQiGsC3sHDOGUszKhsz8qwSOYQ0NkOH9495ksklOuw8T9 v/7CAp4rqvma00nL5D+6Srjc9Pe+9BdIz3aa0xvi+tkXvRvFKcWoZlLyNpK+SCbB FwT5sAzoZ004cijG3xsV8U/1i/jEWiRk0AV8USuGymTkQh0M9XS/MHVlrv0HJU9k 2tLz7FGHBmEGhXFXKjviNsrQfvHl2AzIKIYnOcK6OhF6wevJzRz8ShsYcmhyN7V3 rBm0IDiJxJ54a6oZ8FyukkqL5gj+mZDEoGXnspvWkmixqjmNw0iAWgyWNk8vnBue ROphtzZ7xvATu560vIXnlj9q+new5h1ZEb2Q1Nl6WL0PFHJ0vEmRHnpSj+lUTHFD aEwLL72qk12yZFPwEO9V+tO15mE+9vYbbsfe8SXVIlH52ec+OHGQ3/ytulL2zEDz vVJYy37wOhp8f5TFpdmFXi8ovHV7a8g3Vr6Hh1XmcEW0o/GjEgunjW7V+JHSJlDe XBp5IXsz0+zWn2hCxtjUzX3qBDIswyR89qR/NRJKpuu1D5VSaqVwM3vd5dCgDLHA lnCCi8kcq9rsnRNyeBb0t8d7c6e1HSW8OIHL4OC515VUFdrKSEyxprzDyIMPifDg 9B9ImLXg3CgOQVOXPU/0mbh4beTIeI7uJrmgzvzs3vZc+dLG+H8= =xYlZ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging Pull request # gpg: Signature made Wed 18 Mar 2020 20:23:28 GMT # gpg: using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full] # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty nbd/server: use bdrv_dirty_bitmap_next_dirty_area nbd/server: introduce NBDExtentArray block/dirty-bitmap: improve _next_dirty_area API block/dirty-bitmap: add _next_dirty API block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t hbitmap: drop meta bitmaps as they are unused hbitmap: unpublish hbitmap_iter_skip_words hbitmap: move hbitmap_iter_next_word to hbitmap.c hbitmap: assert that we don't create bitmap larger than INT64_MAX build: Silence clang warning on older glib autoptr usage Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-19 15:31:09 +00:00
Vladimir Sementsov-Ogievskiy	2d00cbd8e2	block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty store_bitmap_data() loop does bdrv_set_dirty_iter() on each iteration, which means that we actually don't need iterator itself and we can use simpler bitmap API. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-11-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	299ea9ff01	block/dirty-bitmap: improve _next_dirty_area API Firstly, _next_dirty_area is for scenarios when we may contiguously search for next dirty area inside some limited region, so it is more comfortable to specify "end" which should not be recalculated on each iteration. Secondly, let's add a possibility to limit resulting area size, not limiting searching area. This will be used in NBD code in further commit. (Note that now bdrv_dirty_bitmap_next_dirty_area is unused) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-8-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	9399c54b75	block/dirty-bitmap: add _next_dirty API We have bdrv_dirty_bitmap_next_zero, let's add corresponding bdrv_dirty_bitmap_next_dirty, which is more comfortable to use than bitmap iterators in some cases. For test modify test_hbitmap_next_zero_check_range to check both next_zero and next_dirty and add some new checks. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-7-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	642700fda0	block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t We are going to introduce bdrv_dirty_bitmap_next_dirty so that same variable may be used to store its return value and to be its parameter, so it would int64_t. Similarly, we are going to refactor hbitmap_next_dirty_area to use hbitmap_next_dirty together with hbitmap_next_zero, therefore we want hbitmap_next_zero parameter type to be int64_t too. So, for convenience update all parameters of _next_zero and _next_dirty_area to be int64_t. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-6-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Peter Maydell	cf4b64406c	Error reporting patches for 2020-03-17 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl5w+zkSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZTaeAQALPrnwX3g9/HLm2YHc1P0TB1eTenBqen K204sRW53waxzm4g9trb8P4Nzmp8r1oGmZfPriVzB3ykoW2Kzfu+4oa95+YT+exk H4XSQfCvCp1e/ZShkx5rY9Kg1gSgWhQ00MNwz8puHUsHtcp5dMTkmYqL4hzgWnA0 TwV7w06+6kLP4fRglIc5X7BVggBKosmMPfvjg/KYUe12Z3moSSQZA5dyEp5VAVl9 MNFJpryWVek6+Z8UFiQ3CMmR/H2UVI0liDlU1aZsR9pcyjiuJxrBEwboVO5qY3N7 lraKg+CVdiK7rn21bs6wAFOk08eG8VqZMeTb7HU6KJ6FIP2KopwvRXIEmNgo2C/C xU3XRl5oyRtaAOKSnwOBzEhZZ+wTRp2RcMzFS6p7URm5R3LNfB1dlqE7yE5z4lcl EgdbMLy4LiMkKwUPrVGBwzZNDO6ywVjFWUcHze9Dyb3z1ciWhwEENaIGe0CU3lhG ii+GxTzMTGoeJ2HE2hRmGTLACNt7a/we88aDY0kDLeVz5rq80oa+xckqV/oG3XpN v/imWHMugdsUwmQshUrT0JQq+BCnuwiHc82pm0X8bTqtJ6TmoIYhxuJkh040QIxt 5ymFfAMz7ysc+50JY7OEVRI/8YQPyCaZmst/D42dicWUU9NdasWcIx+kCmK3LOjj 0/Nb4vfX3xgN =vpk3 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2020-03-17' into staging Error reporting patches for 2020-03-17 # gpg: Signature made Tue 17 Mar 2020 16:30:49 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-error-2020-03-17: hw/sd/ssi-sd: fix error handling in ssi_sd_realize xen-block: Use one Error * variable instead of two hw/misc/ivshmem: Use one Error * variable instead of two Use &error_abort instead of separate assert() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-18 17:57:40 +00:00
Peter Maydell	d649689a8e	* Bugfixes all over the place * get/set_uint cleanups (Felipe) * Lock guard support (Stefan) * MemoryRegion ownership cleanup (Philippe) * AVX512 optimization for buffer_is_zero (Robert) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJecOZiAAoJEL/70l94x66DgGkH/jpY4IgqlSAAWCgaxfe1n1vg ahSzSLrC8wiJq2Jxbmxn+5BbH6BxQ9ibflsY5bvCY/sTb7UlOFCPkFhQ2iUgplkw ciB5UfgCA6OHpKEhpHhXtzlybtNOlxXNWYJ1SrcVXbRES8f7XdhMKs15mnJJuOOE k/tuZo/44yZRJl0Cv+nkvIFcCVgyu1q0Lln/1MMPngY2r9gt893cY9feTBSSWgnp +7HZr5TXI7mcIytczFKzbdujlG4391DGejKX66IIxGcWg9vXS7TwAStzH1vSKVfJ 73SKZBoCU5gpHHHC+dqVyouMerV+UE+WQPNtF+LCsNgJBw/2NXc1ZgDrtz1OI2c= =+LRX -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * Bugfixes all over the place * get/set_uint cleanups (Felipe) * Lock guard support (Stefan) * MemoryRegion ownership cleanup (Philippe) * AVX512 optimization for buffer_is_zero (Robert) # gpg: Signature made Tue 17 Mar 2020 15:01:54 GMT # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (62 commits) hw/arm: Let devices own the MemoryRegion they create hw/arm: Remove unnecessary memory_region_set_readonly() on ROM alias hw/ppc/ppc405: Use memory_region_init_rom() with read-only regions hw/arm/stm32: Use memory_region_init_rom() with read-only regions hw/char: Let devices own the MemoryRegion they create hw/riscv: Let devices own the MemoryRegion they create hw/dma: Let devices own the MemoryRegion they create hw/display: Let devices own the MemoryRegion they create hw/core: Let devices own the MemoryRegion they create scripts/cocci: Patch to let devices own their MemoryRegions scripts/cocci: Patch to remove unnecessary memory_region_set_readonly() scripts/cocci: Patch to detect potential use of memory_region_init_rom hw/sparc: Use memory_region_init_rom() with read-only regions hw/sh4: Use memory_region_init_rom() with read-only regions hw/riscv: Use memory_region_init_rom() with read-only regions hw/ppc: Use memory_region_init_rom() with read-only regions hw/pci-host: Use memory_region_init_rom() with read-only regions hw/net: Use memory_region_init_rom() with read-only regions hw/m68k: Use memory_region_init_rom() with read-only regions hw/display: Use memory_region_init_rom() with read-only regions ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-17 18:33:05 +00:00
Markus Armbruster	20ac582d0c	Use &error_abort instead of separate assert() Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200313170517.22480-2-armbru@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [Unused Error *variable deleted]	2020-03-17 16:05:40 +01:00
Philippe Mathieu-Daudé	880a7817c1	misc: Replace zero-length arrays with flexible array member (manual) Description copied from Linux kernel commit from Gustavo A. R. Silva (see [3]): --v-- description start --v-- The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member [1], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being unadvertenly introduced [2] to the Linux codebase from now on. --^-- description end --^-- Do the similar housekeeping in the QEMU codebase (which uses C99 since commit `7be41675f7`). All these instances of code were found with the help of the following command (then manual analysis, without modifying structures only having a single flexible array member, such QEDTable in block/qed.h): git grep -F '[0];' [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76497732932f [3] https://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git/commit/?id=17642a2fbd2c1 Inspired-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 22:07:42 +01:00
Philippe Mathieu-Daudé	f7795e4096	misc: Replace zero-length arrays with flexible array member (automatic) Description copied from Linux kernel commit from Gustavo A. R. Silva (see [3]): --v-- description start --v-- The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member [1], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being unadvertenly introduced [2] to the Linux codebase from now on. --^-- description end --^-- Do the similar housekeeping in the QEMU codebase (which uses C99 since commit `7be41675f7`). All these instances of code were found with the help of the following Coccinelle script: @@ identifier s, m, a; type t, T; @@ struct s { ... t m; - T a[0]; + T a[]; }; @@ identifier s, m, a; type t, T; @@ struct s { ... t m; - T a[0]; + T a[]; } QEMU_PACKED; [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76497732932f [3] https://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git/commit/?id=17642a2fbd2c1 Inspired-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 22:07:42 +01:00
Vladimir Sementsov-Ogievskiy	4ab78b1918	block/io: fix bdrv_co_do_copy_on_readv Prior to `1143ec5ebf` it was OK to qemu_iovec_from_buf() from aligned-up buffer to original qiov, as qemu_iovec_from_buf() will stop at qiov end anyway. But after `1143ec5ebf` we assume that bdrv_co_do_copy_on_readv works on part of original qiov, defined by qiov_offset and bytes. So we must not touch qiov behind qiov_offset+bytes bound. Fix it. Cc: qemu-stable@nongnu.org # v4.2 Fixes: `1143ec5ebf` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200312081949.5350-1-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-03-16 11:46:11 +00:00
Peter Maydell	49780a582d	Block layer patches: - Relax restrictions for blockdev-snapshot (allows libvirt to do live storage migration with blockdev-mirror) - luks: Delete created files when block_crypto_co_create_opts_luks fails - Fix memleaks in qmp_object_add -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJeaQYTAAoJEH8JsnLIjy/WQwYP/3pzAjqVecL3dGmnPWAkBqCV CFpxT2nIMe+xCBvWQBoeekHsFJ7GQf4E1WVNRZgoAh9VQkvkajZsVNn8Auo2Veq2 c7w/R4xf/Wet2hKGVRS0JXwbg69U5BbpcF7E2DRNfp+CvaDCafHSDNeGTb3hFUjT x1jwhK6VqfY7+LHU0B0QmX2KA66nDx1p8l8HJQYd1MlCKAbj8kv/swEbqBJn32hA 32CIYfC4VCqkW5va1eOjd3Kyi/ugkFCHTI8+mOa45/BFBzIiKfCsDaFHh/DI59QB qcDKkUcO3+W788vCKgJQGnG070TwKPx2OnjhxFKiEGaoX3Sz+AY4wUvf3mfFl8GM zYqTdOy4Xh0ckvA6JCS0jtAKmANkeEGqnECAgub22Z+kyOzqC05B1FkYwqYDcFXY atWKm5Vr47jgD6Oq6O0OpZaZrAUWOfoBmq4ErnrBEHuW5329NEInmjYwxednK+43 CwU/lSdX7ujRSsjS8Xi1dHS4pxHK/mg51dInL44zGFUayegiLPgA8cuESE0mHOfZ 67X14rxu6D4Y5r0L+w7rsSGjByR29VynE1McL9fZ1Wp29JHaQ5fjdG6GMXEwYxmV R0YNXe85FAlNgqj0Bme+fR2YPxZ48NqHIMOvFFStNHyfD0qQN0TtT1iarfEcBR0u jm8MnSoIDLkRXEgBW9bW =ociU -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - Relax restrictions for blockdev-snapshot (allows libvirt to do live storage migration with blockdev-mirror) - luks: Delete created files when block_crypto_co_create_opts_luks fails - Fix memleaks in qmp_object_add # gpg: Signature made Wed 11 Mar 2020 15:38:59 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: qemu-iotests: adding LUKS cleanup for non-UTF8 secret error crypto.c: cleanup created file when block_crypto_co_create_opts_luks fails block.c: adding bdrv_co_delete_file block: introducing 'bdrv_co_delete_file' interface tests/qemu-iotests: Fix socket_scm_helper build path qapi: Add '@allow-write-only-overlay' feature for 'blockdev-snapshot' iotests: Add iothread cases to 155 block: Fix cross-AioContext blockdev-snapshot iotests: Test mirror with temporarily disabled target backing file iotests: Fix run_job() with use_log=False block: Relax restrictions for blockdev-snapshot block: Make bdrv_get_cumulative_perm() public qom-qmp-cmds: fix two memleaks in qmp_object_add Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-12 16:51:26 +00:00
Daniel Henrique Barboza	1bba30da24	crypto.c: cleanup created file when block_crypto_co_create_opts_luks fails When using a non-UTF8 secret to create a volume using qemu-img, the following error happens: $ qemu-img create -f luks --object secret,id=vol_1_encrypt0,file=vol_resize_pool.vol_1.secret.qzVQrI -o key-secret=vol_1_encrypt0 /var/tmp/pool_target/vol_1 10240K Formatting '/var/tmp/pool_target/vol_1', fmt=luks size=10485760 key-secret=vol_1_encrypt0 qemu-img: /var/tmp/pool_target/vol_1: Data from secret vol_1_encrypt0 is not valid UTF-8 However, the created file '/var/tmp/pool_target/vol_1' is left behind in the file system after the failure. This behavior can be observed when creating the volume using Libvirt, via 'virsh vol-create', and then getting "volume target path already exist" errors when trying to re-create the volume. The volume file is created inside block_crypto_co_create_opts_luks(), in block/crypto.c. If the bdrv_create_file() call is successful but any succeeding step fails, the existing 'fail' label does not take into account the created file, leaving it behind. This patch changes block_crypto_co_create_opts_luks() to delete 'filename' in case of failure. A failure in this point means that the volume is now truncated/corrupted, so even if 'filename' was an existing volume before calling qemu-img, it is now unusable. Deleting the file it is not much worse than leaving it in the filesystem in this scenario, and we don't have to deal with checking the file pre-existence in the code. in our case, block_crypto_co_create_generic calls qcrypto_block_create, which calls qcrypto_block_luks_create, and this function fails when calling qcrypto_secret_lookup_as_utf8. Reported-by: Srikanth Aithal <bssrikanth@in.ibm.com> Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200130213907.2830642-4-danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-11 15:54:38 +01:00
Daniel Henrique Barboza	9bffae14df	block: introducing 'bdrv_co_delete_file' interface Adding to Block Drivers the capability of being able to clean up its created files can be useful in certain situations. For the LUKS driver, for instance, a failure in one of its authentication steps can leave files in the host that weren't there before. This patch adds the 'bdrv_co_delete_file' interface to block drivers and add it to the 'file' driver in file-posix.c. The implementation is given by 'raw_co_delete_file'. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200130213907.2830642-2-danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-11 15:54:38 +01:00
Vladimir Sementsov-Ogievskiy	397f4e9d83	block/block-copy: hide structure definitions Hide structure definitions and add explicit API instead, to keep an eye on the scope of the shared fields. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-10-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	5332e5d210	block/block-copy: reduce intersecting request lock Currently, block_copy operation lock the whole requested region. But there is no reason to lock clusters, which are already copied, it will disturb other parallel block_copy requests for no reason. Let's instead do the following: Lock only sub-region, which we are going to operate on. Then, after copying all dirty sub-regions, we should wait for intersecting requests block-copy, if they failed, we should retry these new dirty clusters. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-Id: <20200311103004.7649-9-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	8719091f9d	block/block-copy: rename start to offset in interfaces offset/bytes pair is more usual naming in block layer, let's use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	dafaf13593	block/block-copy: refactor interfaces to use bytes instead of end We have a lot of "chunk_end - start" invocations, let's switch to bytes/cur_bytes scheme instead. While being here, improve check on block_copy_do_copy parameters to not overflow when calculating nbytes and use int64_t for bytes in block_copy for consistency. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	17187cb646	block/block-copy: factor out find_conflicting_inflight_req Split find_conflicting_inflight_req to be used separately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	2d57511a88	block/block-copy: use block_status Use bdrv_block_status_above to chose effective chunk size and to handle zeroes effectively. This substitutes checking for just being allocated or not, and drops old code path for it. Assistance by backup job is dropped too, as caching block-status information is more difficult than just caching is-allocated information in our dirty bitmap, and backup job is not good place for this caching anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	9d31bc53fa	block/block-copy: specialcase first copy_range request In block_copy_do_copy we fallback to read+write if copy_range failed. In this case copy_size is larger than defined for buffered IO, and there is corresponding commit. Still, backup copies data cluster by cluster, and most of requests are limited to one cluster anyway, so the only source of this one bad-limited request is copy-before-write operation. Further patch will move backup to use block_copy directly, than for cases where copy_range is not supported, first request will be oversized in each backup. It's not good, let's change it now. Fix is simple: just limit first copy_range request like buffer-based request. If it succeed, set larger copy_range limit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	d0ebeca14a	block/block-copy: fix progress calculation Assume we have two regions, A and B, and region B is in-flight now, region A is not yet touched, but it is unallocated and should be skipped. Correspondingly, as progress we have total = A + B current = 0 If we reset unallocated region A and call progress_reset_callback, it will calculate 0 bytes dirty in the bitmap and call job_progress_set_remaining, which will set total = current + 0 = 0 + 0 = 0 So, B bytes are actually removed from total accounting. When job finishes we'll have total = 0 current = B , which doesn't sound good. This is because we didn't considered in-flight bytes, actually when calculating remaining, we should have set (in_flight + dirty_bytes) as remaining, not only dirty_bytes. To fix it, let's refactor progress calculation, moving it to block-copy itself instead of fixing callback. And, of course, track in_flight bytes count. We still have to keep one callback, to maintain backup job bytes_read calculation, but it will go on soon, when we turn the whole backup process into one block_copy call. Cc: qemu-stable@nongnu.org Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-Id: <20200311103004.7649-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	e7266570f2	block/qcow2-threads: fix qcow2_decompress On success path we return what inflate() returns instead of 0. And it most probably works for Z_STREAM_END as it is positive, but is definitely broken for Z_BUF_ERROR. While being here, switch to errno return code, to be closer to qcow2_compress API (and usual expectations). Revert condition in if to be more positive. Drop dead initialization of ret. Cc: qemu-stable@nongnu.org # v4.0 Fixes: `341926ab83` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200302150930.16218-1-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Pan Nengyuan	4aebf0f0da	block/qcow2: do free crypto_opts in qcow2_close() 'crypto_opts' forgot to free in qcow2_close(), this patch fix the bellow leak stack: Direct leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f0edd81f970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) #1 0x7f0edc6d149d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) #2 0x55d7eaede63d in qobject_input_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qobject-input-visitor.c:295 #3 0x55d7eaed78b8 in visit_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qapi-visit-core.c:49 #4 0x55d7eaf5140b in visit_type_QCryptoBlockOpenOptions qapi/qapi-visit-crypto.c:290 #5 0x55d7eae43af3 in block_crypto_open_opts_init /mnt/sdb/qemu-new/qemu_test/qemu/block/crypto.c:163 #6 0x55d7eacd2924 in qcow2_update_options_prepare /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1148 #7 0x55d7eacd33f7 in qcow2_update_options /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1232 #8 0x55d7eacd9680 in qcow2_do_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1512 #9 0x55d7eacdc55e in qcow2_open_entry /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1792 #10 0x55d7eacdc8fe in qcow2_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1819 #11 0x55d7eac3742d in bdrv_open_driver /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1317 #12 0x55d7eac3e990 in bdrv_open_common /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1575 #13 0x55d7eac4442c in bdrv_open_inherit /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3126 #14 0x55d7eac45c3f in bdrv_open /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3219 #15 0x55d7ead8e8a4 in blk_new_open /mnt/sdb/qemu-new/qemu_test/qemu/block/block-backend.c:397 #16 0x55d7eacde74c in qcow2_co_create /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3534 #17 0x55d7eacdfa6d in qcow2_co_create_opts /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3668 #18 0x55d7eac1c678 in bdrv_create_co_entry /mnt/sdb/qemu-new/qemu_test/qemu/block.c:485 #19 0x55d7eb0024d2 in coroutine_trampoline /mnt/sdb/qemu-new/qemu_test/qemu/util/coroutine-ucontext.c:115 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200227012950.12256-2-pannengyuan@huawei.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
David Edmondson	69032253c3	block/curl: HTTP header field names are case insensitive RFC 7230 section 3.2 indicates that HTTP header field names are case insensitive. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20200224101310.101169-3-david.edmondson@oracle.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:29 +01:00
David Edmondson	7788a31939	block/curl: HTTP header fields allow whitespace around values RFC 7230 section 3.2 indicates that whitespace is permitted between the field name and field value and after the field value. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20200224101310.101169-2-david.edmondson@oracle.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:29 +01:00
Stefan Hajnoczi	a9da6e49d8	luks: implement .bdrv_measure() Add qemu-img measure support in the "luks" block driver. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200221112522.1497712-3-stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:29 +01:00
Stefan Hajnoczi	6d49d3a859	luks: extract qcrypto_block_calculate_payload_offset() The qcow2 .bdrv_measure() code calculates the crypto payload offset. This logic really belongs in crypto/block.c where it can be reused by other image formats. The "luks" block driver will need this same logic in order to implement .bdrv_measure(), so extract the qcrypto_block_calculate_payload_offset() function now. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200221112522.1497712-2-stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:29 +01:00
Maxim Levitsky	89802d5ae7	monitor/hmp: Move hmp_drive_add_node to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-12-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:20:22 +00:00
Maxim Levitsky	2bcad73c4b	monitor/hmp: move hmp_info_block* to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-11-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:20:21 +00:00
Maxim Levitsky	1061f8dd80	monitor/hmp: move remaining hmp_block* functions to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-10-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:20:13 +00:00
Maxim Levitsky	e263120ecc	monitor/hmp: move hmp_nbd_server* to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-9-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:17:58 +00:00
Maxim Levitsky	fce2b91fdf	monitor/hmp: move hmp_snapshot_* to block-hmp-cmds.c hmp_snapshot_blkdev is from GPLv2 version of the hmp-cmds.c thus have to change the licence to GPLv2 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-8-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:07:50 +00:00
Maxim Levitsky	6b7fbf61fb	monitor/hmp: move hmp_block_job* to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-7-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:07:48 +00:00
Maxim Levitsky	0932e3f23d	monitor/hmp: move hmp_drive_mirror and hmp_drive_backup to block-hmp-cmds.c Moved code was added after 2012-01-13, thus under GPLv2+ Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-6-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixed commit message	2020-03-09 18:07:35 +00:00
Maxim Levitsky	a1edae276a	monitor/hmp: move hmp_drive_del and hmp_commit to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-5-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:05:33 +00:00
Maxim Levitsky	a2dde2f221	monitor/hmp: rename device-hotplug.c to block/monitor/block-hmp-cmds.c These days device-hotplug.c only contains the hmp_drive_add In the next patch, rest of hmp_drive* functions will be moved there. Also add block-hmp-cmds.h to contain prototypes of these functions License for block-hmp-cmds.h since it contains the code moved from sysemu.h which lacks license and thus according to LICENSE is under GPLv2+ Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200308092440.23564-4-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:05:31 +00:00
Chen Qun	76e91cda07	block/file-posix: Remove redundant statement in raw_handle_perm_lock() Clang static code analyzer show warning: block/file-posix.c:891:9: warning: Value stored to 'op' is never read op = RAW_PL_ABORT; ^ ~~~~~~~~~~~~ Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200302130715.29440-5-kuhn.chenqun@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-03-09 15:59:31 +01:00
Chen Qun	35c9453592	block/stream: Remove redundant statement in stream_run() Clang static code analyzer show warning: block/stream.c:186:9: warning: Value stored to 'ret' is never read ret = 0; ^ ~ Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200302130715.29440-3-kuhn.chenqun@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-03-09 15:59:31 +01:00
Florian Florensa	19ae9ae014	block/rbd: Add support for ceph namespaces Starting from ceph Nautilus, RBD has support for namespaces, allowing for finer grain ACLs on images inside a pool, and tenant isolation. In the rbd cli tool documentation, the new image-spec and snap-spec are : - [pool-name/[namespace-name/]]image-name - [pool-name/[namespace-name/]]image-name@snap-name When using an non namespace's enabled qemu, it complains about not finding the image called namespace-name/image-name, thus we only need to parse the image once again to find if there is a '/' in its name, and if there is, use what is before it as the name of the namespace to later pass it to rados_ioctx_set_namespace. rados_ioctx_set_namespace if called with en empty string or a null pointer as the namespace parameters pretty much does nothing, as it then defaults to the default namespace. The namespace is extracted inside qemu_rbd_parse_filename, stored in the qdict, and used in qemu_rbd_connect to make it work with both qemu-img, and qemu itself. Signed-off-by: Florian Florensa <fflorensa@online.net> Message-Id: <20200110111513.321728-2-fflorensa@online.net> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:21:28 +01:00
Kevin Wolf	14837c6493	qemu-storage-daemon: Add --blockdev option This adds a --blockdev option to the storage daemon that works the same as the -blockdev option of the system emulator. In order to be able to link with blockdev.o, we also need to change stream.o from common-obj to block-obj, which is where all other block jobs already are. In contrast to the system emulator, qemu-storage-daemon options will be processed in the order they are given. The user needs to take care to refer to other objects only after defining them. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200224143008.13362-7-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:15:38 +01:00
Kevin Wolf	12c929bca2	block: Move system emulator QMP commands to block/qapi-sysemu.c These commands make only sense for system emulators and their implementations call functions that don't exist in tools (e.g. to resolve qdev IDs). Move them out so that blockdev.c can be linked to qemu-storage-daemon. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200224143008.13362-4-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:15:38 +01:00
Peter Krempa	65eb7c85a3	block/qcow2: Move bitmap reopen into bdrv_reopen_commit_post The bitmap code requires writing the 'file' child when the qcow2 driver is reopened in read-write mode. If the 'file' child is being reopened due to a permissions change, the modification is commited yet when qcow2_reopen_commit is called. This means that any attempt to write the 'file' child will end with EBADFD as the original fd was already closed. Moving bitmap reopening to the new callback which is called after permission modifications are commited fixes this as the file descriptor will be replaced with the correct one. The above problem manifests itself when reopening 'qcow2' format layer which uses a 'file-posix' file child which was opened with the 'auto-read-only' property set. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <db118dbafe1955afbc0a18d3dd220931074ce349.1582893284.git.pkrempa@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:15:37 +01:00
Max Reitz	3ede935fdb	qcow2: Fix alloc_cluster_abort() for pre-existing clusters handle_alloc() reuses preallocated zero clusters. If anything goes wrong during the data write, we do not change their L2 entry, so we must not let qcow2_alloc_cluster_abort() free them. Fixes: `8b24cd1415` Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200225143130.111267-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:15:37 +01:00
Lukas Straub	08ddb4eb6d	block/replication.c: Ignore requests after failover After failover the Secondary side of replication shouldn't change state, because it now functions as our primary disk. In replication_start, replication_do_checkpoint, replication_stop, ignore the request if current state is BLOCK_REPLICATION_DONE (sucessful failover) or BLOCK_REPLICATION_FAILOVER (failover in progres i.e. currently merging active and hidden images into the base image). Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Acked-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2020-03-03 18:04:47 +08:00
Pan Nengyuan	8198cf5ef0	block/nbd: fix memory leak in nbd_open() In currently implementation there will be a memory leak when nbd_client_connect() returns error status. Here is an easy way to reproduce: 1. run qemu-iotests as follow and check the result with asan: ./check -raw 143 Following is the asan output backtrack: Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f629688a560 in calloc (/usr/lib64/libasan.so.3+0xc7560) #1 0x7f6295e7e015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015) #2 0x56281dab4642 in qobject_input_start_struct /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295 #3 0x56281dab1a04 in visit_start_struct /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49 #4 0x56281dad1827 in visit_type_SocketAddress qapi/qapi-visit-sockets.c:386 #5 0x56281da8062f in nbd_config /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716 #6 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829 #7 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Direct leak of 15 byte(s) in 1 object(s) allocated from: #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0) #1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd) #2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace) #3 0x56281da804ac in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1834 #4 0x56281da804ac in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Indirect leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0) #1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd) #2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace) #3 0x56281dab41a3 in qobject_input_type_str_keyval /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:536 #4 0x56281dab2ee9 in visit_type_str /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:297 #5 0x56281dad0fa1 in visit_type_UnixSocketAddress_members qapi/qapi-visit-sockets.c:141 #6 0x56281dad17b6 in visit_type_SocketAddress_members qapi/qapi-visit-sockets.c:366 #7 0x56281dad186a in visit_type_SocketAddress qapi/qapi-visit-sockets.c:393 #8 0x56281da8062f in nbd_config /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716 #9 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829 #10 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Fixes: `8f071c9db5` Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Cc: qemu-stable <qemu-stable@nongnu.org> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1575517528-44312-3-git-send-email-pannengyuan@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-02-26 17:29:00 -06:00
Pan Nengyuan	7f493662be	block/nbd: extract the common cleanup code The BDRVNBDState cleanup code is common in two places, add nbd_clear_bdrvstate() function to do these cleanups. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1575517528-44312-2-git-send-email-pannengyuan@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix compilation error and commit message] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-02-26 17:28:08 -06:00
Eric Blake	2485f22fe9	nbd-client: Support leading / in NBD URI The NBD URI specification [1] states that only one leading slash at the beginning of the URI path component is stripped, not all such slashes. This becomes important to a patch I just proposed to nbdkit [2], which would allow the exportname to select a file embedded within an ext2 image: ext2fs demands an absolute pathname beginning with '/', and because qemu was inadvertantly stripping it, my nbdkit patch had to work around the behavior. [1] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md [2] https://www.redhat.com/archives/libguestfs/2020-February/msg00109.html Note that the qemu bug only affects handling of URIs such as nbd://host:port//abs/path (where '/abs/path' should be the export name); it is still possible to use --image-opts and pass the desired export name with a leading slash directly through JSON even without this patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200212023101.1162686-1-eblake@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-02-26 14:45:02 -06:00
Max Reitz	804359b8b9	block: Fix VM size field width in snapshot dump When printing the snapshot list (e.g. with qemu-img snapshot -l), the VM size field is only seven characters wide. As of `de38b5005e`, this is not necessarily sufficient: We generally print three digits, and this may require a decimal point. Also, the unit field grew from something as plain as "M" to " MiB". This means that number and unit may take up eight characters in total; but we also want spaces in front. Considering previously the maximum width was four characters and the field width was chosen to be three characters wider, let us adjust the field width to be eleven now. Fixes: `de38b5005e` Buglink: https://bugs.launchpad.net/qemu/+bug/1859989 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200117105859.241818-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Max Reitz	80f0900905	iscsi: Drop iscsi_co_create_opts() The generic fallback implementation effectively does the same. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200122164532.178040-5-mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Max Reitz	87ca3b8fa6	file-posix: Drop hdev_co_create_opts() The generic fallback implementation effectively does the same. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200122164532.178040-4-mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Max Reitz	78c81a3f10	block/nbd: Fix hang in .bdrv_close() When nbd_close() is called from a coroutine, the connection_co never gets to run, and thus nbd_teardown_connection() hangs. This is because aio_co_enter() only puts the connection_co into the main coroutine's wake-up queue, so this main coroutine needs to yield and wait for connection_co to terminate. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200122164532.178040-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Vladimir Sementsov-Ogievskiy	4bc267a7c7	block/backup-top: fix flags handling backup-top "supports" write-unchanged, by skipping CBW operation in backup_top_co_pwritev. But it forgets to do the same in backup_top_co_pwrite_zeroes, as well as declare support for BDRV_REQ_WRITE_UNCHANGED. Fix this, and, while being here, declare also support for flags supported by source child. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200207161231.32707-1-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Daniel P. Berrangé	087ab8e775	block: always fill entire LUKS header space with zeros When initializing the LUKS header the size with default encryption parameters will currently be 2068480 bytes. This is rounded up to a multiple of the cluster size, 2081792, with 64k sectors. If the end of the header is not the same as the end of the cluster we fill the extra space with zeros. This was forgetting that not even the space allocated for the header will be fully initialized, as we only write key material for the first key slot. The space left for the other 7 slots is never written to. An optimization to the ref count checking code: commit `a5fff8d4b4` (refs/bisect/bad) Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Date: Wed Feb 27 16:14:30 2019 +0300 qcow2-refcount: avoid eating RAM made the assumption that every cluster which was allocated would have at least some data written to it. This was violated by way the LUKS header is only partially written, with much space simply reserved for future use. Depending on the cluster size this problem was masked by the logic which wrote zeros between the end of the LUKS header and the end of the cluster. $ qemu-img create --object secret,id=cluster_encrypt0,data=123456 \ -f qcow2 -o cluster_size=2k,encrypt.iter-time=1,\ encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 \ cluster_size_check.qcow2 100M Formatting 'cluster_size_check.qcow2', fmt=qcow2 size=104857600 encrypt.format=luks encrypt.key-secret=cluster_encrypt0 encrypt.iter-time=1 cluster_size=2048 lazy_refcounts=off refcount_bits=16 $ qemu-img check --object secret,id=cluster_encrypt0,data=redhat \ 'json:{"driver": "qcow2", "encrypt.format": "luks", \ "encrypt.key-secret": "cluster_encrypt0", \ "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}' ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x2000 size 0x1f9000 Leaked cluster 4 refcount=1 reference=0 ...snip... Leaked cluster 130 refcount=1 reference=0 1 errors were found on the image. Data may be corrupted, or further writes to the image may corrupt it. 127 leaked clusters were found on the image. This means waste of disk space, but no harm to data. Image end offset: 268288 The problem only exists when the disk image is entirely empty. Writing data to the disk image payload will solve the problem by causing the end of the file to be extended further. The change fixes it by ensuring that the entire allocated LUKS header region is fully initialized with zeros. The qemu-img check will still fail for any pre-existing disk images created prior to this change, unless at least 1 byte of the payload is written to. Fully writing zeros to the entire LUKS header is a good idea regardless as it ensures that space has been allocated on the host filesystem (or whatever block storage backend is used). Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200207135520.2669430-1-berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Peter Krempa	facda5443f	qapi: Allow getting flat output from 'query-named-block-nodes' When a management application manages node names there's no reason to recurse into backing images in the output of query-named-block-nodes. Add a parameter to the command which will return just the top level structs. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <4470f8c779abc404dcf65e375db195cd91a80651.1579509782.git.pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [mreitz: Fixed coding style] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Max Reitz	3c7f75b321	quorum: Stop marking it as a filter Quorum is not a filter, for example because it cannot guarantee which of its children will serve the next request. Thus, any of its children may differ from the data visible to quorum's parents. We have other filters with multiple children, but they differ in this aspect: - blkverify quits the whole qemu process if its children differ. As such, we can always skip it when we want to skip it (as a filter node) by going to any of its children. Both have the same data. - replication generally serves requests from bs->file, so this is its only actually filtered child. - Block job filters currently only have one child, but they will probably get more children in the future. Still, they will always have only one actually filtered child. Having "filters" as a dedicated node category only makes sense if you can skip them by going to a one fixed child that always shows the same data as the filter node. Quorum cannot fulfill this, so it is not a filter. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-13-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	6e9cc05181	mirror: Double-check immediately before replacing There is no guarantee that we can still replace the node we want to replace at the end of the mirror job. Double-check by calling bdrv_recurse_can_replace(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-12-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	6b4907cf42	block: Remove bdrv_recurse_is_first_non_filter() It no longer has any users. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-11-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	a3ed794b36	quorum: Implement .bdrv_recurse_can_replace() Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200218103454.296704-9-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	998a6b2fc5	blkverify: Implement .bdrv_recurse_can_replace() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-8-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:39 +01:00
Max Reitz	37a3791b38	quorum: Fix child permissions Quorum cannot share WRITE or RESIZE on its children. Presumably, it only does so because as a filter, it seemed intuitively correct to point its .bdrv_child_perm to bdrv_filter_default_perm(). However, it is not really a filter, and bdrv_filter_default_perm() does not work for it, so we have to provide a custom .bdrv_child_perm implementation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:39 +01:00
Philippe Mathieu-Daudé	74e4a8a961	block/io_uring: Remove superfluous semicolon Fixes: `6663a0a337` Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200218094402.26625-5-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:54:02 +01:00
Kevin Wolf	9ad1e79f3f	commit: Fix is_read for block_job_error_action() block_job_error_action() needs to know if reading from the top node or writing to the base node failed so that it can set the right 'operation' in the BLOCK_JOB_ERROR QMP event. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-6-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	0c42e175fc	commit: Inline commit_populate() commit_populate() is a very short function and only called in a single place. Its return value doesn't tell us whether an error happened while reading or writing, which would be necessary for sending the right data in the BLOCK_JOB_ERROR QMP event. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-5-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	c5507b4d55	commit: Fix argument order for block_job_error_action() The block_job_error_action() error call in the commit job gives the on_err and is_read arguments in the wrong order. Fix this. (Of course, hard-coded is_read = false is wrong, too, but that's a separate problem for a separate patch.) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-4-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	d71e65ec1d	commit: Remove unused bytes_written The bytes_written variable is only ever written to, it serves no purpose. This has actually been the case since the commit job was first introduced in commit `747ff60263`. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-3-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Philippe Mathieu-Daudé	5b1405db0f	block/qcow2-bitmap: Remove unneeded variable assignment Fix warning reported by Clang static code analyzer: CC block/qcow2-bitmap.o block/qcow2-bitmap.c:650:5: warning: Value stored to 'ret' is never read ret = -EINVAL; ^ ~~~~~~~ Fixes: `88ddffae8` Reported-by: Clang Static Analyzer Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200215161557.4077-2-philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	c3b6658c1a	qcow2: Fix qcow2_alloc_cluster_abort() for external data file For external data file, cluster allocations return an offset in the data file and are not refcounted. In this case, there is nothing to do for qcow2_alloc_cluster_abort(). Freeing the same offset in the qcow2 file is wrong and causes crashes in the better case or image corruption in the worse case. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200211094900.17315-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	dea9052ef1	qcow2: update_refcount(): Reset old_table_index after qcow2_cache_put() In the case that update_refcount() frees a refcount block, it evicts it from the metadata cache. Before doing so, however, it returns the currently used refcount block to the cache because it might be the same. Returning the refcount block early means that we need to reset old_table_index so that we reload the refcount block in the next iteration if it is actually still in use. Fixes: `f71c08ea8e` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200211094900.17315-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Hikaru Nishida	8475ea4854	block/vvfat: Do not unref qcow on closing backing bdrv Before this commit, BDRVVVFATState.qcow is unrefed in write_target_close on closing backing bdrv of vvfat. However, qcow bdrv is opend as a child of vvfat in enable_write_target() so it will be also unrefed on closing vvfat itself. This causes use-after-free of qcow on freeing vvfat which has backing bdrv and qcow bdrv as children in this order because bdrv_close(vvfat) tries to free qcow bdrv after freeing backing bdrv as QLIST_FOREACH_SAFE() loop keeps next pointer, but BdrvChild of qcow is already freed in bdrv_close(backing bdrv). Signed-off-by: Hikaru Nishida <hikarupsp@gmail.com> Message-Id: <20200209175156.85748-1-hikarupsp@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Alberto Garcia	2d4b5256cf	qcow2: Fix alignment checks in encrypted images I/O requests to encrypted media should be aligned to the sector size used by the underlying encryption method, not to BDRV_SECTOR_SIZE. Fortunately this doesn't break anything at the moment because both existing QCRYPTO_BLOCK_*_SECTOR_SIZE have the same value as BDRV_SECTOR_SIZE. The checks in qcow2_co_preadv_encrypted() are also unnecessary because they are repeated immediately afterwards in qcow2_co_encdec(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200213171646.15876-1-berto@igalia.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	7e6c4ff792	mirror: Don't let an operation wait for itself mirror_wait_for_free_in_flight_slot() just picks a random operation to wait for. However, when mirror_co_read() waits for free slots, its MirrorOp is already in s->ops_in_flight, so if not enough slots are immediately available, an operation can end up waiting for itself to complete, which results in a hang. Fix this by passing the current MirrorOp and skipping this operation when picking an operation to wait for. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	eed325b92c	mirror: Store MirrorOp.co for debuggability If a coroutine is launched, but the coroutine pointer isn't stored anywhere, debugging any problems inside the coroutine is quite hard. Let's store the coroutine pointer of a mirror operation in MirrorOp to have it available in the debugger. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2020-02-18 10:53:56 +01:00
Vladimir Sementsov-Ogievskiy	ac9d00bf7b	block: fix crash on zero-length unaligned write and read Commit `7a3f542fbd` "block/io: refactor padding" occasionally dropped aligning for zero-length request: bdrv_init_padding() blindly return false if bytes == 0, like there is nothing to align. This leads the following command to crash: ./qemu-io --image-opts -c 'write 1 0' \ driver=blkdebug,align=512,image.driver=null-co,image.size=512 >> qemu-io: block/io.c:1955: bdrv_aligned_pwritev: Assertion `(offset & (align - 1)) == 0' failed. >> Aborted (core dumped) Prior to `7a3f542fbd` we does aligning of such zero requests. Instead of recovering this behavior let's just do nothing on such requests as it is useless. Note that driver may have special meaning of zero-length reqeusts, like qcow2_co_pwritev_compressed_part, so we can't skip any zero-length operation. But for unaligned ones, we can't pass it to driver anyway. This commit also fixes crash in iotest 80 running with -nocache: ./check -nocache -qcow2 80 which crashes on same assertion due to trying to read empty extra data in qcow2_do_read_snapshots(). Cc: qemu-stable@nongnu.org # v4.2 Fixes: `7a3f542fbd` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20200206164245.17781-1-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-07 16:46:59 +00:00
Vladimir Sementsov-Ogievskiy	0df62f45c1	block/backup-top: fix failure path We can't access top after call bdrv_backup_top_drop, as it is already freed at this time. Also, no needs to unref target child by hand, it will be unrefed on bdrv_close() automatically. So, just do bdrv_backup_top_drop if append succeed and one bdrv_unref otherwise. Note, that in !appended case bdrv_unref(top) moved into drained section on source. It doesn't really matter, but just for code simplicity. Fixes: `7df7868b96` Cc: qemu-stable@nongnu.org # v4.2.0 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20200121142802.21467-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	3afea40243	qcow2: Use BDRV_SECTOR_SIZE instead of the hardcoded value This replaces all remaining instances in the qcow2 code. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: b5f74b606c2d9873b12d29acdb7fd498029c4025.1579374329.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	25ae71db55	qcow2: Don't require aligned offsets in qcow2_co_copy_range_from() qemu-img's convert_co_copy_range() operates at the sector level and block_copy() operates at the cluster level so this condition is always true, but it is not necessary to restrict this here, so let's leave it to the driver implementation return an error if there is any. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: a4264aaee656910c84161a2965f7a501437379ca.1579374329.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	da86f8cbad	qcow2: Use bs->bl.request_alignment when updating an L1 entry When updating an L1 entry the qcow2 driver writes a (512-byte) sector worth of data to avoid a read-modify-write cycle. Instead of always writing 512 bytes we should follow the alignment requirements of the storage backend. (the only exception is when the alignment is larger than the cluster size because then we could be overwriting data after the L1 table) Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 71f34d4ae4b367b32fb36134acbf4f4f7ee681f4.1579374329.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	344ffea951	qcow2: Tighten cluster_offset alignment assertions qcow2_alloc_cluster_offset() and qcow2_get_cluster_offset() always return offsets that are cluster-aligned so don't just check that they are sector-aligned. The check in qcow2_co_preadv_task() is also replaced by an assertion for the same reason. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 558ba339965f858bede4c73ce3f50f0c0493597d.1579374329.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	ef97d608c7	qcow2: Don't round the L1 table allocation up to the sector size The L1 table is read from disk using the byte-based bdrv_pread() and is never accessed beyond its last element, so there's no need to allocate more memory than that. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: b2e27214ec7b03a585931bcf383ee1ac3a641a10.1579374329.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	7cdca2e233	qcow2: Use a GString in report_unsupported_feature() This is a bit more efficient than having to allocate and free memory for each item. The default size (60) is enough for all the existing incompatible features or the "Unknown incompatible feature" message. Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20200115135626.19442-1-berto@igalia.com Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	3a75a870ef	qcow2: Assert that host cluster offsets fit in L2 table entries The standard cluster descriptor in L2 table entries has a field to store the host cluster offset. When we need to get that offset from an entry we use L2E_OFFSET_MASK to ensure that we only use the bits that belong to that field. But while that mask is used every time we read from an L2 entry, it is never used when we write to it. Due to the QCOW_MAX_CLUSTER_OFFSET limit set in the cluster allocation code QEMU can never produce offsets that don't fit in that field so any such offset would indicate a bug in QEMU. Compressed cluster descriptors contain two fields (host cluster offset and size of the compressed data) and the situation with them is similar. In this case the masks are not constant but are stored in the csize_mask and cluster_offset_mask fields of BDRVQcow2State. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200113161146.20099-1-berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Aarushi Mehta	daffeb027b	block/io_uring: adds userspace completion polling Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-11-stefanha@redhat.com Message-Id: <20200120141858.587874-11-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:42 +00:00
Aarushi Mehta	d803f59050	block: add trace events for io_uring Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-10-stefanha@redhat.com Message-Id: <20200120141858.587874-10-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:42 +00:00
Aarushi Mehta	c644751069	block/file-posix.c: extend to use io_uring Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Reviewed-by: Maxim Levitsky <maximlevitsky@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-9-stefanha@redhat.com Message-Id: <20200120141858.587874-9-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:42 +00:00
Aarushi Mehta	6663a0a337	block/io_uring: implements interfaces for io_uring Aborts when sqe fails to be set as sqes cannot be returned to the ring. Adds slow path for short reads for older kernels Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-5-stefanha@redhat.com Message-Id: <20200120141858.587874-5-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Paolo Bonzini	3ba0e1a00c	block/io: take bs->reqs_lock in bdrv_mark_request_serialising bdrv_mark_request_serialising is writing the overlap_offset and overlap_bytes fields of BdrvTrackedRequest. Take bs->reqs_lock for the whole duration of it, and not just when waiting for serialising requests, so that tracked_request_overlaps does not look at a half-updated request. The new code does not unlock/relock around retries. This is unnecessary because a retry is always preceded by a CoQueue wait, which already releases and reacquires bs->reqs_lock. Reported-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-4-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-4-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Paolo Bonzini	18fbd0dec7	block/io: wait for serialising requests when a request becomes serialising Marking without waiting would not result in actual serialising behavior. Thus, make a call bdrv_mark_request_serialising sufficient for serialisation to happen. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-3-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-3-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Paolo Bonzini	c53cb42769	block: eliminate BDRV_REQ_NO_SERIALISING It is unused since commit `00e30f0` ("block/backup: use backup-top instead of write notifiers", 2019-10-01), drop it to simplify the code. While at it, drop redundant assertions on flags. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-2-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-2-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Kevin Wolf	5fbf1d56c2	iscsi: Don't access non-existent scsi_lba_status_descriptor In iscsi_co_block_status(), we may have received num_descriptors == 0 from the iscsi server. Therefore, we can't unconditionally access lbas->descriptors[0]. Add the missing check. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de>	2020-01-27 17:19:53 +01:00
Felipe Franciosi	693fd2acdf	iscsi: Cap block count from GET LBA STATUS (CVE-2020-1711) When querying an iSCSI server for the provisioning status of blocks (via GET LBA STATUS), Qemu only validates that the response descriptor zero's LBA matches the one requested. Given the SCSI spec allows servers to respond with the status of blocks beyond the end of the LUN, Qemu may have its heap corrupted by clearing/setting too many bits at the end of its allocmap for the LUN. A malicious guest in control of the iSCSI server could carefully program Qemu's heap (by selectively setting the bitmap) and then smash it. This limits the number of bits that iscsi_co_block_status() will try to update in the allocmap so it can't overflow the bitmap. Fixes: CVE-2020-1711 Cc: qemu-stable@nongnu.org Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-01-27 17:19:53 +01:00
Eiichi Tsukata	fb574de81b	block/backup: fix memory leak in bdrv_backup_top_append() bdrv_open_driver() allocates bs->opaque according to drv->instance_size. There is no need to allocate it and overwrite opaque in bdrv_backup_top_append(). Reproducer: $ QTEST_QEMU_BINARY=./x86_64-softmmu/qemu-system-x86_64 valgrind -q --leak-check=full tests/test-replication -p /replication/secondary/start ==29792== 24 bytes in 1 blocks are definitely lost in loss record 52 of 226 ==29792== at 0x483AB1A: calloc (vg_replace_malloc.c:762) ==29792== by 0x4B07CE0: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.7) ==29792== by 0x12BAB9: bdrv_open_driver (block.c:1289) ==29792== by 0x12BEA9: bdrv_new_open_driver (block.c:1359) ==29792== by 0x1D15CB: bdrv_backup_top_append (backup-top.c:190) ==29792== by 0x1CC11A: backup_job_create (backup.c:439) ==29792== by 0x1CD542: replication_start (replication.c:544) ==29792== by 0x1401B9: replication_start_all (replication.c:52) ==29792== by 0x128B50: test_secondary_start (test-replication.c:427) ... Fixes: `7df7868b96` ("block: introduce backup-top filter driver") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-01-27 17:19:53 +01:00
Sergio Lopez	0abf258171	block/backup-top: Don't acquire context while dropping top All paths that lead to bdrv_backup_top_drop(), except for the call from backup_clean(), imply that the BDS AioContext has already been acquired, so doing it there too can potentially lead to QEMU hanging on AIO_WAIT_WHILE(). An easy way to trigger this situation is by issuing a two actions transaction, with a proper and a bogus blockdev-backup, so the second one will trigger a rollback. This will trigger a hang with an stack trace like this one: #0 0x00007fb680c75016 in __GI_ppoll (fds=0x55e74580f7c0, nfds=1, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39 #1 0x000055e743386e09 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77 #2 0x000055e743386e09 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:336 #3 0x000055e743388dc4 in aio_poll (ctx=0x55e7458925d0, blocking=blocking@entry=true) at util/aio-posix.c:669 #4 0x000055e743305dea in bdrv_flush (bs=bs@entry=0x55e74593c0d0) at block/io.c:2878 #5 0x000055e7432be58e in bdrv_close (bs=0x55e74593c0d0) at block.c:4017 #6 0x000055e7432be58e in bdrv_delete (bs=<optimized out>) at block.c:4262 #7 0x000055e7432be58e in bdrv_unref (bs=bs@entry=0x55e74593c0d0) at block.c:5644 #8 0x000055e743316b9b in bdrv_backup_top_drop (bs=bs@entry=0x55e74593c0d0) at block/backup-top.c:273 #9 0x000055e74331461f in backup_job_create (job_id=0x0, bs=bs@entry=0x55e7458d5820, target=target@entry=0x55e74589f640, speed=0, sync_mode=MIRROR_SYNC_MODE_FULL, sync_bitmap=sync_bitmap@entry=0x0, bitmap_mode=BITMAP_SYNC_MODE_ON_SUCCESS, compress=false, filter_node_name=0x0, on_source_error=BLOCKDEV_ON_ERROR_REPORT, on_target_error=BLOCKDEV_ON_ERROR_REPORT, creation_flags=0, cb=0x0, opaque=0x0, txn=0x0, errp=0x7ffddfd1efb0) at block/backup.c:478 #10 0x000055e74315bc52 in do_backup_common (backup=backup@entry=0x55e746c066d0, bs=bs@entry=0x55e7458d5820, target_bs=target_bs@entry=0x55e74589f640, aio_context=aio_context@entry=0x55e7458a91e0, txn=txn@entry=0x0, errp=errp@entry=0x7ffddfd1efb0) at blockdev.c:3580 #11 0x000055e74315c37c in do_blockdev_backup (backup=backup@entry=0x55e746c066d0, txn=0x0, errp=errp@entry=0x7ffddfd1efb0) at /usr/src/debug/qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64/./qapi/qapi-types-block-core.h:1492 #12 0x000055e74315c449 in blockdev_backup_prepare (common=0x55e746a8de90, errp=0x7ffddfd1f018) at blockdev.c:1885 #13 0x000055e743160152 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=0x55e7467fe2c0, errp=errp@entry=0x7ffddfd1f088) at blockdev.c:2340 #14 0x000055e743287ff5 in qmp_marshal_transaction (args=<optimized out>, ret=<optimized out>, errp=0x7ffddfd1f0f8) at qapi/qapi-commands-transaction.c:44 #15 0x000055e74333de6c in do_qmp_dispatch (errp=0x7ffddfd1f0f0, allow_oob=<optimized out>, request=<optimized out>, cmds=0x55e743c28d60 <qmp_commands>) at qapi/qmp-dispatch.c:132 #16 0x000055e74333de6c in qmp_dispatch (cmds=0x55e743c28d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175 #17 0x000055e74325c061 in monitor_qmp_dispatch (mon=0x55e745908030, req=<optimized out>) at monitor/qmp.c:145 #18 0x000055e74325c6fa in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234 #19 0x000055e743385866 in aio_bh_call (bh=0x55e745807ae0) at util/async.c:117 #20 0x000055e743385866 in aio_bh_poll (ctx=ctx@entry=0x55e7458067a0) at util/async.c:117 #21 0x000055e743388c54 in aio_dispatch (ctx=0x55e7458067a0) at util/aio-posix.c:459 #22 0x000055e743385742 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260 #23 0x00007fb68543e67d in g_main_dispatch (context=0x55e745893a40) at gmain.c:3176 #24 0x00007fb68543e67d in g_main_context_dispatch (context=context@entry=0x55e745893a40) at gmain.c:3829 #25 0x000055e743387d08 in glib_pollfds_poll () at util/main-loop.c:219 #26 0x000055e743387d08 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 #27 0x000055e743387d08 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518 #28 0x000055e74316a3c1 in main_loop () at vl.c:1828 #29 0x000055e743016a72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504 Fix this by not acquiring the AioContext there, and ensuring all paths leading to it have it already acquired (backup_clean()). RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782111 Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-01-27 17:19:53 +01:00
Wangyong	2558cb8dd4	linux-aio: increasing MAX_EVENTS to a larger hardcoded value Since commit `6040aedddb` "virtio-blk: make queue size configurable",if the user set the queue size to more than 128 ,it will not take effect. That's because linux aio's maximum outstanding requests at a time is always less than or equal to 128. This patch simply increase MAX_EVENTS to a larger hardcoded value of 1024 as a shortterm fix. Signed-off-by: wangyong <wang.yongD@h3c.com> Message-id: faa5781afd354a96a0be152b288f636f@h3c.com Message-Id: <faa5781afd354a96a0be152b288f636f@h3c.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-13 16:41:45 +00:00
Max Reitz	503ca1262b	backup-top: Begin drain earlier When dropping backup-top, we need to drain the node before freeing the BlockCopyState. Otherwise, requests may still be in flight and then the assertion in shres_destroy() will fail. (This becomes visible in intermittent failure of 056.) Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191219182638.104621-1-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 14:26:23 +01:00
Andrey Shinkevich	0d483dce38	qcow2: Allow writing compressed data of multiple clusters QEMU currently supports writing compressed data of the size equal to one cluster. This patch allows writing QCOW2 compressed data that exceed one cluster. Now, we split buffered data into separate clusters and write them compressed using the block/aio_task API. Suggested-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1575288906-551879-3-git-send-email-andrey.shinkevich@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 13:43:07 +01:00
Andrey Shinkevich	f41388e0fb	block: introduce compress filter driver Allow writing all the data compressed through the filter driver. The written data will be aligned by the cluster size. Based on the QEMU current implementation, that data can be written to unallocated clusters only. May be used for a backup job. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 1575288906-551879-2-git-send-email-andrey.shinkevich@virtuozzo.com [mreitz: Replace NULL bdrv_get_format_name() by "(no format)"] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 13:43:07 +01:00
Vladimir Sementsov-Ogievskiy	a1db8733d2	qcow2-bitmaps: fix qcow2_can_store_new_dirty_bitmap qcow2_can_store_new_dirty_bitmap works wrong, as it considers only bitmaps already stored in the qcow2 image and ignores persistent BdrvDirtyBitmap objects. So, let's instead count persistent BdrvDirtyBitmaps. We load all qcow2 bitmaps on open, so there should not be any bitmap in the image for which we don't have BdrvDirtyBitmaps version. If it is - it's a kind of corruption, and no reason to check for corruptions here (open() and close() are better places for it). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191014115126.15360-2-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 13:43:06 +01:00
PanNengyuan	88be15a9e1	throttle-groups: fix memory leak in throttle_group_set_limit: This avoid a memory leak when qom-set is called to set throttle_group limits, here is an easy way to reproduce: 1. run qemu-iotests as follow and check the result with asan: ./check -qcow2 184 Following is the asan output backtrack: Direct leak of 912 byte(s) in 3 object(s) allocated from: #0 0xffff8d7ab3c3 in __interceptor_calloc (/lib64/libasan.so.4+0xd33c3) #1 0xffff8d4c31cb in g_malloc0 (/lib64/libglib-2.0.so.0+0x571cb) #2 0x190c857 in qobject_input_start_struct /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295 #3 0x19070df in visit_start_struct /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49 #4 0x1948b87 in visit_type_ThrottleLimits qapi/qapi-visit-block-core.c:3759 #5 0x17e4aa3 in throttle_group_set_limits /mnt/sdc/qemu-master/qemu-4.2.0-rc0/block/throttle-groups.c:900 #6 0x1650eff in object_property_set /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/object.c:1272 #7 0x1658517 in object_property_set_qobject /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qobject.c:26 #8 0x15880bb in qmp_qom_set /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qmp-cmds.c:74 #9 0x157e3e3 in qmp_marshal_qom_set qapi/qapi-commands-qom.c:154 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: PanNengyuan <pannengyuan@huawei.com> Message-id: 1574835614-42028-1-git-send-email-pannengyuan@huawei.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 13:43:06 +01:00
Max Reitz	69c6449ff1	blkdebug: Allow taking/unsharing permissions Sometimes it is useful to be able to add a node to the block graph that takes or unshare a certain set of permissions for debugging purposes. This patch adds this capability to blkdebug. (Note that you cannot make blkdebug release or share permissions that it needs to take or cannot share, because this might result in assertion failures in the block layer. But if the blkdebug node has no parents, it will not take any permissions and share everything by default, so you can then freely choose what permissions to take and share.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191108123455.39445-4-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 13:43:06 +01:00
Peter Maydell	1010af540b	Block layer patches: - qemu-img: fix info --backing-chain --image-opts - Error out on image creation with conflicting size options - Fix external snapshot with VM state - hmp: Allow using qdev ID for qemu-io command - Misc code cleanup - Many iotests improvements -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJd+7H/AAoJEH8JsnLIjy/WwIoP/igzSCfiH7yOzWexk7J7DUh7 SplXRiBQSQAZ8KD2umUk80UO6/W19rHAl3bLLu50D5uzQ5PNz1wSW2S81JZsj0nQ ErUEr5yxLGAI/zHIw/LrjUSEygbqcZ21R7foNmIy3lInEXf6gGwPQEnZnZgGkH0x v5jwv1HpUpEyetnHunX2cK3JpSNEJCYsV+X1nzDwhjYFuGQq0nlhgUZ8BDt6+fEr b/S3TuHmj/FXMH+5wrSr0LiCKaIyAwPdREvC+61aklMNFuA8YwF33tOaJLoWeQCD qhxFA8jVd8b3dQ0NZXXbliXST5d6AzjpeqbNzsyKze1duVfcfF1RYsC6McojpQ7f 9qFIXPC9IHe3sNkUZpTtvONLnmCeHTc9lwVIyVp9B5KXcmcLedvYM04WqhZ+/eeJ BfgCvXOyF9sL2GaVPFD+98E2DOYG4dI4rmzqn98kQj+RAXDFvvJ+zxuDUC9omE6T pVvxczGPXmnxP2ITRNljJyLVCN1YgE2g9S0GAXNM2i4uOXvhFToEwrJLFytVeVsw UwtYkE0F8KJyqjje5N3HiXclBVa++PK3+jjNFA+3B6XZ9wcZZRS2ZXmzsyNOJ7qG AeIPtrqBCz+QKsHtmS9CwYpvLrPafobHA3WS2z0dsevcWBHYBQBEM2EgXRFC3N3a mymYGxMOsG6+8onwqwjJ =laVP -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - qemu-img: fix info --backing-chain --image-opts - Error out on image creation with conflicting size options - Fix external snapshot with VM state - hmp: Allow using qdev ID for qemu-io command - Misc code cleanup - Many iotests improvements # gpg: Signature made Thu 19 Dec 2019 17:23:11 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (30 commits) iotests: Test external snapshot with VM state hmp: Allow using qdev ID for qemu-io command block: Activate recursively even for already active nodes iotests: 211: Remove duplication with VM.blockdev_create() iotests: 207: Remove duplication with VM.blockdev_create() iotests: 266: Convert to VM.blockdev_create() iotests: 237: Convert to VM.blockdev_create() iotests: 213: Convert to VM.blockdev_create() iotests: 212: Convert to VM.blockdev_create() iotests: 210: Convert to VM.blockdev_create() iotests: 206: Convert to VM.blockdev_create() iotests: 255: Drop blockdev_create() iotests: Create VM.blockdev_create() qcow2: Move error check of local_err near its assignment iotests: Fix IMGOPTSSYNTAX for nbd iotests/273: Filter format-specific information iotests: Add more "_require_drivers" checks to the shell-based tests MAINTAINERS: fix qcow2-bitmap.c under Dirty Bitmaps header qcow2: Use offset_into_cluster() iotests: Support job-complete in run_job() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-12-20 18:25:32 +00:00
Tuguoyi	66be5c3e78	qcow2: Move error check of local_err near its assignment The local_err check outside of the if block was necessary when it was introduced in commit `d1258dd0c8` because it needed to be executed even if qcow2_load_autoloading_dirty_bitmaps() returned false. After some modifications that all required the error check to remain where it is, commit `9c98f145df` finally moved the qcow2_load_dirty_bitmaps() call into the if block, so now the error check should be there, too. Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-12-19 14:31:52 +01:00
Alberto Garcia	74e60fb56a	qcow2: Use offset_into_cluster() There's a couple of places left in the qcow2 code that still do the calculation manually, so let's replace them. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-12-18 11:21:16 +01:00
Kevin Wolf	3b65081638	qcow2: Declare BDRV_REQ_NO_FALLBACK supported In the common case, qcow2_co_pwrite_zeroes() already only modifies metadata case, so we're fine with or without BDRV_REQ_NO_FALLBACK set. The only exception is when using an external data file, where the request is passed down to the block driver of the external data file. We are forwarding the BDRV_REQ_NO_FALLBACK flag there, though, so this is fine, too. Declare the flag supported therefore. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2019-12-18 11:21:16 +01:00
Vladimir Sementsov-Ogievskiy	d936613547	nbd: assert that Error** is not NULL in nbd_iter_channel_error All callers of nbd_iter_channel_error() pass the address of a local_err variable, and only call this function if an error has already occurred, using this function to propagate that error. This is already implied by its name (local_err instead of the classic errp), but it is worth additionally stressing this by adding an assertion to make it part of the function contract. The local_err parameter is not here to return information about nbd_iter_channel_error failure. Instead it's assumed to be filled when passed to the function. This is already stressed by its name (local_err, instead of classic errp). Stress it additionally by assertion. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20191205174635.18758-22-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2019-12-18 08:43:19 +01:00
Vladimir Sementsov-Ogievskiy	e53a578a8b	block/snapshot: rename Error ** parameter to more common errp Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@Redhat.com> Message-Id: <20191205174635.18758-11-vsementsov@virtuozzo.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2019-12-18 08:43:19 +01:00
Vladimir Sementsov-Ogievskiy	f56281abd9	block/qcow2-bitmap: fix crash bug in qcow2_co_remove_persistent_dirty_bitmap Here is double bug: First, return error but not set errp. This may lead to: qmp block-dirty-bitmap-remove may report success when actually failed block-dirty-bitmap-remove used in a transaction will crash, as qmp_transaction will think that it returned success and will call block_dirty_bitmap_remove_commit which will crash, as state->bitmap is NULL Second (like in anecdote), this case is not an error at all. As it is documented in the comment above bdrv_co_remove_persistent_dirty_bitmap definition, absence of bitmap is not an error, and similar case handled at start of qcow2_co_remove_persistent_dirty_bitmap, it returns 0 when there is no bitmaps at all. But when there are some bitmaps, but not the requested one, it return error with errp unset. Fix that. Trigger: 1. create persistent bitmap A 2. shutdown vm (bitmap A is synced) 3. start vm 4. create persistent bitmap B 5. remove bitmap B - it fails (and crashes if in transaction) Potential workaround (rather invasive to ask clients to implement it): 1. create persistent bitmap A 2. shutdown vm 3. start vm 4. create persistent bitmap B 5. remember, that we want to remove bitmap B after vm shutdown ... some other operations ... 6. vm shutdown 7. start vm in stopped mode, and remove all bitmaps marked for removing 8. stop vm Fixes: `b56a1e3175` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20191205193049.30666-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> [eblake: commit message tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2019-12-09 09:23:04 -06:00
Markus Armbruster	cb09104ea8	block/file-posix: Fix laio_init() error handling crash bug raw_aio_attach_aio_context() passes uninitialized Error *local_err by reference to laio_init() via aio_setup_linux_aio(). When laio_init() fails, it passes it on to error_setg_errno(), tripping error_setv()'s assertion unless @local_err is null by dumb luck. Fix by initializing @local_err properly. Fixes: `ed6e216171` Cc: Nishanth Aravamudan <naravamudan@digitalocean.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20191130194240.10517-4-armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2019-12-02 16:14:41 +01:00
Vladimir Sementsov-Ogievskiy	a505475b95	block/qcow2-bitmap: fix bitmap migration Fix bitmap migration with dirty-bitmaps capability enabled and shared storage. We should ignore IN_USE bitmaps in the image on target, when migrating bitmaps through migration channel, see new comment below. Fixes: `74da6b9435` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191125125229.13531-2-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-26 14:18:01 +01:00
Eric Blake	93676c88d7	nbd: Don't send oversize strings Qemu as server currently won't accept export names larger than 256 bytes, nor create dirty bitmap names longer than 1023 bytes, so most uses of qemu as client or server have no reason to get anywhere near the NBD spec maximum of a 4k limit per string. However, we weren't actually enforcing things, ignoring when the remote side violates the protocol on input, and also having several code paths where we send oversize strings on output (for example, qemu-nbd --description could easily send more than 4k). Tighten things up as follows: client: - Perform bounds check on export name and dirty bitmap request prior to handing it to server - Validate that copied server replies are not too long (ignoring NBD_INFO_* replies that are not copied is not too bad) server: - Perform bounds check on export name and description prior to advertising it to client - Reject client name or metadata query that is too long - Adjust things to allow full 4k name limit rather than previous 256 byte limit Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-4-eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-11-18 16:01:34 -06:00
Eric Blake	cf7c49cf6a	bitmap: Enforce maximum bitmap name length We document that for qcow2 persistent bitmaps, the name cannot exceed 1023 bytes. It is inconsistent if transient bitmaps do not have to abide by the same limit, and it is unlikely that any existing client even cares about using bitmap names this long. It's time to codify that ALL bitmaps managed by qemu (whether persistent in qcow2 or not) have a documented maximum length. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-3-eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-11-18 16:01:34 -06:00
Max Reitz	24552feb6a	qcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK Masks for L2 table entries should have 64 bit. Fixes: `b6c246942b` Buglink: https://bugs.launchpad.net/qemu/+bug/1850000 Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191028161841.1198-2-mreitz@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-07 14:37:46 +01:00
Tuguoyi	570542ecb1	qcow2-bitmap: Fix uint64_t left-shift overflow There are two issues in In check_constraints_on_bitmap(), 1) The sanity check on the granularity will cause uint64_t integer left-shift overflow when cluster_size is 2M and the granularity is BIGGER than 32K. 2) The way to calculate image size that the maximum bitmap supported can map to is a bit incorrect. This patch fix it by add a helper function to calculate the number of bytes needed by a normal bitmap in image and compare it to the maximum bitmap bytes supported by qemu. Fixes: `5f72826e7f` Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com> Message-id: 4ba40cd1e7ee4a708b40899952e49f22@h3c.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-07 14:37:33 +01:00
Max Reitz	292d06b925	block/file-posix: Let post-EOF fallocate serialize The XFS kernel driver has a bug that may cause data corruption for qcow2 images as of qemu commit `c8bb23cbdb`. We can work around it by treating post-EOF fallocates as serializing up until infinity (INT64_MAX in practice). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:33:51 +01:00
Max Reitz	c28107e9e5	block: Add bdrv_co_get_self_request() Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:32:51 +01:00
Max Reitz	304d9d7f03	block: Make wait/mark serialising requests public Make both bdrv_mark_request_serialising() and bdrv_wait_serialising_requests() public so they can be used from block drivers. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:29:15 +01:00
Vladimir Sementsov-Ogievskiy	dcfbece684	block/block-copy: fix s->copy_size for compressed cluster `0e2402452f` allowed writes larger than cluster, but that's unsupported for compressed write. Fix it. Fixes: `0e2402452f` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191029150934.26416-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:21:45 +01:00
Peter Maydell	aaffb85335	Block patches for softfreeze: - iotest patches - Improve performance of the mirror block job in write-blocking mode - Limit memory usage for the backup block job - Add discard and write-zeroes support to the NVMe host block driver - Fix a bug in the mirror job - Prevent the qcow2 driver from creating technically non-compliant qcow2 v3 images (where there is not enough extra data for snapshot table entries) - Allow callers of bdrv_truncate() (etc.) to determine whether the file must be resized to the exact given size or whether it is OK for block devices not to shrink -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl2224ESHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AeXMH/RXKEX4BZYMRKCe41P18tJC9Bl2x0T20 YeOsZVvpARlr7o/36BF2kGFF4MnL0OQ+9ELuyROX865rk/VL2rWqnHDE5oQM889a dFwMs+0zvNbig3iLNcw0H5OkE2mrdM+a1EUdn/lBe/39Z8dPqPxRGqIYHq38Ugdu emwSy1nWen7o0f71HRJfyVtI3KcrzXx71FrA/FY2yL/eHz+zRYGZj2SpAdFPkXP/ lgaz+m0tWhnSW1QzEOXB0Gh69ULt/DczCinYmv5qUY1noW5TPPtiDNCQTts5O4ba oJsR3AJv5/l9m65JTmiyQSqnQfPcstrQ5FqOcSnP637cfqUFyWsvdks= =L7v1 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-10-28' into staging Block patches for softfreeze: - iotest patches - Improve performance of the mirror block job in write-blocking mode - Limit memory usage for the backup block job - Add discard and write-zeroes support to the NVMe host block driver - Fix a bug in the mirror job - Prevent the qcow2 driver from creating technically non-compliant qcow2 v3 images (where there is not enough extra data for snapshot table entries) - Allow callers of bdrv_truncate() (etc.) to determine whether the file must be resized to the exact given size or whether it is OK for block devices not to shrink # gpg: Signature made Mon 28 Oct 2019 12:13:53 GMT # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2019-10-28: (69 commits) qemu-iotests: restrict 264 to qcow2 only Revert "qemu-img: Check post-truncation size" block: Pass truncate exact=true where reasonable block: Let format drivers pass @exact block: Evaluate @exact in protocol drivers block: Add @exact parameter to bdrv_co_truncate() block: Do not truncate file node when formatting block/cor: Drop cor_co_truncate() block: Handle filter truncation like native impl. iotests: Test qcow2's snapshot table handling iotests: Add peek_file* functions qcow2: Fix v3 snapshot table entry compliancy qcow2: Repair snapshot table with too many entries qcow2: Fix overly long snapshot tables qcow2: Keep track of the snapshot table length qcow2: Fix broken snapshot table entries qcow2: Add qcow2_check_fix_snapshot_table() qcow2: Separate qcow2_check_read_snapshot_table() qcow2: Write v3-compliant snapshot list on upgrade qcow2: Put qcow2_upgrade() into its own function ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-10-28 14:40:01 +00:00
Max Reitz	e8d04f9237	block: Pass truncate exact=true where reasonable This is a change in behavior, so all instances need a good justification. The comments added here should explain my reasoning. qed already had a comment that suggests it always expected bdrv_truncate()/blk_truncate() to behave as if exact=true were passed (`c743849bee` came eight months before `55b949c847`), so it was simply broken until now. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-8-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> [mreitz: Changed comment in qed.c to explain why a new QED file must be empty, as requested and suggested by Maxim] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:08:45 +01:00
Max Reitz	e61a28a9b6	block: Let format drivers pass @exact When truncating a format node, the @exact parameter is generally handled simply by virtue of the format storing the new size in the image metadata. Such formats do not need to pass on the parameter to their file nodes. There are exceptions, though: - raw and crypto cannot store the image size, and thus must pass on @exact. - When using qcow2 with an external data file, it just makes sense to keep its size in sync with the qcow2 virtual disk (because the external data file is the virtual disk). Therefore, we should pass @exact when truncating it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-7-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:05:30 +01:00
Max Reitz	82325ae5f2	block: Evaluate @exact in protocol drivers We have two protocol drivers that return success when trying to shrink a block device even though they cannot shrink it. This behavior is now only allowed with exact=false, so they should return an error with exact=true. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-6-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:05:24 +01:00
Max Reitz	c80d8b06cf	block: Add @exact parameter to bdrv_co_truncate() We have two drivers (iscsi and file-posix) that (in some cases) return success from their .bdrv_co_truncate() implementation if the block device is larger than the requested offset, but cannot be shrunk. Some callers do not want that behavior, so this patch adds a new parameter that they can use to turn off that behavior. This patch just adds the parameter and lets the block/io.c and block/block-backend.c functions pass it around. All other callers always pass false and none of the implementations evaluate it, so that this patch does not change existing behavior. Future patches take care of that. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:00:07 +01:00
Max Reitz	26536c7fc2	block: Do not truncate file node when formatting There is no reason why the format drivers need to truncate the protocol node when formatting it. When using the old .bdrv_co_create_ops() interface, the file will be created with no size option anyway, which generally gives it a size of 0. (Exceptions are block devices, which cannot be truncated anyway.) When using blockdev-create, the user must have given the file node some size anyway, so there is no reason why we should override that. qed is an exception, it needs the file to start completely empty (as explained by `c743849bee`). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-4-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:59:57 +01:00
Max Reitz	bb8160eb78	block/cor: Drop cor_co_truncate() No other filter driver has a .bdrv_co_truncate() implementation, and there is no need to because the general block layer code can handle it just as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-3-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:59:51 +01:00
Max Reitz	6b7e8f8b1c	block: Handle filter truncation like native impl. Make the filter truncation (passing it through to bs->file) a first-class citizen and handle it exactly as if it was the filter driver's native implementation of .bdrv_co_truncate(). I do not see a reason not to, it makes the code a bit shorter, and may be even more correct because this gets us to finish the write_req that we prepared before (may be important to e.g. bring dirty bitmaps to the correct size). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-2-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:59:45 +01:00
Max Reitz	e40e6e88f6	qcow2: Fix v3 snapshot table entry compliancy qcow2 v3 images require every snapshot table entry to have at least 16 bytes of extra data. If they do not, let qemu-img check -r all fix it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-15-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:09 +01:00
Max Reitz	d2b1d1ec73	qcow2: Repair snapshot table with too many entries The user cannot choose which snapshots are removed. This is fine because we have chosen the maximum snapshot table size to be so large (65536 entries) that it cannot be reasonably reached. If the snapshot table exceeds this size, the image has probably been corrupted in some way; in this case, it is most important to just make the image usable such that the user can copy off at least the active layer. (Also note that the snapshots will be removed only with "-r all", so a plain "check" or "check -r leaks" will not delete any data.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-14-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:08 +01:00
Max Reitz	099febf3ac	qcow2: Fix overly long snapshot tables We currently refuse to open qcow2 images with overly long snapshot tables. This patch makes qemu-img check -r all drop all offending entries past what we deem acceptable. The user cannot choose which snapshots are removed. This is fine because we have chosen the maximum snapshot table size to be so large (64 MB) that it cannot be reasonably reached. If the snapshot table exceeds this size, the image has probably been corrupted in some way; in this case, it is most important to just make the image usable such that the user can copy off at least the active layer. (Also note that the snapshots will be removed only with "-r all", so a plain "check" or "check -r leaks" will not delete any data.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-13-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:04 +01:00
Max Reitz	624143355c	qcow2: Keep track of the snapshot table length When repairing the snapshot table, we truncate entries that have too much extra data. This frees up space that we do not have to count towards the snapshot table size. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-12-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:02 +01:00
Max Reitz	f91f1f159b	qcow2: Fix broken snapshot table entries The only case where we currently reject snapshot table entries is when they have too much extra data. Fix them with qemu-img check -r all by counting it as a corruption, reducing their extra_data_size, and then letting qcow2_check_fix_snapshot_table() do the rest. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:02 +01:00
Max Reitz	fe446b5da2	qcow2: Add qcow2_check_fix_snapshot_table() qcow2_check_read_snapshot_table() can perform consistency checks, but it cannot fix everything. Specifically, it cannot allocate new clusters, because that should wait until the refcount structures are known to be consistent (i.e., after qcow2_check_refcounts()). Thus, it cannot call qcow2_write_snapshots(). Do that in qcow2_check_fix_snapshot_table(), which is called after qcow2_check_refcounts(). Currently, there is nothing that would set result->corruptions, so this is a no-op. A follow-up patch will change that. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:01 +01:00
Max Reitz	8bc584fe03	qcow2: Separate qcow2_check_read_snapshot_table() Reading the snapshot table can fail. That is a problem when we want to repair the image. Therefore, stop reading the snapshot table in qcow2_do_open() in check mode. Instead, add a new function qcow2_check_read_snapshot_table() that reads the snapshot table at a later point. In the future, we want to handle errors here and fix them. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-9-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:00 +01:00
Max Reitz	0a85af351d	qcow2: Write v3-compliant snapshot list on upgrade qcow2 v3 requires every snapshot table entry to have two extra data fields: The 64-bit VM state size, and the virtual disk size. Both are optional for v2 images, so they may not be present. qcow2_upgrade() therefore should update the snapshot table to ensure all entries have these extra data fields. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1727347 Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-8-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:53:52 +01:00

... 7 8 9 10 11 ...

5360 Commits