qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Vladimir Sementsov-Ogievskiy	ae9d441706	block: bdrv_append(): don't consume reference We have too much comments for this feature. It seems better just don't do it. Most of real users (tests don't count) have to create additional reference. Drop also comment in external_snapshot_prepare: - bdrv_append doesn't "remove" old bs in common sense, it sounds strange - the fact that bdrv_append can fail is obvious from the context - the fact that we must rollback all changes in transaction abort is known (it's the direct role of abort) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-5-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:47 +02:00
Vladimir Sementsov-Ogievskiy	0267101af6	block/nbd: fix possible use after free of s->connect_thread If on nbd_close() we detach the thread (in nbd_co_establish_connection_cancel() thr->state becomes CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use s->connect_thread (which is set to NULL), as running thread may free it at any time. Still nbd_co_establish_connection() does exactly this: it saves s->connect_thread to local variable (just for better code style) and use it even after yield point, when thread may be already detached. Fix that. Also check thr to be non-NULL on nbd_co_establish_connection() start for safety. After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes impossible in the second switch in nbd_co_establish_connection(). Still, don't add extra abort() just before the release. If it somehow possible to reach this "case:" it won't hurt. Anyway, good refactoring of all this reconnect mess will come soon. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210406155114.1057355-1-vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-04-13 15:35:12 +02:00
Max Reitz	00769414cd	mirror: Do not enter a paused job on completion Currently, it is impossible to complete jobs on standby (i.e. paused ready jobs), but actually the only thing in mirror_complete() that does not work quite well with a paused job is the job_enter() at the end. If we make it conditional, this function works just fine even if the mirror job is paused. So technically this is a no-op, but obviously the intention is to accept block-job-complete even for jobs on standby, which we need this patch for first. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210409120422.144040-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
Max Reitz	c41f5b96ee	mirror: Move open_backing_file to exit_common This is a graph change and therefore should be done in job-finalize (which is what invokes mirror_exit_common()). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210409120422.144040-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
Stefano Garzarella	b084b420d9	block/rbd: fix memory leak in qemu_rbd_co_create_opts() When we allocate 'q_namespace', we forgot to set 'has_q_namespace' to true. This can cause several issues, including a memory leak, since qapi_free_BlockdevCreateOptions() does not deallocate that memory, as reported by valgrind: 13 bytes in 1 blocks are definitely lost in loss record 7 of 96 at 0x4839809: malloc (vg_replace_malloc.c:307) by 0x48CEBB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x48E3FE3: g_strdup (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x180010: qemu_rbd_co_create_opts (rbd.c:446) by 0x1AE72C: bdrv_create_co_entry (block.c:492) by 0x241902: coroutine_trampoline (coroutine-ucontext.c:173) by 0x57530AF: ??? (in /usr/lib64/libc-2.32.so) by 0x1FFEFFFA6F: ??? Fix setting 'has_q_namespace' to true when we allocate 'q_namespace'. Fixes: `19ae9ae014` ("block/rbd: Add support for ceph namespaces") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210329150129.121182-3-sgarzare@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
Stefano Garzarella	c1c1f6cf51	block/rbd: fix memory leak in qemu_rbd_connect() In qemu_rbd_connect(), 'mon_host' is allocated by qemu_rbd_mon_host() using g_strjoinv(), but it's only freed in the error path, leaking memory in the success path as reported by valgrind: 80 bytes in 4 blocks are definitely lost in loss record 5,028 of 6,516 at 0x4839809: malloc (vg_replace_malloc.c:307) by 0x5315BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x532B6FF: g_strjoinv (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x87D07E: qemu_rbd_mon_host (rbd.c:538) by 0x87D07E: qemu_rbd_connect (rbd.c:562) by 0x87E1CE: qemu_rbd_open (rbd.c:740) by 0x840EB1: bdrv_open_driver (block.c:1528) by 0x8453A9: bdrv_open_common (block.c:1802) by 0x8453A9: bdrv_open_inherit (block.c:3444) by 0x8464C2: bdrv_open (block.c:3537) by 0x8108CD: qmp_blockdev_add (blockdev.c:3569) by 0x8EA61B: qmp_marshal_blockdev_add (qapi-commands-block-core.c:1086) by 0x90B528: do_qmp_dispatch_bh (qmp-dispatch.c:131) by 0x907EA4: aio_bh_poll (async.c:164) Fix freeing 'mon_host' also when qemu_rbd_connect() ends correctly. Fixes: `0a55679b4a` Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210329150129.121182-2-sgarzare@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
David Edmondson	07ee2ab4fd	block/vdi: Don't assume that blocks are larger than VdiHeader Given that the block size is read from the header of the VDI file, a wide variety of sizes might be seen. Rather than re-using a block sized memory region when writing the VDI header, allocate an appropriately sized buffer. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20210325112941.365238-3-pbonzini@redhat.com Message-Id: <20210309144015.557477-3-david.edmondson@oracle.com> Acked-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-03-31 10:44:21 +01:00
David Edmondson	574b8304cf	block/vdi: When writing new bmap entry fails, don't leak the buffer If a new bitmap entry is allocated, requiring the entire block to be written, avoiding leaking the buffer allocated for the block should the write fail. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20210325112941.365238-2-pbonzini@redhat.com Message-Id: <20210309144015.557477-2-david.edmondson@oracle.com> Acked-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-03-31 10:44:21 +01:00
Max Reitz	484108293d	qcow2: Force preallocation with data-file-raw Setting the qcow2 data-file-raw bit means that you can ignore the qcow2 metadata when reading from the external data file. It does not mean that you have to ignore it, though. Therefore, the data read must be the same regardless of whether you interpret the metadata or whether you ignore it, and thus the L1/L2 tables must all be present and give a 1:1 mapping. This patch changes 244's output: First, the qcow2 file is larger right after creation, because of metadata preallocation. Second, the qemu-img map output changes: Everything that was not explicitly discarded or zeroed is now a data area. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210326145509.163455-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2021-03-30 13:02:10 +02:00
Max Reitz	53431b9086	block/mirror: Fix mirror_top's permissions mirror_top currently shares all permissions, and takes only the WRITE permission (if some parent has taken that permission, too). That is wrong, though; mirror_top is a filter, so it should take permissions like any other filter does. For example, if the parent needs CONSISTENT_READ, we need to take that, too, and if it cannot share the WRITE permission, we cannot share it either. The exception is when mirror_top is used for active commit, where we cannot take CONSISTENT_READ (because it is deliberately unshared above the base node) and where we must share WRITE (so that it is shared for all images in the backing chain, so the mirror job can take it for the target BB). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210211172242.146671-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-03-29 18:09:00 +02:00
Pavel Dovgalyuk	ad0ce64279	qcow2: use external virtual timers Regular virtual timers are used to emulate timings related to vCPU and peripheral states. QCOW2 uses timers to clean the cache. These timers should have external flag. In the opposite case they affect the execution and it can't be recorded and replayed. This patch adds external flag to the timer for qcow2 cache clean. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <161700516327.1141158.8366564693714562536.stgit@pasha-ThinkPad-X280> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-03-29 18:04:29 +02:00
Markus Armbruster	bdabafc683	block: Remove monitor command block_passwd Command block_passwd always fails since Commit `c01c214b69` "block: remove all encryption handling APIs" (v2.10.0) turned block_passwd into a stub that always fails, and hardcoded encryption_key_missing to false in query-named-block-nodes and query-block. Commit `ad1324e044` "block: remove 'encryption_key_missing' flag from QAPI" just landed. Complete the cleanup job: remove block_passwd. Cc: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210323101951.3686029-1-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-23 22:31:56 +01:00
Stefan Hajnoczi	6f4b1996b4	block/export: disable VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD for now The vhost-user in-flight shmfd feature has not been tested with qemu-storage-daemon's vhost-user-blk server. Disable this optional feature for now because it requires MFD_ALLOW_SEALING, which is not available in some CI environments. If we need this feature in the future it can be re-enabled after testing. Reported-by: Peter Maydell <peter.maydell@linaro.org> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210309094106.196911-2-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-19 10:15:06 +01:00
Max Reitz	0f418a2076	curl: Disconnect sockets from CURLState When a curl transfer is finished, that does not mean that CURL lets go of all the sockets it used for it. We therefore must not free a CURLSocket object before CURL has invoked curl_sock_cb() to tell us to remove it. Otherwise, we may get a use-after-free, as described in this bug report: https://bugs.launchpad.net/qemu/+bug/1916501 (Reproducer from that report: $ qemu-img convert -f qcow2 -O raw \ https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img \ out.img ) (Alternatively, it might seem logical to force-drop all sockets that have been used for a state when the respective transfer is done, kind of like it is done now, but including unsetting the AIO handlers. Unfortunately, doing so makes the driver just hang instead of crashing, which seems to evidence that CURL still uses those sockets.) Make the CURLSocket object independent of "its" CURLState by putting all sockets into a hash table belonging to the BDRVCURLState instead of a list that belongs to a CURLState. Do not touch any sockets in curl_clean_state(). Testing, it seems like all sockets are indeed gone by the time the curl BDS is closed, so it seems like there really was no point in freeing any socket just because a transfer is done. libcurl does invoke curl_sock_cb() with CURL_POLL_REMOVE for every socket it has. Buglink: https://bugs.launchpad.net/qemu/+bug/1916501 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210309130541.37540-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-19 10:15:06 +01:00
Max Reitz	3663dca461	curl: Store BDRVCURLState pointer in CURLSocket A socket does not really belong to any specific state. We do not need to store a pointer to "its" state in it, a pointer to the common BDRVCURLState is sufficient. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210309130541.37540-2-mreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-19 10:15:06 +01:00
Kevin Wolf	1bf26076d6	stream: Don't crash when node permission is denied The image streaming block job restricts shared permissions of the nodes it accesses. This can obviously fail when other users already got these permissions. &error_abort is therefore wrong and can crash. Handle these errors gracefully and just fail starting the block job. Reported-by: Nini Gu <ngu@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210309173451.45152-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-19 10:15:06 +01:00
Daniel P. Berrangé	8d17adf34f	block: remove support for using "file" driver with block/char devices The 'host_device' and 'host_cdrom' drivers must be used instead. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-18 09:22:55 +00:00
Daniel P. Berrangé	e67d8e2928	block: remove 'dirty-bitmaps' field from 'BlockInfo' struct The same data is available in the 'BlockDeviceInfo' struct. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-18 09:22:55 +00:00
Daniel P. Berrangé	81cbfd5088	block: remove dirty bitmaps 'status' field The same information is available via the 'recording' and 'busy' fields. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-18 09:22:55 +00:00
Daniel P. Berrangé	ad1324e044	block: remove 'encryption_key_missing' flag from QAPI This has been hardcoded to "false" since 2.10.0, since secrets required to unlock block devices are now always provided up front instead of using interactive prompts. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-18 09:22:55 +00:00
Peter Maydell	6f34661b6c	Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-6.0-pull-request' into staging Pull request # gpg: Signature made Wed 10 Mar 2021 21:56:09 GMT # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-6.0-pull-request: (22 commits) sysemu: Let VMChangeStateHandler take boolean 'running' argument sysemu/runstate: Let runstate_is_running() return bool hw/lm32/Kconfig: Have MILKYMIST select LM32_DEVICES hw/lm32/Kconfig: Rename CONFIG_LM32 -> CONFIG_LM32_DEVICES hw/lm32/Kconfig: Introduce CONFIG_LM32_EVR for lm32-evr/uclinux boards qemu-common.h: Update copyright string to 2021 tests/fp/fp-test: Replace the word 'blacklist' qemu-options: Replace the word 'blacklist' seccomp: Replace the word 'blacklist' scripts/tracetool: Replace the word 'whitelist' ui: Replace the word 'whitelist' virtio-gpu: Adjust code space style exec/memory: Use struct Object typedef fuzz-test: remove unneccessary debugging flags net: Use id_generate() in the network subsystem, too MAINTAINERS: Fix the location of tools manuals vhost_user_gpu: Drop dead check for g_malloc() failure backends/dbus-vmstate: Fix short read error handling target/hexagon/gen_tcg_funcs: Fix a typo hw/elf_ops: Fix a typo ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-03-11 18:55:27 +00:00
Peter Maydell	9abda42bf2	Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-03-09' into staging nbd patches for 2021-03-09 - Add Vladimir as NBD co-maintainer - Fix reporting of holes in NBD_CMD_BLOCK_STATUS - Improve command-line parsing accuracy of large numbers (anything going through qemu_strtosz), including the deprecation of hex+suffix - Improve some error reporting in the block layer # gpg: Signature made Tue 09 Mar 2021 15:38:10 GMT # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2021-03-09: block/qcow2: refactor qcow2_update_options_prepare error paths block/qed: bdrv_qed_do_open: deal with errp block/qcow2: simplify qcow2_co_invalidate_cache() block/qcow2: read_cache_sizes: return status value block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface block/qcow2: qcow2_get_specific_info(): drop error propagation blockjob: return status from block_job_set_speed() block/mirror: drop extra error propagation in commit_active_start() block: drop extra error propagation for bdrv_set_backing_hd blockdev: fix drive_backup_prepare() missed error block: check return value of bdrv_open_child and drop error propagation utils: Deprecate hex-with-suffix sizes utils: Improve qemu_strtosz() to have 64 bits of precision utils: Enhance testsuite for do_strtosz() nbd: server: Report holes for raw images MAINTAINERS: add Vladimir as co-maintainer of NBD Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-03-11 13:57:08 +00:00
Philippe Mathieu-Daudé	538f049704	sysemu: Let VMChangeStateHandler take boolean 'running' argument The 'running' argument from VMChangeStateHandler does not require other value than 0 / 1. Make it a plain boolean. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20210111152020.1422021-3-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-03-09 23:13:57 +01:00
Peter Maydell	a557b00469	Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - qemu-storage-daemon: add --pidfile option - qemu-storage-daemon: CLI error messages include the option name now - vhost-user-blk export: Misc fixes - docs: Improvements for qemu-storage-daemon documentation - parallels: load bitmap extension - backup-top: Don't crash on post-finalize accesses - Improve error messages related to node-name options - iotests improvements # gpg: Signature made Mon 08 Mar 2021 17:01:41 GMT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (30 commits) blockdev: Clarify error messages pertaining to 'node-name' block: Clarify error messages pertaining to 'node-name' docs: qsd: Explain --export nbd,name=... default MAINTAINERS: update parallels block driver iotests: add parallels-read-bitmap test iotests.py: add unarchive_sample_image() helper parallels: support bitmap extension for read-only mode block/parallels: BDRVParallelsState: add cluster_size field parallels.txt: fix bitmap L1 table description qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public block/export: port virtio-blk read/write range check block/export: port virtio-blk discard/write zeroes input validation block/export: fix vhost-user-blk export sector number calculation block/export: use VIRTIO_BLK_SECTOR_BITS block/export: fix blk_size double byteswap libqtest: add qtest_remove_abrt_handler() libqtest: add qtest_kill_qemu() libqtest: add qtest_socket_server() vhost-user-blk: fix blkcfg->num_queues endianness docs: replace insecure /tmp examples in qsd docs ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-03-09 21:31:18 +00:00
Vladimir Sementsov-Ogievskiy	1184b41101	block/qcow2: refactor qcow2_update_options_prepare error paths Keep setting ret close to setting errp and don't merge different error paths into one. This way it's more obvious that we don't return error without setting errp. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-15-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:04:46 -06:00
Vladimir Sementsov-Ogievskiy	15ce94a68c	block/qed: bdrv_qed_do_open: deal with errp Always set errp on failure. The generic bdrv_open_driver supports driver functions which can return a negative value but forget to set errp. That's a strange thing. Let's improve bdrv_qed_do_open to not behave this way. This allows the simplification of code in bdrv_qed_co_invalidate_cache(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20210202124956.63146-14-vsementsov@virtuozzo.com> [eblake: commit message grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:03:32 -06:00
Vladimir Sementsov-Ogievskiy	e6247c9c9f	block/qcow2: simplify qcow2_co_invalidate_cache() qcow2_do_open correctly sets errp on each failure path. So, we can simplify code in qcow2_co_invalidate_cache() and drop explicit error propagation. Add ERRP_GUARD() as mandated by the documentation in include/qapi/error.h so that error_prepend() is actually called even if errp is &error_fatal. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20210202124956.63146-13-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:03:27 -06:00
Vladimir Sementsov-Ogievskiy	772c4cad13	block/qcow2: read_cache_sizes: return status value It's better to return status together with setting errp. It allows to reduce error propagation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-12-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:03:23 -06:00
Vladimir Sementsov-Ogievskiy	526e31de99	block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps It's better to return status together with setting errp. It makes possible to avoid error propagation. While being here, put ERRP_GUARD() to fix error_prepend(errp, ...) usage inside qcow2_store_persistent_dirty_bitmaps() (see the comment above ERRP_GUARD() definition in include/qapi/error.h) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-11-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:03:21 -06:00
Vladimir Sementsov-Ogievskiy	0c1e9d2a9a	block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface It's recommended for bool functions with errp to return true on success and false on failure. Non-standard interfaces don't help to understand the code. The change is also needed to reduce error propagation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20210202124956.63146-10-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:01:47 -06:00
Vladimir Sementsov-Ogievskiy	83bad8cbf5	block/qcow2: qcow2_get_specific_info(): drop error propagation Don't use error propagation in qcow2_get_specific_info(). For this refactor qcow2_get_bitmap_info_list, its current interface is rather weird. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210202124956.63146-9-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> [eblake: separate local 'tail' variable from 'info_list' parameter] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 15:16:11 -06:00
Vladimir Sementsov-Ogievskiy	eb5becc18f	block/mirror: drop extra error propagation in commit_active_start() Let's check return value of mirror_start_job to check for failure instead of local_err. Rename ret to job, as ret is usually integer variable. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-7-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 15:14:14 -06:00
Vladimir Sementsov-Ogievskiy	bc52024959	block: check return value of bdrv_open_child and drop error propagation This patch is generated by cocci script: @@ symbol bdrv_open_child, errp, local_err; expression file; @@ file = bdrv_open_child(..., - &local_err + errp ); - if (local_err) + if (!file) { ... - error_propagate(errp, local_err); ... } with command spatch --sp-file x.cocci --macro-file scripts/cocci-macro-file.h \ --in-place --no-show-diff --max-width 80 --use-gitgrep block Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-4-vsementsov@virtuozzo.com> [eblake: fix qcow2_do_open() to use ERRP_GUARD, necessary as the only caller to pass allow_none=true] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 15:07:09 -06:00
Vladimir Sementsov-Ogievskiy	baefd97700	parallels: support bitmap extension for read-only mode Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210224104707.88430-5-vsementsov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:55 +01:00
Vladimir Sementsov-Ogievskiy	e0b5207f54	block/parallels: BDRVParallelsState: add cluster_size field We are going to use it in more places, calculating "s->tracks << BDRV_SECTOR_BITS" doesn't look good. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210224104707.88430-4-vsementsov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Vladimir Sementsov-Ogievskiy	35f428ba39	qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public Rename bytes_covered_by_bitmap_cluster() to bdrv_dirty_bitmap_serialization_coverage() and make it public. It is needed as we are going to share it with bitmap loading in parallels format. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <20210224104707.88430-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Stefan Hajnoczi	05ae4e674e	block/export: port virtio-blk read/write range check Check that the sector number and byte count are valid. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210223144653.811468-13-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Stefan Hajnoczi	db4eadf9f1	block/export: port virtio-blk discard/write zeroes input validation Validate discard/write zeroes the same way we do for virtio-blk. Some of these checks are mandated by the VIRTIO specification, others are internal to QEMU. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210223144653.811468-11-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Stefan Hajnoczi	e44362ce31	block/export: fix vhost-user-blk export sector number calculation The driver is supposed to honor the blk_size field but the protocol still uses 512-byte sector numbers. It is incorrect to multiply req->sector_num by blk_size. VIRTIO 1.1 5.2.5 Device Initialization says: blk_size can be read to determine the optimal sector size for the driver to use. This does not affect the units used in the protocol (always 512 bytes), but awareness of the correct value can affect performance. Fixes: `3578389bcf` ("block/export: vhost-user block device backend server") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210223144653.811468-10-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Stefan Hajnoczi	524bac0744	block/export: use VIRTIO_BLK_SECTOR_BITS Use VIRTIO_BLK_SECTOR_BITS and VIRTIO_BLK_SECTOR_SIZE when dealing with virtio-blk sector numbers. Although the values happen to be the same as BDRV_SECTOR_BITS and BDRV_SECTOR_SIZE, they are conceptually different. This makes it clearer when we are dealing with virtio-blk sector units. Use VIRTIO_BLK_SECTOR_BITS in vu_blk_initialize_config(). Later patches will use the new constants in the virtqueue request processing code path. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210223144653.811468-9-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Stefan Hajnoczi	a4f1542af5	block/export: fix blk_size double byteswap The config->blk_size field is little-endian. Use the native-endian blk_size variable to avoid double byteswapping. Fixes: `11f60f7eae` ("block/export: make vhost-user-blk config space little-endian") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210223144653.811468-8-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Max Reitz	705dde27c6	backup-top: Refuse I/O in inactive state When the backup-top node transitions from active to inactive in bdrv_backup_top_drop(), the BlockCopyState is freed and the filtered child is removed, so the node effectively becomes unusable. However, no one told its I/O functions this, so they will happily continue accessing bs->backing and s->bcs. Prevent that by aborting early when s->active is false. (After the preceding patch, the node should be gone after bdrv_backup_top_drop(), so this should largely be a theoretical problem. But still, better to be safe than sorry, and also I think it just makes sense to check s->active in the I/O functions.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210219153348.41861-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:55:18 +01:00
Max Reitz	bdc4c4c5e3	backup: Remove nodes from job in .clean() The block job holds a reference to the backup-top node (because it is passed as the main job BDS to block_job_create()). Therefore, bdrv_backup_top_drop() cannot delete the backup-top node (replacing it by its child does not affect the job parent, because that has .stay_at_node set). That is a problem, because all of its I/O functions assume the BlockCopyState (s->bcs) to be valid and that it has a filtered child; but after bdrv_backup_top_drop(), neither of those things are true. It does not make sense to add new parents to backup-top after backup_clean(), so we should detach it from the job before bdrv_backup_top_drop(). Because there is no function to do that for a single node, just detach all of the job's nodes -- the job does not do anything past backup_clean() anyway. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210219153348.41861-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:55:18 +01:00
Paolo Bonzini	f7544edcd3	qemu-config: add error propagation to qemu_config_parse This enables some simplification of vl.c via error_fatal, and improves error messages. Before: $ ./qemu-system-x86_64 -readconfig . qemu-system-x86_64: error reading file qemu-system-x86_64: -readconfig .: read config .: Invalid argument $ /usr/libexec/qemu-kvm -readconfig foo qemu-kvm: -readconfig foo: read config foo: No such file or directory After: $ ./qemu-system-x86_64 -readconfig . qemu-system-x86_64: -readconfig .: Cannot read config file: Is a directory $ ./qemu-system-x86_64 -readconfig foo qemu-system-x86_64: -readconfig foo: Could not open 'foo': No such file or directory Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210226170816.231173-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-06 11:41:54 +01:00
Maxim Levitsky	6094cbeb72	block: qcow2: remove the created file on initialization error If the qcow2 initialization fails, we should remove the file if it was already created, to avoid leaving stale files around. We already do this for luks raw images. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20201217170904.946013-4-mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-15 15:10:14 +01:00
Maxim Levitsky	a890f08e58	block: add bdrv_co_delete_file_noerr This function wraps bdrv_co_delete_file for the common case of removing a file, which was just created by the format driver, on an error condition. It hides the -ENOTSUPP error and reports all other errors. Use it in the luks driver. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20201217170904.946013-3-mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-15 15:10:14 +01:00
Maxim Levitsky	dcb6699512	crypto: luks: Fix tiny memory leak When the underlying block device doesn't support the bdrv_co_delete_file interface, an 'Error' object was leaked. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201217170904.946013-2-mlevitsk@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-15 15:10:14 +01:00
Peter Maydell	392b9a74b9	Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2021-02-12' into staging bitmaps patches for 2021-02-12 - add 'transform' member to manipulate bitmaps across migration - work towards better error handling during bdrv_open # gpg: Signature made Fri 12 Feb 2021 23:19:39 GMT # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-bitmaps-2021-02-12: block: use return status of bdrv_append() block: return status from bdrv_append and friends qemu-iotests: 300: Add test case for modifying persistence of bitmap migration: dirty-bitmap: Allow control of bitmap persistence migration: dirty-bitmap: Use struct for alias map inner members Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-02-13 21:26:00 +00:00
Vladimir Sementsov-Ogievskiy	934aee14d3	block: use return status of bdrv_append() Now bdrv_append returns status and we can drop all the local_err things around it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 15:39:44 -06:00
Vladimir Sementsov-Ogievskiy	ff789bf5a9	block/backup: implement .cancel job handler Cancel in-flight I/O on the target to avoid wasting time. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-10-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 12:17:08 -06:00
Vladimir Sementsov-Ogievskiy	521ff8b779	block/mirror: implement .cancel job handler Cancel in-flight I/O on the target to avoid wasting time. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-6-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 11:29:40 -06:00
Vladimir Sementsov-Ogievskiy	3fc1ec3725	block/raw-format: implement .bdrv_cancel_in_flight handler We are going to cancel in-flight requests on mirror nbd target on job cancel. Still nbd is often used not directly but as raw-format child. So, add pass-through handler here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 09:45:18 -06:00
Vladimir Sementsov-Ogievskiy	c4f7f24e1f	block/nbd: implement .bdrv_cancel_in_flight Just stop waiting for connection in existing requests. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 09:45:18 -06:00
Vladimir Sementsov-Ogievskiy	bd54669a4a	block: add new BlockDriver handler: bdrv_cancel_in_flight It will be used to stop retrying NBD requests on mirror cancel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 09:45:18 -06:00
Daniel P. Berrangé	3d3e9b1f66	block: rename and alter bdrv_all_find_snapshot semantics Currently bdrv_all_find_snapshot() will return 0 if it finds a snapshot, -1 if an error occurs, or if it fails to find a snapshot. New callers to be added want to distinguish between the error scenario and failing to find a snapshot. Rename it to bdrv_all_has_snapshot and make it return -1 on error, 0 if no snapshot is found and 1 if snapshot is found. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-7-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	c22d644ca7	block: allow specifying name of block device for vmstate storage Currently the vmstate will be stored in the first block device that supports snapshots. Historically this would have usually been the root device, but with UEFI it might be the variable store. There needs to be a way to override the choice of block device to store the state in. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-6-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	cf3a74c94f	block: add ability to specify list of blockdevs during snapshot When running snapshot operations, there are various rules for which blockdevs are included/excluded. While this provides reasonable default behaviour, there are scenarios that are not well handled by the default logic. Some of the conditions do not have a single correct answer. Thus there needs to be a way for the mgmt app to provide an explicit list of blockdevs to perform snapshots across. This can be achieved by passing a list of node names that should be used. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-5-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	e26f98e209	block: push error reporting into bdrv_all_*_snapshot functions The bdrv_all_*_snapshot functions return a BlockDriverState pointer for the invalid backend, which the callers then use to report an error message. In some cases multiple callers are reporting the same error message, but with slightly different text. In the future there will be more error scenarios for some of these methods, which will benefit from fine grained error message reporting. So it is helpful to push error reporting down a level. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> [PMD: Initialize variables] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210204124834.774401-2-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Roman Kagan	ddde5ee769	block/nbd: only enter connection coroutine if it's present When an NBD block driver state is moved from one aio_context to another (e.g. when doing a drain in a migration thread), nbd_client_attach_aio_context_bh is executed that enters the connection coroutine. However, the assumption that ->connection_co is always present here appears incorrect: the connection may have encountered an error other than -EIO in the underlying transport, and thus may have decided to quit rather than keep trying to reconnect, and therefore it may have terminated the connection coroutine. As a result an attempt to reassign the client in this state (NBD_CLIENT_QUIT) to a different aio_context leads to a null pointer dereference: #0 qio_channel_detach_aio_context (ioc=0x0) at /build/qemu-gYtjVn/qemu-5.0.1/io/channel.c:452 #1 0x0000562a242824b3 in bdrv_detach_aio_context (bs=0x562a268d6a00) at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6151 #2 bdrv_set_aio_context_ignore (bs=bs@entry=0x562a268d6a00, new_context=new_context@entry=0x562a260c9580, ignore=ignore@entry=0x7feeadc9b780) at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6230 #3 0x0000562a24282969 in bdrv_child_try_set_aio_context (bs=bs@entry=0x562a268d6a00, ctx=0x562a260c9580, ignore_child=<optimized out>, errp=<optimized out>) at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6332 #4 0x0000562a242bb7db in blk_do_set_aio_context (blk=0x562a2735d0d0, new_context=0x562a260c9580, update_root_node=update_root_node@entry=true, errp=errp@entry=0x0) at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:1989 #5 0x0000562a242be0bd in blk_set_aio_context (blk=<optimized out>, new_context=<optimized out>, errp=errp@entry=0x0) at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:2010 #6 0x0000562a23fbd953 in virtio_blk_data_plane_stop (vdev=<optimized out>) at /build/qemu-gYtjVn/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292 #7 0x0000562a241fc7bf in virtio_bus_stop_ioeventfd (bus=0x562a260dbf08) at 
/build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio-bus.c:245 #8 0x0000562a23fefb2e in virtio_vmstate_change (opaque=0x562a260dbf90, running=0, state=<optimized out>) at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio.c:3220 #9 0x0000562a2402ebfd in vm_state_notify (running=running@entry=0, state=state@entry=RUN_STATE_FINISH_MIGRATE) at /build/qemu-gYtjVn/qemu-5.0.1/softmmu/vl.c:1275 #10 0x0000562a23f7bc02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE, send_stop=<optimized out>) at /build/qemu-gYtjVn/qemu-5.0.1/cpus.c:1032 #11 0x0000562a24209765 in migration_completion (s=0x562a260e83a0) at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:2914 #12 migration_iteration_run (s=0x562a260e83a0) at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3275 #13 migration_thread (opaque=opaque@entry=0x562a260e83a0) at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3439 #14 0x0000562a2435ca96 in qemu_thread_start (args=<optimized out>) at /build/qemu-gYtjVn/qemu-5.0.1/util/qemu-thread-posix.c:519 #15 0x00007feed31466ba in start_thread (arg=0x7feeadc9c700) at pthread_create.c:333 #16 0x00007feed2e7c41d in __GI___sysctl (name=0x0, nlen=608471908, oldval=0x562a2452b138, oldlenp=0x0, newval=0x562a2452c5e0 <__func__.28102>, newlen=0) at ../sysdeps/unix/sysv/linux/sysctl.c:30 #17 0x0000000000000000 in ?? () Fix it by checking that the connection coroutine is non-null before trying to enter it. If it is null, no entering is needed, as the connection is probably going down anyway. Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210129073859.683063-3-rvkagan@yandex-team.ru> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Roman Kagan	3b5e4db673	block/nbd: only detach existing iochannel from aio_context When the reconnect in NBD client is in progress, the iochannel used for NBD connection doesn't exist. Therefore an attempt to detach it from the aio_context of the parent BlockDriverState results in a NULL pointer dereference. The problem is triggerable, in particular, when an outgoing migration is about to finish, and stopping the dataplane tries to move the BlockDriverState from the iothread aio_context to the main loop. If the NBD connection is lost before this point, and the NBD client has entered the reconnect procedure, QEMU crashes: #0 qemu_aio_coroutine_enter (ctx=0x5618056c7580, co=0x0) at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109 #1 0x00005618034b1b68 in nbd_client_attach_aio_context_bh ( opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164 #2 0x000056180353116b in aio_wait_bh (opaque=0x7f60e1e63700) at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55 #3 0x0000561803530633 in aio_bh_call (bh=0x7f60d40a7e80) at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136 #4 aio_bh_poll (ctx=ctx@entry=0x5618056c7580) at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164 #5 0x0000561803533e5a in aio_poll (ctx=ctx@entry=0x5618056c7580, blocking=blocking@entry=true) at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650 #6 0x000056180353128d in aio_wait_bh_oneshot (ctx=0x5618056c7580, cb=<optimized out>, opaque=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71 #7 0x000056180345c50a in bdrv_attach_aio_context (new_context=0x5618056c7580, bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172 #8 bdrv_set_aio_context_ignore (bs=bs@entry=0x561805ed4c00, new_context=new_context@entry=0x5618056c7580, ignore=ignore@entry=0x7f60e1e63780) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237 #9 0x000056180345c969 in bdrv_child_try_set_aio_context ( bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580, ignore_child=<optimized out>, errp=<optimized out>) at 
/build/qemu-6MF7tq/qemu-5.0.1/block.c:6332 #10 0x00005618034957db in blk_do_set_aio_context (blk=0x56180695b3f0, new_context=0x5618056c7580, update_root_node=update_root_node@entry=true, errp=errp@entry=0x0) at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989 #11 0x00005618034980bd in blk_set_aio_context (blk=<optimized out>, new_context=<optimized out>, errp=errp@entry=0x0) at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010 #12 0x0000561803197953 in virtio_blk_data_plane_stop (vdev=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292 #13 0x00005618033d67bf in virtio_bus_stop_ioeventfd (bus=0x5618056d9f08) at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245 #14 0x00005618031c9b2e in virtio_vmstate_change (opaque=0x5618056d9f90, running=0, state=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220 #15 0x0000561803208bfd in vm_state_notify (running=running@entry=0, state=state@entry=RUN_STATE_FINISH_MIGRATE) at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275 #16 0x0000561803155c02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE, send_stop=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032 #17 0x00005618033e3765 in migration_completion (s=0x5618056e6960) at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914 #18 migration_iteration_run (s=0x5618056e6960) at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275 #19 migration_thread (opaque=opaque@entry=0x5618056e6960) at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439 #20 0x0000561803536ad6 in qemu_thread_start (args=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519 #21 0x00007f61085d06ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #22 0x00007f610830641d in sysctl () from /lib/x86_64-linux-gnu/libc.so.6 #23 0x0000000000000000 in ?? () Fix it by checking that the iochannel is non-null before trying to detach it from the aio_context. 
If it is null, no detaching is needed, and it will get reattached in the proper aio_context once the connection is reestablished. Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210129073859.683063-2-rvkagan@yandex-team.ru> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy	a5215b8fdf	block/io: use int64_t bytes in copy_range We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert now copy_range parameters which are already 64bit to signed type. It's safe as we don't work with requests overflowing BDRV_MAX_LENGTH (which is less than INT64_MAX), and do check the requests in bdrv_co_copy_range_internal() (by bdrv_check_request32(), which calls bdrv_check_request()). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-17-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy	e9e52efdc5	block/io: support int64_t bytes in read/write wrappers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). Now, since bdrv_co_preadv_part() and bdrv_co_pwritev_part() have been updated, update all their wrappers. For all of them type of 'bytes' is widening, so callers are safe. We have to update request_fn in blkverify.c simultaneously. Still it's just a pointer to one of bdrv_co_pwritev() or bdrv_co_preadv(), and type is widening for callers of the request_fn anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-16-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy	37e9403ea8	block/io: support int64_t bytes in bdrv_co_p{read,write}v_part() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their remaining dependencies now. bdrv_pad_request() is updated simultaneously, as pointer to bytes passed to it both from bdrv_co_pwritev_part() and bdrv_co_preadv_part(). So, all callers of bdrv_pad_request() are updated to pass 64bit bytes. bdrv_pad_request() is already good for 64bit requests, add corresponding assertion. Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part(). Type is widening, so callers are safe. Let's look inside the functions. In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes to other already int64_t interfaces (and some obviously safe calculations), it's OK. In bdrv_co_do_zero_pwritev() aligned_bytes may become large now, still it's passed to bdrv_aligned_pwritev which supports int64_t bytes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-15-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy	8b0c5d7659	block/io: support int64_t bytes in bdrv_aligned_preadv() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_aligned_preadv() now. Make the bytes variable in bdrv_padding_rmw_read() int64_t, as it is only used for pass-through to bdrv_aligned_preadv(). All bdrv_aligned_preadv() callers are safe as type is widening. Let's look inside: - add a new-style assertion that request is good. - callees bdrv_is_allocated(), bdrv_co_do_copy_on_readv() supports int64_t bytes - conversion of bytes_remaining is OK, as we never have requests overflowing BDRV_MAX_LENGTH - looping through bytes_remaining is ok, num is updated to int64_t - for bdrv_driver_preadv we have same limit of max_transfer - qemu_iovec_memset is OK, as bytes+qiov_offset should not overflow qiov->size anyway (thanks to bdrv_check_qiov_request()) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-14-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy	9df5afbdd1	block/io: support int64_t bytes in bdrv_co_do_copy_on_readv() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_co_do_copy_on_readv() now. 'bytes' type is widening, so callers are safe. Look at the function itself: bytes, skip_bytes and progress become int64_t. bdrv_round_to_clusters() is OK, cluster_bytes now may be large. trace_bdrv_co_do_copy_on_readv() is OK; looping through cluster_bytes is still OK. pnum is still capped to max_transfer, and to MAX_BOUNCE_BUFFER when we are going to do COR operation. Therefore calculations in qemu_iovec_from_buf() and bdrv_driver_preadv() should not change. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-13-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy	fcfd9ade68	block/io: support int64_t bytes in bdrv_aligned_pwritev() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_aligned_pwritev() now and convert the dependencies: bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() to signed type bytes. Conversion of bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() is definitely safe, as all requests in block/io must not overflow BDRV_MAX_LENGTH. Still add assertions. For bdrv_aligned_pwritev() 'bytes' type is widened, so callers are safe. Let's check usage of the parameter inside the function. Passing to bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() is OK. Passing to qemu_iovec_* is OK after new assertion. All other callees are already updated to int64_t. Checking alignment is not changed, offset + bytes and qiov_offset + bytes calculations are safe (thanks to new assertions). max_transfer is kept to be int for now. It has a default of INT_MAX here, and some drivers may rely on it. It's to be refactored later. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-12-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy	5ae07b1410	block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_co_do_pwrite_zeroes() now. Callers are safe, as converting int to int64_t is safe. Concentrate on 'bytes' usage in the function (thx to Eric Blake): compute 'int tail' via % 'int alignment' - safe fragmentation loop 'int num' - still fragments with a cap on max_transfer use of 'num' within the loop MIN(bytes, max_transfer) as well as %alignment - still works, so calculations in if (head) {} are safe clamp size by 'int max_write_zeroes' - safe drv->bdrv_co_pwrite_zeroes(int) - safe because of clamping clamp size by 'int max_transfer' - safe buf allocation is still clamped to max_transfer qemu_iovec_init_buf(size_t) - safe because of clamping bdrv_driver_pwritev(uint64_t) - safe Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-11-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy	17abcbeee2	block/io: use int64_t bytes in driver wrappers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver wrappers parameters which are already 64bit to signed type. Requests in block/io.c must never exceed BDRV_MAX_LENGTH (which is less than INT64_MAX), which makes the conversion to signed 64bit type safe. Add corresponding assertions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-10-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:16:03 -06:00
Eric Blake	8024726459	block: use int64_t as bytes type in tracked requests We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). All requests in block/io must not overflow BDRV_MAX_LENGTH, all external users of BdrvTrackedRequest already have corresponding assertions, so we are safe. Add some assertions still. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-9-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:15 -06:00
Vladimir Sementsov-Ogievskiy	63f4ad1186	block/io: improve bdrv_check_request: check qiov too Operations with qiov add more restrictions on bytes, let's cover it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-8-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy	801625e69d	block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes The function is called from 64bit io handlers, and bytes is just passed to throttle_account() which is 64bit too (unsigned though). So, let's convert intermediate argument to 64bit too. This patch is a first in the 64-bit-blocklayer series, so we are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). Patch-correctness audit by Eric Blake: Caller has 32-bit, this patch now causes widening which is safe: block/block-backend.c: blk_do_preadv() passes 'unsigned int' block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int' block/throttle.c: throttle_co_pwrite_zeroes() passes 'int' block/throttle.c: throttle_co_pdiscard() passes 'int' Caller has 64-bit, this patch fixes potential bug where pre-patch could narrow, except it's easy enough to trace that callers are still capped at 2G actions: block/throttle.c: throttle_co_preadv() passes 'uint64_t' block/throttle.c: throttle_co_pwritev() passes 'uint64_t' Implementation in question: block/throttle-groups.c throttle_group_co_io_limits_intercept() takes 'unsigned int bytes' and uses it: argument to util/throttle.c throttle_account(uint64_t) All safe: it patches a latent bug, and does not introduce any 64-bit gotchas once throttle_co_p{read,write}v are relaxed, and assuming throttle_account() is not buggy. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20201211183934.169161-7-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy	98ca45494f	block/io: bdrv_pad_request(): support qemu_iovec_init_extended failure Make bdrv_pad_request() honest: return error if qemu_iovec_init_extended() failed. Update also bdrv_padding_destroy() to clean the structure for safety. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-6-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy	f0deecff82	block/io: refactor bdrv_pad_request(): move bdrv_pad_request() up Prepare for the following patch when bdrv_pad_request() will be able to fail. Update the comments. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:52 -06:00
Vladimir Sementsov-Ogievskiy	a56ed80c42	block: fix theoretical overflow in bdrv_init_padding() Calculation of sum may theoretically overflow, so use 64bit type and add some good assertions. Use int64_t constantly. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: tweak assertion order] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy	4c002cef0e	util/iov: make qemu_iovec_init_extended() honest Actually, we can't extend the io vector in all cases. Handle possible MAX_IOV and size_t overflows. For now add assertion to callers (actually they rely on success anyway) and fix them in the following patch. Add also some additional good assertions to qemu_iovec_init_slice() while being here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy	69b55e03f7	block: refactor bdrv_check_request: add errp It's better to pass &error_abort than just assert that result is 0: on crash, we'll immediately see the reason in the backtrace. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix iotest 206 fallout] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:33 -06:00
Kevin Wolf	26513a0174	block: Fix VM size column width in bdrv_snapshot_dump() size_to_str() can return a size like "4.24 MiB", with a single digit integer part and two fractional digits. This is eight characters, but commit `b39847a5` changed the format string to only reserve seven characters for the column. This can result in unaligned columns, which in turn changes the output of iotests case 267 because exceeding the column size defeats the attempt to filter the size out of the output (observed with the ppc64 emulator). The resulting change is only a whitespace change, but since commit `f203080b` this is enough for iotests to consider the test failed. Taking a character away from the tag name column and adding it to the VM size column doesn't change anything in the common case (the tag name is left justified, the VM size is right justified), but fixes this case. Fixes: `b39847a505` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210202155911.179865-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-02 17:23:55 +01:00
Philippe Mathieu-Daudé	fcc8672aca	block/nvme: Trace NVMe spec version supported by the controller NVMe controllers implement different versions of the spec, and different features of it. It is useful to gather this information when debugging. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210127212137.3482291-3-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-02 17:05:38 +01:00
Philippe Mathieu-Daudé	97b709f32e	block/nvme: Properly display doorbell stride length in trace event Commit `15b2260bef` ("block/nvme: Trace controller capabilities") misunderstood the doorbell stride value from the datasheet, use the correct one. The 'doorbell_scale' variable used few lines later is correct. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210127212137.3482291-2-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-02 17:05:38 +01:00
Peter Maydell	7e7eb9f852	QAPI patches patches for 2021-01-28 Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2021-01-28' into staging QAPI patches patches for 2021-01-28 # gpg: Signature made Thu 28 Jan 2021 07:10:21 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2021-01-28: qapi: More complex uses of QAPI_LIST_APPEND qapi: Use QAPI_LIST_APPEND in trivial cases qapi: Introduce QAPI_LIST_APPEND qapi: A couple more QAPI_LIST_PREPEND() stragglers net: Clarify early exit condition Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-01-28 22:43:18 +00:00
Eric Blake	95b3a8c8a8	qapi: More complex uses of QAPI_LIST_APPEND These cases require a bit more thought to review; in each case, the code was appending to a list, but not with a FOOList **tail variable. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210113221013.390592-6-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Flawed change to qmp_guest_network_get_interfaces() dropped] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2021-01-28 08:08:45 +01:00
Eric Blake	c3033fd372	qapi: Use QAPI_LIST_APPEND in trivial cases The easiest spots to use QAPI_LIST_APPEND are where we already have an obvious pointer to the tail of a list. While at it, consistently use the variable name 'tail' for that purpose. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210113221013.390592-5-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2021-01-28 08:08:45 +01:00
Kevin Wolf	86b1cf3227	block: Separate blk_is_writable() and blk_supports_write_perm() Currently, blk_is_read_only() tells whether a given BlockBackend can only be used in read-only mode because its root node is read-only. Some callers actually try to answer a slightly different question: Is the BlockBackend configured to be writable, by taking write permissions on the root node? This can differ, for example, for CD-ROM devices which don't take write permissions, but may be backed by a writable image file. scsi-cd allows write requests to the drive if blk_is_read_only() returns false. However, the write request will immediately run into an assertion failure because the write permission is missing. This patch introduces separate functions for both questions. blk_supports_write_perm() answers the question whether the block node/image file can support writable devices, whereas blk_is_writable() tells whether the BlockBackend is currently configured to be writable. All calls of blk_is_read_only() are converted to one of the two new functions. Fixes: https://bugs.launchpad.net/bugs/1906693 Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210118123448.307825-2-kwolf@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-01-27 20:45:20 +01:00
David Edmondson	797e3e3805	block: report errno when flock fcntl fails When a call to fcntl(2) for the purpose of adding file locks fails with an error other than EAGAIN or EACCES, report the error returned by fcntl. EAGAIN or EACCES are elided as they are considered to be common failures, indicating that a conflicting lock is held by another process. No errors are elided when removing file locks. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210113164447.2545785-1-david.edmondson@oracle.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	143a6384f5	block/block-copy: drop unused argument of block_copy() Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-21-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	5b49c2bdc1	block/block-copy: drop unused block_copy_set_progress_callback() Drop unused code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-20-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	71eed4cebe	backup: move to block-copy This brings async request handling and block-status driven chunk sizes to backup out of the box, which improves backup performance. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-18-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	511e7d31bf	block/backup: drop extra gotos from backup_run() Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-17-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	d51590fc3e	block/block-copy: make progress_bytes_callback optional We are going to stop use of this callback in the following commit. Still the callback handling code will be dropped in a separate commit. So, for now let's make it optional. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-16-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	2c59fd833a	qapi: backup: add max-chunk and max-workers to x-perf struct Add new parameters to configure future backup features. The patch doesn't introduce aio backup requests (so we actually have only one worker) neither requests larger than one cluster. Still, formally we satisfy these maximums anyway, so add the parameters now, to facilitate further patch which will really change backup job behavior. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-11-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	a6d23d56df	block/block-copy: add block_copy_cancel Add function to cancel running async block-copy call. It will be used in backup. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	7e032df0ea	block/block-copy: add ratelimit to block-copy We are going to directly use one async block-copy operation for backup job, so we need rate limiter. We want to maintain current backup behavior: only background copying is limited and copy-before-write operations only participate in limit calculation. Therefore we need one rate limiter for block-copy state and boolean flag for block-copy call state for actual limitation. Note, that we can't just calculate each chunk in limiter after successful copying: it will not save us from starting a lot of async sub-requests which will exceed limit too much. Instead let's use the following scheme on sub-request creation: 1. If at the moment limit is not exceeded, create the request and account it immediately. 2. If at the moment limit is already exceeded, drop create sub-request and handle limit instead (by sleep). With this approach we'll never exceed the limit more than by one sub-request (which pretty much matches current backup behavior). Note also, that if there is in-flight block-copy async call, block_copy_kick() should be used after set-speed to apply new setup faster. For that block_copy_kick() published in this patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	2e099a9d29	block/block-copy: add list of all call-states It simplifies debugging. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	26be9d62dd	block/block-copy: add max_chunk and max_workers parameters They will be used for backup. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	de4641b46b	block/block-copy: implement block_copy_async We'll need async block-copy invocation to use in backup directly. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	3b8c2329b5	block/block-copy: More explicit call_state Refactor common path to use BlockCopyCallState pointer as parameter, to prepare it for use in asynchronous block-copy (at least, we'll need to run block-copy in a coroutine, passing the whole parameters as one pointer). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	86c6a3b690	qapi: backup: add perf.use-copy-range parameter Experiments show, that copy_range is not always making things faster. So, to make experimentation simpler, let's add a parameter. Some more perf parameters will be added soon, so here is a new struct. For now, add new backup qmp parameter with x- prefix for the following reasons: - We are going to add more performance parameters, some will be related to the whole block-copy process, some only to background copying in backup (ignored for copy-before-write operations). - On the other hand, we are going to use block-copy interface in other block jobs, which will need performance options as well.. And it should be the same structure or at least somehow related. So, there are too much unclean things about how the interface and now we need the new options mostly for testing. Let's keep them experimental for a while. In do_backup_common() new x-perf parameter handled in a way to make further options addition simpler. We add use-copy-range with default=true, and we'll change the default in further patch, after moving backup to use block-copy. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-2-vsementsov@virtuozzo.com> [mreitz: s/5\.2/6.0/] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Andrey Shinkevich	205736f488	block: apply COR-filter to block-stream jobs This patch completes the series with the COR-filter applied to block-stream operations. Adding the filter makes it possible in future implement discarding copied regions in backing files during the block-stream job, to reduce the disk overuse (we need control on permissions). Also, the filter now is smart enough to do copy-on-read with specified base, so we have benefit on guest reads even when doing block-stream of the part of the backing chain. Several iotests are slightly modified due to filter insertion. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201216061703.70908-14-vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	0f6c94988a	block/stream: add s->target_bs Add a direct link to target bs for convenience and to simplify following commit which will insert COR filter above target bs. This is a part of original commit written by Andrey. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-13-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	7f4a396d76	qapi: block-stream: add "bottom" argument The code already don't freeze base node and we try to make it prepared for the situation when base node is changed during the operation. In other words, block-stream doesn't own base node. Let's introduce a new interface which should replace the current one, which will in better relations with the code. Specifying bottom node instead of base, and requiring it to be non-filter gives us the following benefits: - drop difference between above_base and base_overlay, which will be renamed to just bottom, when old interface dropped - clean way to work with parallel streams/commits on the same backing chain, which otherwise become a problem when we introduce a filter for stream job - cleaner interface. Nobody will surprised the fact that base node may disappear during block-stream, when there is no word about "base" in the interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201216061703.70908-11-vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Andrey Shinkevich	000e5a1cda	stream: rework backing-file changing Stream in stream_prepare calls bdrv_change_backing_file() to change backing-file in the metadata of bs. It may use either backing-file parameter given by user or just take filename of base on job start. Backing file format is determined by base on job finish. There are some problems with this design, we solve only two by this patch: 1. Consider scenario with backing-file unset. Current concept of stream supports changing of the base during the job (we don't freeze link to the base). So, we should not save base filename at job start, - let's determine name of the base on job finish. 2. Using direct base to determine filename and format is not very good: base node may be a filter, so its filename may be JSON, and format_name is not good for storing into qcow2 metadata as backing file format. - let's use unfiltered_base Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [vsementsov: change commit subject, change logic in stream_prepare] Message-Id: <20201216061703.70908-10-vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Andrey Shinkevich	e275458b29	copy-on-read: skip non-guest reads if no copy needed If the flag BDRV_REQ_PREFETCH was set, skip idling read/write operations in COR-driver. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment. Add the BDRV_REQ_PREFETCH flag to the supported_read_flags of the COR-filter. block: Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to use it alone and pass it to the COR-filter driver for further processing. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-9-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Andrey Shinkevich	897dd0ec4f	block: include supported_read_flags into BDS structure Add the new member supported_read_flags to the BlockDriverState structure. It will control the flags set for copy-on-read operations. Make the block generic layer evaluate supported read flags before they go to a block driver. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [vsementsov: use assert instead of abort] Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Andrey Shinkevich	e4c8fddde7	qapi: copy-on-read filter: add 'bottom' option Add an option to limit copy-on-read operations to specified sub-chain of backing-chain, to make copy-on-read filter useful for block-stream job. Suggested-by: Max Reitz <mreitz@redhat.com> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [vsementsov: change subject, modified to freeze the chain, do some fixes] Message-Id: <20201216061703.70908-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 11:26:54 +01:00
Andrey Shinkevich	880747a887	qapi: add filter-node-name to block-stream Provide the possibility to pass the 'filter-node-name' parameter to the block-stream job as it is done for the commit block job. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [vsementsov: comment indentation, s/Since: 5.2/Since: 6.0/] Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-5-vsementsov@virtuozzo.com> [mreitz: s/commit/stream/] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 11:26:54 +01:00
Andrey Shinkevich	16e09a21af	copy-on-read: add filter drop function Provide API for the COR-filter removal. Also, drop the filter child permissions for an inactive state when the filter node is being removed. To insert the filter, the block generic layer function bdrv_insert_node() can be used. The new function bdrv_cor_filter_drop() may be considered as an intermediate solution before the QEMU permission update system has overhauled. Then we are able to implement the API function bdrv_remove_node() on the block generic layer. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 11:26:54 +01:00
Andrey Shinkevich	1252e03b8e	copy-on-read: support preadv/pwritev_part functions Add support for the recently introduced functions bdrv_co_preadv_part() and bdrv_co_pwritev_part() to the COR-filter driver. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201216061703.70908-2-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 11:26:54 +01:00
Peter Maydell	45240eed4f	Yank patches patches for 2021-01-13 Merge remote-tracking branch 'remotes/armbru/tags/pull-yank-2021-01-13' into staging Yank patches patches for 2021-01-13 # gpg: Signature made Wed 13 Jan 2021 09:25:46 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-yank-2021-01-13: tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown io/channel-tls.c: make qio_channel_tls_shutdown thread-safe migration: Add yank feature chardev/char-socket.c: Add yank feature block/nbd.c: Add yank feature Introduce yank feature Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-01-13 14:19:24 +00:00
Lukas Straub	fee091cdff	block/nbd.c: Add yank feature Register a yank function which shuts down the socket and sets s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an error occurred. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <b73eb07db6d1fcd00667beb13ae6117260f002c3.1609167865.git.lukasstraub2@web.de> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2021-01-13 10:21:17 +01:00
Roman Bolshakov	3eacf70bb5	meson: Propagate gnutls dependency crypto/tlscreds.h includes GnuTLS headers if CONFIG_GNUTLS is set, but GNUTLS_CFLAGS, that describe include path, are not propagated transitively to all users of crypto and build fails if GnuTLS headers reside in non-standard directory (which is a case for homebrew on Apple Silicon). Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Message-Id: <20210102125213.41279-1-r.bolshakov@yadro.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-12 12:38:03 +01:00
Peter Maydell	729cc68373	Remove superfluous timer_del() calls This commit is the result of running the timer-del-timer-free.cocci script on the whole source tree. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20201215154107.3255-4-peter.maydell@linaro.org	2021-01-08 15:13:38 +00:00
Paolo Bonzini	9db405a335	libiscsi: convert to meson Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-02 21:03:37 +01:00
Paolo Bonzini	8e4e2b551d	curl: remove compatibility code, require 7.29.0 cURL 7.16.0 was released in October 2006. Just remove code that is in all likelihood not being used anywhere, and require the oldest version found in currently supported distros, which is 7.29.0 from CentOS 7. pkg-config is enough for QEMU, since it does not need extra information such as the path for certificate authorities. All supported platforms today will all have pkg-config for curl, so we can drop curl-config. Suggested-by: Daniel Berrangé <berrange@redhat.com> Reviewed-by: Daniel Berrangé <berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-02 21:03:37 +01:00
Paolo Bonzini	2f2a376a42	meson: use dependency to gate block modules This allows converting the dependencies to meson options one by one. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-02 21:03:36 +01:00
Peter Maydell	1f7c02797f	QAPI patches patches for 2020-12-19 Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging QAPI patches patches for 2020-12-19 # gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits) qobject: Make QString immutable block: Use GString instead of QString to build filenames keyval: Use GString to accumulate value strings json: Use GString instead of QString to accumulate strings migration: Replace migration's JSON writer by the general one qobject: Factor JSON writer out of qobject_to_json() qobject: Factor quoted_str() out of to_json() qobject: Drop qstring_get_try_str() qobject: Drop qobject_get_try_str() Revert "qobject: let object_property_get_str() use new API" block: Avoid qobject_get_try_str() qmp: Fix tracing of non-string command IDs qobject: Move internals to qobject-internal.h hw/rdma: Replace QList by GQueue Revert "qstring: add qstring_free()" qobject: Change qobject_to_json()'s value to GString qobject: Use GString instead of QString to accumulate JSON qobject: Make qobject_to_json_pretty() take a pretty argument monitor: Use GString instead of QString for output buffer hmp: Simplify how qmp_human_monitor_command() gets output ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-01-01 14:33:03 +00:00
Peter Maydell	26f6b15e26	Block patches: - New block filter: preallocate (which, on writes beyond an image file's end, allocates big chunks of data so that such post-EOF writes will occur less frequently) - write-zeroes and block-status support for Quorum - Implementation of truncate for the nvme block driver similarly to the existing implementations for host block devices and iscsi devices - Block layer refactoring: Drop the tighten_restrictions concept in the block permission functions - iotest fixes -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl/cwIoSHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AnvMH+gOnZCwEUKWuBxGX3Wjb/kqV1OuhAhcP IVrKLRnqdarCYMQ9M4SZL6pedfsujHA7vClTV7NTrenXBsEIradBQ59ztQ0oDirS 4ipIjVtNqj7m86l+IRZDq5HlwOYwwFnWogmLo2bcmNJGLpPQQfrhL2vRJ1wLgFYk WjeAVlkkYcHnTIDvs4ne9WRSlxGVBWJ4X5nSlRdZqeyUcMY9v4wL4P9Wc4ZuORmq /5HRcT5JKGaT2bAueaqAGEdtPFGbazEP5uU7MTTK/fueDKIRAXO2d0gqhANtOOJQ 7hMmKhwOPOOhrrpCVi9nxsVwdCOHfurV0km6cOs+Iprm/Wm2UtuS/A8= =z+7k -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-12-18' into staging Block patches: - New block filter: preallocate (which, on writes beyond an image file's end, allocates big chunks of data so that such post-EOF writes will occur less frequently) - write-zeroes and block-status support for Quorum - Implementation of truncate for the nvme block driver similarly to the existing implementations for host block devices and iscsi devices - Block layer refactoring: Drop the tighten_restrictions concept in the block permission functions - iotest fixes # gpg: Signature made Fri 18 Dec 2020 14:45:30 GMT # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2020-12-18: (30 commits) iotests: Fix _send_qemu_cmd with bash 5.1 iotests/102: Pass 
$QEMU_HANDLE to _send_qemu_cmd block/nvme: Implement fake truncate() coroutine quorum: Implement bdrv_co_pwrite_zeroes() quorum: Implement bdrv_co_block_status() scripts/simplebench: add bench_prealloc.py simplebench/results_to_text: make executable simplebench/results_to_text: add difference line to the table simplebench/results_to_text: improve view of the table simplebench: move results_to_text() into separate file simplebench: rename ascii() to results_to_text() scripts/simplebench: use standard deviation for +- error scripts/simplebench: support iops scripts/simplebench: fix grammar: s/successed/succeeded/ iotests: add 298 to test new preallocate filter driver iotests.py: execute_setup_common(): add required_fmts argument iotests: qemu_io_silent: support --image-opts qemu-io: add preallocate mode parameter for truncate command block: introduce preallocate filter block: bdrv_check_perm(): process children anyway ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-12-31 23:26:46 +00:00
Markus Armbruster	eab3a4678b	qobject: Change qobject_to_json()'s value to GString qobject_to_json() and qobject_to_json_pretty() build a GString, then convert it to QString. Just one of the callers actually needs a QString: qemu_rbd_parse_filename(). A few others need a string they can modify: qmp_send_response(), qga's send_response(), to_json_str(), and qmp_fd_vsend_fds(). The remainder just need a string. Change qobject_to_json() and qobject_to_json_pretty() to return the GString. qemu_rbd_parse_filename() now has to convert to QString. All others save a QString temporary. to_json_str() actually becomes a bit simpler, because GString provides more convenient modification functions. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201211171152.146877-6-armbru@redhat.com>	2020-12-19 10:38:43 +01:00
Eric Blake	54aa3de72e	qapi: Use QAPI_LIST_PREPEND() where possible Anywhere we create a list of just one item or by prepending items (typically because order doesn't matter), we can use QAPI_LIST_PREPEND(). But places where we must keep the list in order by appending remain open-coded until later patches. Note that as a side effect, this also performs a cleanup of two minor issues in qga/commands-posix.c: the old code was performing new = g_malloc0(sizeof(*ret)); which 1) is confusing because you have to verify whether 'new' and 'ret' are variables with the same type, and 2) would conflict with C++ compilation (not an actual problem for this file, but makes copy-and-paste harder). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201113011340.463563-5-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> [Straightforward conflicts due to commit `a8aa94b5f8` "qga: update schema for guest-get-disks 'dependents' field" and commit `a10b453a52` "target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c" resolved. Commit message tweaked.] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-12-19 10:20:14 +01:00
Markus Armbruster	be7c5ddd0d	block/vpc: Use sizeof() instead of HEADER_SIZE for footer size Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-10-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:30 +01:00
Markus Armbruster	a3d2761719	block/vpc: Pass footer buffers as VHDFooter * instead of uint8_t * Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-9-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:28 +01:00
Markus Armbruster	275734e479	block/vpc: Pad VHDFooter, replace uint8_t[] buffers Pad VHDFooter as specified in the "Virtual Hard Disk Image Format Specification" version 1.0[*]. Change footer buffers from uint8_t[HEADER_SIZE] to VHDFooter. Their size remains the same. The VHDFooter variables pointing to a VHDFooter variable right next to it are now silly. Eliminate them, and shorten the remaining variables' names. Most variables pointing to s->footer are now also silly. Eliminate them, too. [*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-8-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:26 +01:00
Markus Armbruster	3d6101a3f2	block/vpc: Use sizeof() instead of 1024 for dynamic header size Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-7-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:23 +01:00
Markus Armbruster	e326f0783e	block/vpc: Pad VHDDynDiskHeader, replace uint8_t[] buffers Pad VHDDynDiskHeader as specified in the "Virtual Hard Disk Image Format Specification" version 1.0[*]. Change dynamic disk header buffers from uint8_t[1024] to VHDDynDiskHeader. Their size remains the same. The VHDDynDiskHeader variables pointing to a VHDDynDiskHeader variable right next to it are now silly. Eliminate them. [*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-6-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:18 +01:00
Markus Armbruster	7550379ded	block/vpc: Make vpc_checksum() take void * Some of the next commits will checksum structs. Change vpc_checksum() to take void * instead of uint8_t, to save us pointless casts to uint8_t *. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-5-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:16 +01:00
Markus Armbruster	a18dc3a14d	block/vpc: Don't abuse the footer buffer for dynamic header create_dynamic_disk() takes a buffer holding the footer as first argument. It writes out the footer (512 bytes), then reuses the buffer to initialize and write out the dynamic header (1024 bytes). Works, because the caller passes a buffer that is large enough for both purposes. I hate that. Use a separate buffer for the dynamic header, and adjust the caller's buffer. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-4-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:14 +01:00
Markus Armbruster	b0ce8cb0e8	block/vpc: Don't abuse the footer buffer as BAT sector buffer create_dynamic_disk() takes a buffer holding the footer as first argument. It writes out the footer (512 bytes), then reuses the buffer to initialize and write out the dynamic header (1024 bytes), then reuses it again to initialize and write out BAT sectors (512). Works, because the caller passes a buffer that is large enough for all three purposes. I hate that. Use a separate buffer for writing out BAT sectors. The next commit will do the same for the dynamic header. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-3-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:06 +01:00
Markus Armbruster	02df95c4a1	block/vpc: Make vpc_open() read the full dynamic header The dynamic header's size is 1024 bytes. vpc_open() reads only the 512 bytes of the dynamic header into buf[]. Works, because it doesn't actually access the second half. However, a colleague told me that GCC 11 warns: ../block/vpc.c:358:51: error: array subscript 'struct VHDDynDiskHeader[0]' is partly outside array bounds of 'uint8_t[512]' [-Werror=array-bounds] Clean up to read the full header. Rename buf[] to dyndisk_header_buf[] while there. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-2-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:42:34 +01:00
Philippe Mathieu-Daudé	c8807c5edc	block/nvme: Implement fake truncate() coroutine NVMe drive cannot be shrunk. Since commit `c80d8b06cf` we can use the @exact parameter (set to false) to return success if the block device is larger than the requested offset (even if we can not be shrunk). Use this parameter to implement the NVMe truncate() coroutine, similarly how it is done for the iscsi and file-posix drivers (see commit `82325ae5f2` "Evaluate @exact in protocol drivers"). Reported-by: Xueqiang Wei <xuwei@redhat.com> Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201210125202.858656-1-philmd@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Alberto Garcia	5cddb2e95f	quorum: Implement bdrv_co_pwrite_zeroes() This simply calls bdrv_co_pwrite_zeroes() in all children. bs->supported_zero_flags is also set to the flags that are supported by all children. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <2f09c842781fe336b4c2e40036bba577b7430190.1605286097.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
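The rule in the commit above, that bs->supported_zero_flags is set to the flags supported by all children, amounts to a bitwise AND over the children's flag sets. A minimal sketch, assuming illustrative flag values and a hypothetical helper name (neither is taken from the QEMU sources):

```python
from functools import reduce
from operator import and_
from typing import Iterable

# Illustrative bit values only; the real constants live in QEMU's block headers.
REQ_FUA = 0x1
REQ_MAY_UNMAP = 0x2
REQ_NO_FALLBACK = 0x4


def common_zero_flags(children_flags: Iterable[int]) -> int:
    """Return the flags that every child supports (hypothetical helper)."""
    flags = list(children_flags)
    if not flags:
        # With no children there is nothing all of them could agree on.
        return 0
    return reduce(and_, flags)


# One child lacks MAY_UNMAP, so only FUA survives the intersection.
print(common_zero_flags([REQ_FUA | REQ_MAY_UNMAP, REQ_FUA, REQ_FUA | REQ_NO_FALLBACK]))
```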
Alberto Garcia	ef9bba1484	quorum: Implement bdrv_co_block_status() The quorum driver does not implement bdrv_co_block_status() and because of that it always reports to contain data even if all its children are known to be empty. One consequence of this is that if we for example create a quorum with a size of 10GB and we mirror it to a new image the operation will write 10GB of actual zeroes to the destination image wasting a lot of time and disk space. Since a quorum has an arbitrary number of children of potentially different formats there is no way to report all possible allocation status flags in a way that makes sense, so this implementation only reports when a given region is known to contain zeroes (BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA). If all children agree that a region contains zeroes then we can return BDRV_BLOCK_ZERO using the smallest size reported by the children (because all agree that a region of at least that size contains zeroes). If at least one child disagrees we have to return BDRV_BLOCK_DATA. In this case we use the largest of the sizes reported by the children that didn't return BDRV_BLOCK_ZERO (because we know that there won't be an agreement for at least that size). Signed-off-by: Alberto Garcia <berto@igalia.com> Tested-by: Tao Xu <tao3.xu@intel.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <db83149afcf0f793effc8878089d29af4c46ffe1.1605286097.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
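The aggregation rule described in the commit above can be sketched as follows. This is a simplified model, not the driver code: each child is reduced to a (status, length) pair, and the names ZERO, DATA and quorum_block_status are stand-ins chosen for this illustration.

```python
from typing import List, Tuple

ZERO = "BDRV_BLOCK_ZERO"  # stand-in for the real status flag
DATA = "BDRV_BLOCK_DATA"  # stand-in for the real status flag


def quorum_block_status(children: List[Tuple[str, int]]) -> Tuple[str, int]:
    """Combine per-child (status, length) reports into one answer."""
    if not children:
        raise ValueError("a quorum needs at least one child")

    # Lengths reported by children that did NOT see zeroes.
    non_zero_lengths = [length for status, length in children if status != ZERO]

    if not non_zero_lengths:
        # Everyone agrees on zeroes, but only for the shortest reported run.
        return ZERO, min(length for _, length in children)

    # At least one child disagrees: report data for the longest run among
    # the disagreeing children, since no agreement is possible within it.
    return DATA, max(non_zero_lengths)


print(quorum_block_status([(ZERO, 4096), (ZERO, 8192)]))
print(quorum_block_status([(ZERO, 4096), (DATA, 1024), (DATA, 2048)]))
```

With all children reporting zeroes the result covers the smallest length; a single dissenting child is enough to turn the answer into data.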
Vladimir Sementsov-Ogievskiy	33fa2222eb	block: introduce preallocate filter It's intended to be inserted between format and protocol nodes to preallocate additional space (expanding protocol file) on writes crossing EOF. It improves performance for file-systems with slow allocation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201021145859.11201-9-vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> [mreitz: Two comment fixes, and bumped the version from 5.2 to 6.0] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	d1a764d126	block: introduce BDRV_REQ_NO_WAIT flag Add a flag that makes a serialising request not wait: if there are conflicting requests, just return an error immediately. It will be used in the upcoming preallocate filter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201021145859.11201-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	8ac5aab255	block: bdrv_mark_request_serialising: split non-waiting function We'll need a separate function, which will only "mark" request serialising with specified align but not wait for conflicting requests. So, it will be like old bdrv_mark_request_serialising(), before merging bdrv_wait_serialising_requests_locked() into it. To reduce the possible mess, let's do the following: Public function that does both marking and waiting will be called bdrv_make_request_serialising, and private function which will only "mark" will be called tracked_request_set_serialising(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201021145859.11201-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	ec1c886831	block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg bs is linked in req, so there is no need to pass it separately. Most of the tracked-requests API doesn't have a bs argument. Actually, after this patch only tracked_request_begin has it, but that is on purpose. While here, also add a comment about what "_locked" means. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201021145859.11201-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	3183937ff9	block/io: split out bdrv_find_conflicting_request To be reused separately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201021145859.11201-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	2e36da62cf	block/io.c: drop assertion on double waiting for request serialisation The comment states that on a misaligned request we should have already been waiting. But for bdrv_padding_rmw_read, we called bdrv_mark_request_serialising with align = request_alignment, and now we serialise with align = cluster_size. So we may have to wait again with larger alignment. Note that the only user of BDRV_REQ_SERIALISING is backup, which issues cluster-aligned requests, so it seems the assertion should not fire for now. But it's wrong anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201021145859.11201-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Peter Lieven	182454dc63	block/nfs: fix int overflow in nfs_client_open_qdict nfs_client_open returns the file size in sectors. This effectively makes it impossible to open files larger than 1TB. Fixes: `c22a034545` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20201209121735.16437-1-pl@kamp.de> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 11:48:39 +01:00
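The 1 TB limit mentioned in the commit above follows from simple arithmetic if one assumes 512-byte sectors and a 32-bit signed int holding the sector count (both are assumptions made for this sketch; the commit message itself only states the symptom). The helper name below is hypothetical.

```python
import ctypes

SECTOR_SIZE = 512  # assumed sector size in bytes


def sectors_as_int32(size_bytes: int) -> int:
    """Model storing a sector count in a 32-bit signed int (wraps on overflow)."""
    sectors = size_bytes // SECTOR_SIZE
    # ctypes integer types do no overflow checking, so this wraps like C would
    # on common two's-complement platforms.
    return ctypes.c_int32(sectors).value


one_tib = 2**40

# Just below 1 TiB the sector count still fits: 2**31 - 1 sectors.
print(sectors_as_int32(one_tib - SECTOR_SIZE))

# At exactly 1 TiB the count is 2**31 sectors, which no longer fits and
# turns negative in this model.
print(sectors_as_int32(one_tib))
```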
Pan Nengyuan	cb8d0851f1	block/file-posix: fix a possible undefined behavior local_err is not initialized to NULL, which will cause an assertion failure as below: qemu/util/error.c:59: error_setv: Assertion `*errp == NULL' failed. Fixes: `c644751069` Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Message-Id: <20201023061218.2080844-8-kuhn.chenqun@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-12-13 23:56:16 +01:00
Kevin Wolf	960d5fb3e8	block: Fix deadlock in bdrv_co_yield_to_drain() If bdrv_co_yield_to_drain() is called for draining a block node that runs in a different AioContext, it keeps that AioContext locked while it yields and schedules a BH in the AioContext to do the actual drain. As long as executing the BH is the very next thing that the event loop of the node's AioContext does, this actually happens to work, but when it tries to execute something else that wants to take the AioContext lock, it will deadlock. (In the bug report, this other thing is a virtio-scsi device running virtio_scsi_data_plane_handle_cmd().) Instead, always drop the AioContext lock across the yield and reacquire it only when the coroutine is reentered. The BH needs to unconditionally take the lock for itself now. This fixes the 'block_resize' QMP command on a block node that runs in an iothread. Cc: qemu-stable@nongnu.org Fixes: `eb94b81a94` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20201203172311.68232-4-kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy	8b1170012b	block: introduce BDRV_MAX_LENGTH We are going to modify block layer to work with 64bit requests. And first step is moving to int64_t type for both offset and bytes arguments in all block request related functions. It's mostly safe (when widening signed or unsigned int to int64_t), but switching from uint64_t is questionable. So, let's first establish the set of requests we want to work with. First signed int64_t should be enough, as off_t is signed anyway. Then, obviously offset + bytes should not overflow. And most interesting: (offset + bytes) being aligned up should not overflow as well. Aligned to what alignment? First thing that comes in mind is bs->bl.request_alignment, as we align up request to this alignment. But there is another thing: look at bdrv_mark_request_serialising(). It aligns request up to some given alignment. And this parameter may be bdrv_get_cluster_size(), which is often a lot greater than bs->bl.request_alignment. Note also, that bdrv_mark_request_serialising() uses signed int64_t for calculations. So, actually, we already depend on some restrictions. Happily, bdrv_get_cluster_size() returns int and bs->bl.request_alignment has 32bit unsigned type, but defined to be a power of 2 less than INT_MAX. So, we may establish, that INT_MAX is absolute maximum for any kind of alignment that may occur with the request. Note, that bdrv_get_cluster_size() is not documented to return power of 2, still bdrv_mark_request_serialising() behaves like it is. Also, backup uses bdi.cluster_size and is not prepared to it not being power of 2. So, let's establish that Qemu supports only power-of-2 clusters and alignments. So, alignment can't be greater than 2^30. Finally to be safe with calculations, to not calculate different maximums for different nodes (depending on cluster size and request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30) as absolute maximum bytes length for Qemu. 
Actually, it's not much less than INT64_MAX. OK, then, let's apply it to block/io. Let's consider all block/io entry points of offset/bytes: 4 bytes/offset interface functions: bdrv_co_preadv_part(), bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and bdrv_co_pdiscard() and we check them all with bdrv_check_request(). We also have one entry point with only offset: bdrv_co_truncate(). Check the offset. And one public structure: BdrvTrackedRequest. Happily, it has only three external users: file-posix.c: adopted by this patch write-threshold.c: only read fields test-write-threshold.c: sets obviously small constant values Better is to make the structure private and add corresponding interfaces. Still it's not obvious what kind of interface is needed for file-posix.c. Let's keep it public but add corresponding assertions. After this patch we'll convert functions in block/io.c to int64_t bytes and offset parameters. We can assume that an offset/bytes pair always satisfies the new restrictions, and make corresponding assertions where needed. If we reach some offset/bytes point in block/io.c missing bdrv_check_request() it is considered a bug. As well, if block/io.c modifies an offset/bytes request, expanding it more than aligning up to request_alignment, it's a bug too. For all io requests except for discard we keep for now old restriction of 32bit request length. iotest 206 output error message changed, as now test disk size is larger than new limit. Add one more test case with new maximum disk size to cover too-big-L1 case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
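The limit derived in the commit above, QEMU_ALIGN_DOWN(INT64_MAX, 2^30), and the kind of offset/bytes check it enables can be worked through numerically. This is a sketch of the arithmetic only: align_down, align_up and check_request are hypothetical Python helpers, not the functions in block/io.c.

```python
INT64_MAX = 2**63 - 1
MAX_ALIGNMENT = 2**30  # largest power-of-2 alignment the message allows for


def align_down(value: int, alignment: int) -> int:
    """Round value down to a multiple of a power-of-2 alignment."""
    return value & ~(alignment - 1)


def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of a power-of-2 alignment."""
    return align_down(value + alignment - 1, alignment)


# The maximum length from the commit message: INT64_MAX aligned down to 2^30.
BDRV_MAX_LENGTH = align_down(INT64_MAX, MAX_ALIGNMENT)


def check_request(offset: int, nbytes: int) -> bool:
    """Accept a request only if offset, nbytes and their sum stay in range."""
    if offset < 0 or nbytes < 0:
        return False
    if offset > BDRV_MAX_LENGTH or nbytes > BDRV_MAX_LENGTH:
        return False
    # Written as a subtraction so the comparison itself cannot overflow in C.
    return offset <= BDRV_MAX_LENGTH - nbytes


# The limit is 2^63 - 2^30, only 2^30 - 1 bytes short of INT64_MAX.
print(BDRV_MAX_LENGTH == 2**63 - 2**30)
# Aligning the end of the largest request up to 2^30 still fits in int64.
print(align_up(BDRV_MAX_LENGTH, MAX_ALIGNMENT) <= INT64_MAX)
```

Because the limit is itself a multiple of 2^30, aligning any accepted request end up to a smaller power-of-2 alignment also stays within it.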
Vladimir Sementsov-Ogievskiy	f4dad307ef	block/io: bdrv_check_byte_request(): drop bdrv_is_inserted() Move bdrv_is_inserted() calls into callers. We are going to make bdrv_check_byte_request() a clean thing. bdrv_is_inserted() is not about checking the request, it's about checking the bs. So, it should be separate. With this patch we probably change the error path for some failure scenarios. But depending on the fact that querying too big request on empty cdrom (or corrupted qcow2 node with no drv) will result in EIO and not ENOMEDIUM would be very strange. Moreover, we are going to move to 64bit requests, so larger requests will be allowed anyway. Moreover, keeping in mind that cdrom is the only driver that has .bdrv_is_inserted() handler it's strange that we should care so much about it in generic block layer, intuitively we should just do read and write, and cdrom driver should return correct errors if it is not inserted. But that is work for another series. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-4-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy	33985614bd	block/io: bdrv_refresh_limits(): use ERRP_GUARD This simplifies following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-3-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy	9b100af30f	block/file-posix: fix workaround in raw_do_pwrite_zeroes() We should not set overlap_bytes: 1. Don't worry: it is calculated by bdrv_mark_request_serialising() and will be equal to or greater than bytes anyway. 2. If the request was already aligned up to some greater alignment, then we may break things: we reduce overlap_bytes, and further bdrv_mark_request_serialising() may not help, as it will not restore old bigger alignment. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Li Feng	eb43ea16dc	file-posix: check the use_lock before setting the file lock The scenario is that when accessing a volume on an NFS filesystem without supporting the file lock, Qemu will complain "Failed to lock byte 100", even when setting the file.locking = off. We should do file lock related operations only when the file.locking is enabled, otherwise, the syscall of 'fcntl' will return non-zero. Signed-off-by: Li Feng <fengli@smartx.com> Message-Id: <1607341446-85506-1-git-send-email-fengli@smartx.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Max Reitz	df4ea7091b	fuse: Implement hole detection through lseek This is a relatively new feature in libfuse (available since 3.8.0, which was released in November 2019), so we have to add a dedicated check whether it is available before making use of it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Max Reitz	4ca37a96a7	fuse: (Partially) implement fallocate() This allows allocating areas after the (old) EOF as part of a growing resize, writing zeroes, and discarding. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Max Reitz	4fba06d594	fuse: Allow growable exports These will behave more like normal files in that writes beyond the EOF will automatically grow the export size. As an optimization, keep the RESIZE permission for growable exports so we do not have to take it for every post-EOF write. (This permission is not released when the export is destroyed, because at that point the BlockBackend is destroyed altogether anyway.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-5-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Max Reitz	41429e3d79	fuse: Implement standard FUSE operations This makes the export actually useful instead of only producing errors whenever it is accessed. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Max Reitz	0c9b70d590	fuse: Allow exporting BDSs via FUSE block-export-add type=fuse allows mounting block graph nodes via FUSE on some existing regular file. That file should then appear like a raw disk image, and accesses to it result in accesses to the exported BDS. Right now, we only implement the necessary block export functions to set it up and shut it down. We do not implement any access functions, so accessing the mount point only results in errors. This will be addressed by a followup patch. We keep a hash table of exported mount points, because we want to be able to detect when users try to use a mount point twice. This is because we invoke stat() to check whether the given mount point is a regular file, but if that file is served by ourselves (because it is already used as a mount point), then this stat() would have to be served by ourselves, too, which is impossible to do while we (as the caller) are waiting for it to settle. Therefore, keep track of mount point paths to at least catch the most obvious instances of that problem. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Gan Qixin	c208b0ef96	block/iscsi: Use lock guard macros Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/iscsi. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Message-Id: <20201203075055.127773-5-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Gan Qixin	3af613ebdb	block/throttle-groups: Use lock guard macros Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/throttle-groups. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Message-Id: <20201203075055.127773-4-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Gan Qixin	f5056b70e6	block/curl: Use lock guard macros Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/curl. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201203075055.127773-3-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Gan Qixin	c37c973660	block/accounting: Use lock guard macros Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/accounting. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201203075055.127773-2-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Markus Armbruster	6cc0667d9b	Tweak a few "Parameter 'NAME' expects THING" error message Change to "expects a THING" where that's an obvious improvement Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201113082626.2725812-11-armbru@redhat.com>	2020-12-10 17:16:44 +01:00
Stefan Hajnoczi	552c2c4c10	block/export: avoid g_return_val_if() input validation Do not validate input with g_return_val_if(). This API is intended for checking programming errors and is compiled out with -DG_DISABLE_CHECKS. Use an explicit if statement for input validation so it cannot accidentally be compiled out. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201118091644.199527-5-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-12-09 13:04:17 -05:00
Marc-André Lureau	0df750e9d3	libvhost-user: make it a meson subproject By making libvhost-user a subproject, check it builds standalone (without the global QEMU cflags etc). Note that the library still relies on QEMU include/qemu/atomic.h and linux_headers/. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20201125100640.366523-6-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-12-08 13:48:58 -05:00
Maxim Levitsky	c8bf9a9169	qcow2: Fix corruption on write_zeroes with MAY_UNMAP Commit `205fa50750` ("qcow2: Add subcluster support to zero_in_l2_slice()") introduced a subtle change to code in zero_in_l2_slice: It swapped the order of 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice); 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO); 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST); To 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice); 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST); 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO); It seems harmless, however the call to qcow2_free_any_clusters can trigger a cache flush which can mark the L2 table as clean, and assuming that this was the last write to it, a stale version of it will remain on the disk. Now we have a valid L2 entry pointing to a freed cluster. Oops. Fixes: `205fa50750` ("qcow2: Add subcluster support to zero_in_l2_slice()") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> [ kwolf: Fixed to restore the correct original order from before 205fa50750; added comments like in discard_in_l2_slice(). ] Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20201124092815.39056-1-kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-24 11:29:41 +01:00
Peter Maydell	683685e72d	Pull request for 5.2 NVMe fixes to solve IOMMU issues on non-x86 and error message/tracing improvements. Elena Afanasova's ioeventfd fixes are also included. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Merge remote-tracking branch 'remotes/stefanha-gitlab/tags/block-pull-request' into staging # gpg: Signature made Wed 04 Nov 2020 15:18:16 GMT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha-gitlab/tags/block-pull-request: (33 commits) util/vfio-helpers: Assert offset is aligned to page size util/vfio-helpers: Convert vfio_dump_mapping to trace events util/vfio-helpers: Improve DMA trace events util/vfio-helpers: Trace where BARs are mapped util/vfio-helpers: Trace PCI BAR region info util/vfio-helpers: Trace PCI I/O config accesses util/vfio-helpers: Improve reporting unsupported IOMMU type block/nvme: Fix nvme_submit_command() on big-endian host block/nvme: Fix use of write-only doorbells page on Aarch64 arch block/nvme: Align iov's va and size on host page size block/nvme: Change size and alignment of prp_list_pages block/nvme: Change size and alignment of queue block/nvme: Change size and alignment of IDENTIFY response buffer block/nvme: Correct minimum device page size block/nvme: Set request_alignment at initialization block/nvme: Simplify nvme_cmd_sync() block/nvme: Simplify ADMIN queue access block/nvme: Correctly initialize Admin Queue Attributes block/nvme: Use definitions instead of magic values in add_io_queue() block/nvme: Introduce Completion Queue definitions ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-11-23 13:03:13 +00:00
Peter Maydell	c8e5c4b246	Patches for 5.2.0-rc2: - quorum: Fix crash with rewrite-corrupted and without read-write user - io_uring: do not use pointer after free - file-posix: Use fallback path for -EBUSY from FALLOC_FL_PUNCH_HOLE - iotests: Fix failure on Python 3.9 due to use of a deprecated function - char-stdio: Fix QMP default for 'signal' Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging # gpg: Signature made Tue 17 Nov 2020 11:43:17 GMT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: iotests/081: Test rewrite-corrupted without WRITE iotests/081: Filter image format after testdir quorum: Require WRITE perm with rewrite-corrupted io_uring: do not use pointer after free file-posix: allow -EBUSY errors during write zeros on raw block devices iotests: Replace deprecated ConfigParser.readfp() char-stdio: Fix QMP default for 'signal' Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-11-17 15:58:51 +00:00
Peter Maydell	1c7ab0930a	pc,vhost: fixes Fixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging # gpg: Signature made Tue 17 Nov 2020 09:17:12 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: vhost-user-blk/scsi: Fix broken error handling for socket call contrib/libvhost-user: Fix bad printf format specifiers hw/i386/acpi-build: Fix maybe-uninitialized error when ACPI hotplug off configure: mark vhost-user Linux-only vhost-user-blk-server: depend on CONFIG_VHOST_USER meson: move vhost_user_blk_server to meson.build vhost-user: fix VHOST_USER_ADD/REM_MEM_REG truncation Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # meson.build	2020-11-17 11:50:11 +00:00
Max Reitz	9ca5b0e842	quorum: Require WRITE perm with rewrite-corrupted Using rewrite-corrupted means quorum may issue writes to its children just from receiving read requests from its parents. Thus, it must take the WRITE permission when rewrite-corrupted is used. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201113211718.261671-2-mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-17 12:38:28 +01:00
Paolo Bonzini	bd89f93603	io_uring: do not use pointer after free Even though the pointer value is only printed, it is untidy and Coverity complains. Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201113154102.1460459-1-pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-17 12:26:48 +01:00
Maxim Levitsky	ece4fa9152	file-posix: allow -EBUSY errors during write zeros on raw block devices On Linux, fallocate(fd, FALLOC_FL_PUNCH_HOLE), when used on a block device without O_DIRECT, can return -EBUSY if it races with another write to the same page. Since this is rare and discard is not a critical operation, ignore this error. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201111153913.41840-2-mlevitsk@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-17 12:26:48 +01:00
Chetan Pant	61f3c91a67	nomaintainer: Fix Lesser GPL version number There is no "version 2" of the "Lesser" General Public License. It is either "GPL version 2.0" or "Lesser GPL version 2.1". This patch replaces all occurrences of "Lesser GPL version 2" with "Lesser GPL version 2.1" in comment section. This patch contains all the files, whose maintainer I could not get from ‘get_maintainer.pl’ script. Signed-off-by: Chetan Pant <chetan4windows@gmail.com> Message-Id: <20201023124424.20177-1-chetan4windows@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> [thuth: Adapted exec.c and qdev-monitor.c to new location] Signed-off-by: Thomas Huth <thuth@redhat.com>	2020-11-15 17:04:40 +01:00
Stefan Hajnoczi	e5e856c1eb	meson: move vhost_user_blk_server to meson.build The --enable/disable-vhost-user-blk-server options were implemented in ./configure. There has been confusion about them and part of the problem is that the shell syntax used for setting the default value is not easy to read. Move the option over to meson where the conditions are easier to understand: have_vhost_user_blk_server = (targetos == 'linux') if get_option('vhost_user_blk_server').enabled() if targetos != 'linux' error('vhost_user_blk_server requires linux') endif elif get_option('vhost_user_blk_server').disabled() or not have_system have_vhost_user_blk_server = false endif This patch does not change behavior. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201110171121.1265142-2-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-11-12 09:19:40 -05:00
shiliyang	5f14f31d2b	block: Fix some code style problems, "foo* bar" should be "foo *bar" Some code style problems were found while reading the block driver code. This patch fixes some instances of this error: ERROR: "foo* bar" should be "foo *bar". Signed-off-by: Liyang Shi <shiliyang@huawei.com> Reported-by: Euler Robot <euler.robot@huawei.com> Message-Id: <3211f389-6d22-46c1-4a16-e6a2ba66f070@huawei.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-11-09 18:42:47 +01:00
Yonggang Luo	c63b0201ae	block: Fixes nfs compiling error on msys2/mingw These compiling errors are fixed: ../block/nfs.c:27:10: fatal error: poll.h: No such file or directory 27 \| #include <poll.h> \| ^~~~~~~~ compilation terminated. ../block/nfs.c:63:5: error: unknown type name 'blkcnt_t' 63 \| blkcnt_t st_blocks; \| ^~~~~~~~ ../block/nfs.c: In function 'nfs_client_open': ../block/nfs.c:550:27: error: 'struct _stat64' has no member named 'st_blocks' 550 \| client->st_blocks = st.st_blocks; \| ^ ../block/nfs.c: In function 'nfs_get_allocated_file_size': ../block/nfs.c:751:41: error: 'struct _stat64' has no member named 'st_blocks' 751 \| return (task.ret < 0 ? task.ret : st.st_blocks * 512); \| ^ ../block/nfs.c: In function 'nfs_reopen_prepare': ../block/nfs.c:805:31: error: 'struct _stat64' has no member named 'st_blocks' 805 \| client->st_blocks = st.st_blocks; \| ^ ../block/nfs.c: In function 'nfs_get_allocated_file_size': ../block/nfs.c:752:1: error: control reaches end of non-void function [-Werror=return-type] 752 \| } \| ^ On msys2/mingw, there is no st_blocks in struct _stat64 yet, we disable the usage of it on msys2/mingw, and create a typedef long long blkcnt_t; for further implementation Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Message-Id: <20201105123116.674-2-luoyonggang@gmail.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-11-09 15:44:21 +01:00
Alberto Garcia	3441ad4bc4	qcow2: Document and enforce the QCowL2Meta invariants The QCowL2Meta structure is used to store information about a part of a write request that touches clusters that need changes in their L2 entries. This happens with newly-allocated clusters or subclusters. This structure has changed a bit since it was first created and its current documentation is not quite up-to-date. A write request can span a region consisting of a combination of clusters of different types, and qcow2_alloc_host_offset() can repeatedly call handle_copied() and handle_alloc() to add more clusters to the mix as long as they all are contiguous on the image file. Because of this a write request has a list of QCowL2Meta structures, one for each part of the request that needs changes in the L2 metadata. Each one of them spans nb_clusters and has two copy-on-write regions located immediately before and after the middle region touched by that part of the write request. Even when those regions themselves are empty their offsets must be correct because they are used to know the location of the middle region. This was not always the case but it is not a problem anymore because the only two places where QCowL2Meta structures are created (calculate_l2_meta() and qcow2_co_truncate()) ensure that the copy-on-write regions are correctly defined, and so do assertions like the ones in perform_cow(). The conditional initialization of the 'written_to' variable is therefore unnecessary and is removed by this patch. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201007161323.4667-1-berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-11-09 15:44:21 +01:00
AlexChen	3d86af858e	block: Remove unused include The "qemu-common.h" include is not used, remove it. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: AlexChen <alex.chen@huawei.com> Message-Id: <5F8FFB94.3030209@huawei.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-11-09 15:44:21 +01:00
Peter Maydell	85c3ed4417	pc,pci,vhost,virtio: fixes Lots of fixes all over the place. virtio-mem and virtio-iommu patches are kind of fixes but it seems better to just make them behave sanely than try to educate users about the limitations ... Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging # gpg: Signature made Wed 04 Nov 2020 18:40:03 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (31 commits) contrib/vhost-user-blk: fix get_config() information leak block/export: fix vhost-user-blk get_config() information leak block/export: make vhost-user-blk config space little-endian configure: introduce --enable-vhost-user-blk-server libvhost-user: follow QEMU comment style vhost-blk: set features before setting inflight feature Revert "vhost-blk: set features before setting inflight feature" net: Add vhost-vdpa in show_netdevs() vhost-vdpa: Add qemu_close in vhost_vdpa_cleanup vfio: Don't issue full 2^64 unmap virtio-iommu: Set supported page size mask vfio: Set IOMMU page size as per host supported page size memory: Add interface to set iommu page size mask virtio-iommu: Add notify_flag_changed() memory region callback virtio-iommu: Add replay() memory region callback virtio-iommu: Call memory notifiers in attach/detach virtio-iommu: Add memory notifiers for map/unmap virtio-iommu: Store memory region in endpoint struct virtio-iommu: Fix virtio_iommu_mr() hw/smbios: Fix leaked fd in save_opt_one() error path ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-11-05 15:16:43 +00:00
Stefan Hajnoczi	f8ffcb2bda	block/export: fix vhost-user-blk get_config() information leak Refuse get_config() requests in excess of sizeof(struct virtio_blk_config). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201027173528.213464-5-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-11-03 16:39:05 -05:00
Stefan Hajnoczi	11f60f7eae	block/export: make vhost-user-blk config space little-endian VIRTIO 1.0 devices have little-endian configuration space. The vhost-user-blk-server.c code already uses little-endian for virtqueue processing but not for the configuration space fields. Fix this so the vhost-user-blk export works on big-endian hosts. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201027173528.213464-4-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-11-03 16:39:05 -05:00
Stefan Hajnoczi	bc15e44cb2	configure: introduce --enable-vhost-user-blk-server Make it possible to compile out the vhost-user-blk server. It is enabled by default on Linux. Note that vhost-user-server.c depends on libvhost-user, which requires CONFIG_LINUX. The CONFIG_VHOST_USER dependency was erroneous since that option controls vhost-user frontends (previously known as "master") and not device backends (previously known as "slave"). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201027173528.213464-3-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-11-03 16:39:05 -05:00
Philippe Mathieu-Daudé	a0546a7b6f	block/nvme: Fix nvme_submit_command() on big-endian host The Completion Queue Command Identifier is a 16-bit value, so nvme_submit_command() is unlikely to work on big-endian hosts, as the relevant bits are truncated. Fix by using the correct byte-swap function. Fixes: `bdd6a90a9e` ("block: Add VFIO based NVMe driver") Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-25-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé	4b19e9b815	block/nvme: Fix use of write-only doorbells page on Aarch64 arch qemu_vfio_pci_map_bar() calls mmap(), and mmap(2) states: 'offset' must be a multiple of the page size as returned by sysconf(_SC_PAGE_SIZE). In commit `f68453237b` we started to use an offset of 4K which broke this contract on Aarch64 arch. Fix by mapping at offset 0, and accessing doorbells at offset=4K. Fixes: `f68453237b` ("block/nvme: Map doorbells pages write-only") Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-24-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	9e13d59884	block/nvme: Align iov's va and size on host page size Make sure iov's va and size are properly aligned on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-23-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	f8fd3ebac3	block/nvme: Change size and alignment of prp_list_pages In preparation of 64kB host page support, let's change the size and alignment of the prp_list_pages so that the VFIO DMA MAP succeeds with 64kB host page size. We align on the host page size. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-22-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	2387aaced7	block/nvme: Change size and alignment of queue In preparation of 64kB host page support, let's change the size and alignment of the queue so that the VFIO DMA MAP succeeds. We align on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-21-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	0aecd06049	block/nvme: Change size and alignment of IDENTIFY response buffer In preparation of 64kB host page support, let's change the size and alignment of the IDENTIFY command response buffer so that the VFIO DMA MAP succeeds. We align on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-20-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé	a652a3ec69	block/nvme: Correct minimum device page size While trying to simplify the code using a macro, we forgot the 12-bit shift... Correct that. Fixes: `fad1eb6886` ("block/nvme: Use register definitions from 'block/nvme.h'") Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-19-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé	c8228ac355	block/nvme: Set request_alignment at initialization Commit `bdd6a90a9e` ("block: Add VFIO based NVMe driver") sets the request_alignment in nvme_refresh_limits(). For consistency, also set it during initialization. Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-18-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	08d5406798	block/nvme: Simplify nvme_cmd_sync() As all commands use the ADMIN queue, it is pointless to pass it as argument each time. Remove the argument, and rename the function as nvme_admin_cmd_sync() to make this new behavior clearer. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-17-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	52b75ea8ec	block/nvme: Simplify ADMIN queue access We don't need to dereference from BDRVNVMeState each time. Use a NVMeQueuePair pointer on the admin queue. The nvme_init() becomes easier to review, matching the style of nvme_add_io_queue(). Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-16-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	3c363c073e	block/nvme: Correctly initialize Admin Queue Attributes From the specification chapter 3.1.8 "AQA - Admin Queue Attributes" the Admin Submission Queue Size field is a 0’s based value: Admin Submission Queue Size (ASQS): Defines the size of the Admin Submission Queue in entries. Enabling a controller while this field is cleared to 00h produces undefined results. The minimum size of the Admin Submission Queue is two entries. The maximum size of the Admin Submission Queue is 4096 entries. This is a 0’s based value. This bug has never been hit because the device initialization uses a single command synchronously :) Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-15-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	76a24781cc	block/nvme: Use definitions instead of magic values in add_io_queue() Replace magic values by definitions, and simplify since the number of queues will never reach 64K. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-14-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	dfa9c6c656	block/nvme: Make nvme_init_queue() return boolean indicating error Just for consistency, following the example documented since commit `e3fe3988d7` ("error: Document Error API usage rules"), return a boolean value indicating an error is set or not. Directly pass errp as the local_err is not requested in our case. This simplifies a bit nvme_create_queue_pair(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-12-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	7a5f00dde3	block/nvme: Make nvme_identify() return boolean indicating error Just for consistency, following the example documented since commit `e3fe3988d7` ("error: Document Error API usage rules"), return a boolean value indicating an error is set or not. Directly pass errp as the local_err is not requested in our case. Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-11-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	1b539bd6db	block/nvme: Use unsigned integer for queue counter/size We can not have negative queue count/size/index, use unsigned type. Rename 'nr_queues' as 'queue_count' to match the spec naming. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-10-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	3214b0f094	block/nvme: Move definitions before structure declarations To be able to use some definitions in structure declarations, move them earlier. No logical change. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-9-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	6e1e9ff2d3	block/nvme: Trace queue pair creation/deletion Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-8-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	51e98b6d21	block/nvme: Improve nvme_free_req_queue_wait() trace information What we want to trace is the block driver state and the queue index. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-7-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	1c914cd120	block/nvme: Trace nvme_poll_queue() per queue As we want to enable multiple queues, report the event in each nvme_poll_queue() call, rather than once in the callback calling nvme_poll_queues(). Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-6-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	15b2260bef	block/nvme: Trace controller capabilities Controllers have different capabilities and report them in the CAP register. We are particularly interested in the page size limits. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-5-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	58ad6ae0cb	block/nvme: Report warning with warn_report() Instead of displaying warning on stderr, use warn_report() which also displays it on the monitor. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-4-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	8526e39e99	block/nvme: Use hex format to display offset in trace events Use the same format used for the hw/vfio/ trace events. Suggested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-3-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
AlexChen	c9eb2f3e38	block/vvfat: Fix bad printf format specifiers We should use printf format specifier "%u" instead of "%d" for argument of type "unsigned int". In addition, fix two error format problems found by checkpatch.pl: ERROR: space required after that ',' (ctx:VxV) + fprintf(stderr,"%s attributes=0x%02x begin=%u size=%d\n", ^ ERROR: line over 90 characters + fprintf(stderr, "%d, %s (%u, %d)\n", i, commit->path ? commit->path : "(null)", commit->param.rename.cluster, commit->action); Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Alex Chen <alex.chen@huawei.com> Message-Id: <5FA12620.6030705@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-03 16:24:56 +01:00
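The format-specifier fix described in the vvfat commit above can be illustrated with a minimal sketch. The helper name `format_cluster` and the output string are hypothetical and not taken from block/vvfat.c; the only point is that an `unsigned int` argument is paired with `%u`, so values above `INT_MAX` print as intended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from block/vvfat.c): format an unsigned cluster
 * number using the matching "%u" conversion. With "%d" the argument type
 * would not match the conversion, which checkers flag and which can
 * print large values as negative numbers.
 * Returns the number of characters written, excluding the trailing NUL.
 */
static int format_cluster(char *buf, size_t len, unsigned int cluster)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "cluster=%u", cluster);
}
```

A caller would pass a buffer large enough for the prefix plus ten digits, for example `char buf[32];`.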
Eric Blake	dbc7b01492	nbd: Add 'qemu-nbd -A' to expose allocation depth Allow the server to expose an additional metacontext to be requested by savvy clients. qemu-nbd adds a new option -A to expose the qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this can also be set via QMP when using block-export-add. qemu as client is hacked into viewing the key aspects of this new context by abusing the already-experimental x-dirty-bitmap option to collapse all depths greater than 2, which results in a tri-state value visible in the output of 'qemu-img map --output=json' (yes, that means x-dirty-bitmap is now a bit of a misnomer, but I didn't feel like renaming it as it would introduce a needless break of back-compat, even though we make no compat guarantees with x- members): unallocated (depth 0) => "zero":false, "data":true local (depth 1) => "zero":false, "data":false backing (depth 2+) => "zero":true, "data":true libnbd as client is probably a nicer way to get at the information without having to decipher such hacks in qemu as client. ;) Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-11-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-10-30 15:22:00 -05:00
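The tri-state mapping listed in the commit above (depth 0, depth 1, depth 2 and deeper) can be written down as a small lookup. This is only a sketch of the table in the commit message; `struct map_flags` and `depth_to_map_flags` are hypothetical names, not QEMU or libnbd identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pair mirroring the "zero"/"data" keys of 'qemu-img map --output=json'. */
struct map_flags {
    bool zero;
    bool data;
};

/*
 * Map an allocation depth to the flags the commit message lists:
 *   depth 0  (unallocated) => zero=false, data=true
 *   depth 1  (local)       => zero=false, data=false
 *   depth 2+ (backing)     => zero=true,  data=true
 */
static struct map_flags depth_to_map_flags(unsigned int depth)
{
    struct map_flags f;

    if (depth == 0) {
        f.zero = false;
        f.data = true;
    } else if (depth == 1) {
        f.zero = false;
        f.data = false;
    } else {
        f.zero = true;
        f.data = true;
    }
    /* The combination zero=true, data=false is never produced by this table. */
    assert(!(f.zero && !f.data));
    return f;
}
```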
Eric Blake	a92b1b065e	block: Return depth level during bdrv_is_allocated_above When checking for allocation across a chain, it's already easy to count the depth within the chain at which the allocation is found. Instead of throwing that information away, return it to the caller. Existing callers only cared about allocated/non-allocated, but having a depth available will be used by NBD in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-9-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-30 15:21:23 -05:00
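The idea of returning the depth instead of a plain allocated/non-allocated answer can be sketched with a toy chain walk. This is not the QEMU implementation: the array-of-flags model, the name `toy_allocated_depth`, and the 1-based numbering (0 meaning "not allocated anywhere") are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: layer_allocated[0] is the top image, higher indexes are
 * backing files further down the chain. Returns the 1-based depth of the
 * first layer that has the range allocated, or 0 if no layer does.
 * Callers that only care about allocated/non-allocated can still test
 * the result against zero.
 */
static int toy_allocated_depth(const bool *layer_allocated, size_t n_layers)
{
    assert(layer_allocated != NULL || n_layers == 0);

    for (size_t i = 0; i < n_layers; i++) {
        if (layer_allocated[i]) {
            return (int)(i + 1);
        }
    }
    return 0;
}
```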
Greg Kurz	1a6d3bd229	block: End quiescent sections when a BDS is deleted If a BDS gets deleted during blk_drain_all(), it might miss a call to bdrv_do_drained_end(). This means missing a call to aio_enable_external() and the AIO context remains disabled forever. This can cause a device to become unresponsive and to disrupt the guest execution, i.e. hang, loop forever or worse. This scenario is quite easy to encounter with virtio-scsi on POWER when punching multiple blockdev-create QMP commands while the guest is booting and it is still running the SLOF firmware. This happens because SLOF disables/re-enables PCI devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND register after the initial probe/feature negotiation, as it tends to work with a single device at a time at various stages like probing and running block/network bootloaders without doing a full reset in-between. This naturally generates many dataplane stops and starts, and thus many drain sections that can race with blockdev_create_run(). In the end, SLOF bails out. It is somehow reproducible on x86 but it requires generating artificial dataplane start/stop activity with stop/cont QMP commands. In this case, seabios ends up looping forever, waiting for the virtio-scsi device to send a response to a command it never received. Add a helper that pairs all previously called bdrv_do_drained_begin() with a bdrv_do_drained_end() and call it from bdrv_close(). While at it, update the "/bdrv-drain/graph-change/drain_all" test in test-bdrv-drain so that it can catch the issue. BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-27 15:26:20 +01:00
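The pairing described in the commit above, where every earlier drained_begin gets a matching drained_end before the node goes away, can be sketched with a counter. This is a toy model under stated assumptions: `struct toy_node` and the `toy_*` functions are hypothetical and do not reproduce QEMU's BlockDriverState or its drain code.

```c
#include <assert.h>

/* Toy stand-in for a block node; not QEMU's BlockDriverState. */
struct toy_node {
    int quiesce_counter;   /* drained_begin calls not yet ended */
    int external_disabled; /* outstanding "disable external events" calls */
};

static void toy_drained_begin(struct toy_node *n)
{
    n->quiesce_counter++;
    n->external_disabled++;
}

static void toy_drained_end(struct toy_node *n)
{
    assert(n->quiesce_counter > 0);
    n->quiesce_counter--;
    n->external_disabled--;
}

/*
 * Called when the node is closed: end every quiescent section that is
 * still open, so the "external events disabled" count cannot leak.
 * Returns how many sections had to be ended.
 */
static int toy_end_all_quiesce(struct toy_node *n)
{
    int ended = 0;

    while (n->quiesce_counter > 0) {
        toy_drained_end(n);
        ended++;
    }
    return ended;
}
```

Without the loop in `toy_end_all_quiesce`, a node closed with a nonzero counter would leave `external_disabled` permanently raised, which is the kind of leak the commit message describes.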
Alberto Garcia	46cd1e8a47	qcow2: Skip copy-on-write when allocating a zero cluster Since commit `c8bb23cbdb` when a write request results in a new allocation QEMU first tries to see if the rest of the cluster outside the written area contains only zeroes. In that case, instead of doing a normal copy-on-write operation and writing explicit zero buffers to disk, the code zeroes the whole cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK. This improves performance very significantly but it only happens when we are writing to an area that was completely unallocated before. Zero clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and are therefore slower to allocate. This happens because the code uses bdrv_is_allocated_above() rather than bdrv_block_status_above(). The former is not as accurate for this purpose but it is faster. However in the case of qcow2 the underlying call does already report zero clusters just fine so there is no reason why we cannot use that information. After testing 4KB writes on an image that only contains zero clusters this patch results in almost five times more IOPS. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <6d77cab968c501c44d6e1089b9bc91b04170b49e.1603731354.git.berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-27 15:26:20 +01:00
Alberto Garcia	d40f4a565a	qcow2: Report BDRV_BLOCK_ZERO more accurately in bdrv_co_block_status() If a BlockDriverState supports backing files but has none then any unallocated area reads back as zeroes. bdrv_co_block_status() is only reporting this if want_zero is true, but this is an inexpensive test and there is no reason not to do it in all cases. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <66fa0914a0e2b727ab6d1b63ca773d7cd29a9a9e.1603731354.git.berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-27 15:26:20 +01:00
Peter Maydell	a95e0396c8	* fix --disable-tcg builds (Claudio) * Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself) * Start preparing for meson 0.56 (myself) * Move directory configuration to meson (myself) * Start untangling qemu_init (myself) * Windows fixes (Sunil) * Remove -no-kbm (Thomas) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+WrxEUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNQAggAqfucqEQvz6s+DCPv2u572diyMvhe Y7vmaQF0qYKoAvy5OLqGlqXVsn8lwf19zJWo9Z7k4qNefWl84ii0J/kEmnolzTGq 7Z0CRSnGbNQy9YedYXuymaR3E0VY+6lsPnzIpufQISzQRdjzT8OQ51DMAhc04oQl saXsts7y+om+tzvW2JFGtNsfFRUjcRKqjIAVfwneBXFW9TRD2epvYxz/S0o+XJwF eSiINvTqDxxPyy6XJykC46xf/TTfReHv6fQgTn7Jw3TQuo4m7qXLi5Vj8W1erZJv t3xhZNabt813T6ztNcAAuJ0srIn55Ac7Fuq3/1ecgeVD08ntmabe4WhKRg== =931x -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * fix --disable-tcg builds (Claudio) * Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself) * Start preparing for meson 0.56 (myself) * Move directory configuration to meson (myself) * Start untangling qemu_init (myself) * Windows fixes (Sunil) * Remove -no-kbm (Thomas) # gpg: Signature made Mon 26 Oct 2020 11:12:17 GMT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: machine: move SMP initialization from vl.c machine: move UP defaults to class_base_init machine: remove deprecated -machine enforce-config-section option win32: boot broken when bind & data dir are the same WHPX: Fix WHPX build break configure: move install_blobs from configure to meson configure: remove 
unused variable from config-host.mak configure: move directory options from config-host.mak to meson configure: allow configuring localedir Makefile: separate meson rerun from the rest of the ninja invocation Remove deprecated -no-kvm option replay: do not build if TCG is not available qtest: unbreak non-TCG builds in bios-tables-test hw/core/qdev-clock: add a reference on aliased clocks do not use colons in test names meson: rewrite curses/iconv test build: fix macOS --enable-modules build Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-26 15:49:11 +00:00
Vladimir Sementsov-Ogievskiy	7e7e510077	block/io: fix bdrv_is_allocated_above bdrv_is_allocated_above wrongly handles short backing files: it reports after-EOF space as UNALLOCATED which is wrong, as on read the data is generated on the level of short backing file (if all overlays have unallocated areas at that place). Reusing bdrv_common_block_status_above fixes the issue and unifies code path. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com [Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy	624f27bbe9	block/io: bdrv_common_block_status_above: support bs == base We are going to reuse bdrv_common_block_status_above in bdrv_is_allocated_above. bdrv_is_allocated_above may be called with include_base == false and still bs == base (for ex. from img_rebase()). So, support this corner case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy	3555a43261	block/io: bdrv_common_block_status_above: support include_base In order to reuse bdrv_common_block_status_above in bdrv_is_allocated_above, let's support include_base parameter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy	67c095c8b8	block/io: fix bdrv_co_block_status_above bdrv_co_block_status_above has several design problems with handling short backing files: 1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but without BDRV_BLOCK_ALLOCATED flag, when actually short backing file which produces these after-EOF zeros is inside requested backing sequence. 2. With want_zero=false, it may return pnum=0 prior to actual EOF, because of EOF of short backing file. Fix these things, making logic about short backing files clearer. With fixed bdrv_block_status_above we also have to improve is_zero in qcow2 code, otherwise iotest 154 will fail, because with this patch we stop to merge zeros of different types (produced by fully unallocated in the whole backing chain regions vs produced by short backing files). Note also, that this patch leaves for another day the general problem around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated vs go-to-backing. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com [Fix s/comes/come/ as suggested by Eric Blake --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	d9b495f9c6	block/export: add vhost-user-blk multi-queue support Allow the number of queues to be configured using --export vhost-user-blk,num-queues=N. This setting should match the QEMU --device vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers its own value if the vhost-user-blk backend offers fewer queues than QEMU. The vhost-user-blk-server.c code is already capable of multi-queue. All virtqueue processing runs in the same AioContext. No new locking is needed. Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit. Note that the feature bit only announces the presence of the num_queues configuration space field. It does not promise that there is more than 1 virtqueue, so we can set it unconditionally. I tested multi-queue by running a random read fio test with numjobs=4 on an -smp 4 guest. After the benchmark finished the guest /proc/interrupts file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/ directory shows that Linux blk-mq has 4 queues configured. An automated test is included in the next commit. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20201001144604.559733-2-stefanha@redhat.com [Fixed accidental tab characters as suggested by Markus Armbruster --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	f51d23c80a	block/export: add iothread and fixed-iothread options Make it possible to specify the iothread where the export will run. By default the block node can be moved to other AioContexts later and the export will follow. The fixed-iothread option forces strict behavior that prevents changing AioContext while the export is active. See the QAPI docs for details. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200929125516.186715-5-stefanha@redhat.com [Fix stray '#' character in block-export.json and add missing "(since: 5.2)" as suggested by Eric Blake. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	cbc20bfb8f	block: move block exports to libblockdev Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd. They are not used by other programs and are not otherwise needed in libblock. Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss. Since bdrv_close_all() (libblock) calls blk_exp_close_all() (libblockdev) a stub function is required. Make qemu-nbd.c use signal handling utility functions instead of duplicating the code. This helps because os-posix.c is in libblockdev and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks. Once we use the signal handling utility functions we also end up providing the necessary symbol. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200929125516.186715-4-stefanha@redhat.com [Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	3a213f83d9	util/vhost-user-server: use static library in meson.build Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build the static library once and then reuse it throughout QEMU. Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the vhost-user tools (vhost-user-gpu, etc) do. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-14-stefanha@redhat.com [Added CONFIG_LINUX again because libvhost-user doesn't build on macOS. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	80a06cc52b	util/vhost-user-server: move header to include/ Headers used by other subsystems are located in include/. Also add the vhost-user-server and vhost-user-blk-server headers to MAINTAINERS. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-13-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	90fc91d50b	block/export: convert vhost-user-blk server to block export API Use the new QAPI block exports API instead of defining our own QOM objects. This is a large change because the lifecycle of VuBlockDev needs to follow BlockExportDriver. QOM properties are replaced by QAPI options objects. VuBlockDev is renamed VuBlkExport and contains a BlockExport field. Several fields can be dropped since BlockExport already has equivalents. The file names and meson build integration will be adjusted in a future patch. libvhost-user should probably be built as a static library that is linked into QEMU instead of as a .c file that results in duplicate compilation. The new command-line syntax is: $ qemu-storage-daemon \ --blockdev file,node-name=drive0,filename=test.img \ --export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock Note that unix-socket is optional because we may wish to accept chardevs too in the future. Markus noted that supported address families are not explicit in the QAPI schema. It is unlikely that support for more address families will be added since file descriptor passing is required and few address families support it. If a new address family needs to be added, then the QAPI 'features' syntax can be used to advertize them. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20200924151549.913737-12-stefanha@redhat.com [Skip test on big-endian host architectures because this device doesn't support them yet (as already mentioned in a code comment). --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	0534b1b227	block/export: report flush errors Propagate the flush return value since errors are possible. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-11-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	7185c85776	util/vhost-user-server: rework vu_client_trip() coroutine lifecycle The vu_client_trip() coroutine is leaked during AioContext switching. It is also unsafe to destroy the vu_dev in panic_cb() since its callers still access it in some cases. Rework the lifecycle to solve these safety issues. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-10-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	47ba680466	util/vhost-user-server: drop unused DevicePanicNotifier The device panic notifier callback is not used. Drop it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	df6af7ce77	block/export: consolidate request structs into VuBlockReq Only one struct is needed per request. Drop req_data and the separate VuBlockReq instance. Instead let vu_queue_pop() allocate everything at once. This fixes the req_data memory leak in vu_block_virtio_process_req(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Coiby Xu	3578389bcf	block/export: vhost-user block device backend server By making use of libvhost-user, block device drive can be shared to the connected vhost-user client. Only one client can connect to the server one time. Since vhost-user-server needs a block drive to be created first, delay the creation of this object. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Coiby Xu <coiby.xu@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-id: 20200918080912.321299-6-coiby.xu@gmail.com [Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the following compiler warning: ../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=] and fix "Invalid size %ld ..." ssize_t format string arguments for 32-bit hosts. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Philippe Mathieu-Daudé	f25e7ab2b0	block/nvme: Add driver statistics for access alignment and hw errors Keep statistics of some hardware errors, and number of aligned/unaligned I/O accesses. QMP example booting a full RHEL 8.3 aarch64 guest: { "execute": "query-blockstats" } { "return": [ { "device": "", "node-name": "drive0", "stats": { "flush_total_time_ns": 6026948, "wr_highest_offset": 3383991230464, "wr_total_time_ns": 807450995, "failed_wr_operations": 0, "failed_rd_operations": 0, "wr_merged": 3, "wr_bytes": 50133504, "failed_unmap_operations": 0, "failed_flush_operations": 0, "account_invalid": false, "rd_total_time_ns": 1846979900, "flush_operations": 130, "wr_operations": 659, "rd_merged": 1192, "rd_bytes": 218244096, "account_failed": false, "idle_time_ns": 2678641497, "rd_operations": 7406, }, "driver-specific": { "driver": "nvme", "completion-errors": 0, "unaligned-accesses": 2959, "aligned-accesses": 4477 }, "qdev": "/machine/peripheral-anon/device[0]/virtio-backend" } ] } Suggested-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20201001162939.1567915-1-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Claudio Fontana	9b1c911654	replay: do not build if TCG is not available this fixes non-TCG builds broken recently by replay reverse debugging. Stub the needed functions in stub/, splitting roughly between functions needed only by system emulation, by system emulation and tools, and by everyone. This includes duplicating some code in replay/, and puts the logic for non-replay related events in the replay/ module (+ the stubs), so this should be revisited in the future. Surprisingly, only _one_ qtest was affected by this, ide-test.c, which resulted in a buzz as the bh events were never delivered, and the bh never executed. Many other subsystems _should_ have been affected. This fixes the immediate issue, however a better way to group replay functionality to TCG-only code could be developed in the long term. Signed-off-by: Claudio Fontana <cfontana@suse.de> Message-Id: <20201013192123.22632-4-cfontana@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-22 11:53:54 -04:00
Daniel P. Berrangé	e1c4269763	block: deprecate the sheepdog block driver This thread from a little over a year ago: http://lists.wpkg.org/pipermail/sheepdog/2019-March/thread.html states that sheepdog is no longer actively developed. The only mentioned users are some companies who are said to have it for legacy reasons with plans to replace it by Ceph. There is talk about cutting out existing features to turn it into a simple demo of how to write a distributed block service. There is no evidence of anyone working on that idea: https://github.com/sheepdog/sheepdog/commits/master No real commits to git since Jan 2018, and before then just some minor technical debt cleanup. There is essentially no activity on the mailing list aside from patches to QEMU that get CC'd due to our MAINTAINERS entry. Fedora packages for sheepdog failed to build from upstream source because of the more strict linker that no longer merges duplicate global symbols. Fedora patches it to add the missing "extern" annotations and presumably other distros do too, but upstream source remains broken. There is only basic compile testing, no functional testing of the driver. Since there are no build pre-requisites the sheepdog driver is currently enabled unconditionally. This would result in configure issuing a deprecation warning by default for all users. Thus the configure default is changed to disable it, requiring users to pass --enable-sheepdog to build the driver. Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20201002113243.2347710-3-berrange@redhat.com> Reviewed-by: Neal Gompa <ngompa13@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-15 16:06:28 +02:00
Elena Afanasova	5b4c95d0a3	block/blkdebug: fix memory leak Spotted by PVS-Studio Signed-off-by: Elena Afanasova <eafanasova@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1e903f928eb3da332cc95e2a6f87243bd9fe66e4.camel@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-10-13 13:33:46 +02:00
Christian Borntraeger	cd466702f0	vmdk: fix maybe uninitialized warnings Fedora 32 gcc 10 seems to give false positives: Compiling C object libblock.fa.p/block_vmdk.c.o ../block/vmdk.c: In function ‘vmdk_parse_extents’: ../block/vmdk.c:587:5: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 587 \| g_free(extent->l1_table); \| ^~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:754:17: note: ‘extent’ was declared here 754 \| VmdkExtent extent; \| ^~~~~~ ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 620 \| ret = vmdk_init_tables(bs, extent, errp); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:598:17: note: ‘extent’ was declared here 598 \| VmdkExtent extent; \| ^~~~~~ ../block/vmdk.c:1178:39: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 1178 \| extent->flat_start_offset = flat_offset << 9; \| ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~ ../block/vmdk.c: In function ‘vmdk_open_vmdk4’: ../block/vmdk.c:581:22: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 581 \| extent->l2_cache = \| ~~~~~~~~~~~~~~~~~^ 582 \| g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:872:17: note: ‘extent’ was declared here 872 \| VmdkExtent extent; \| ^~~~~~ ../block/vmdk.c: In function ‘vmdk_open’: ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 620 \| ret = vmdk_init_tables(bs, extent, errp); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:598:17: note: ‘extent’ was declared here 598 \| VmdkExtent extent; \| ^~~~~~ cc1: all warnings being treated as errors make: *** [Makefile.ninja:884: libblock.fa.p/block_vmdk.c.o] Error 1 fix them by assigning a default value. 
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Fam Zheng <fam@euphon.net> Message-Id: <20200930155859.303148-2-borntraeger@de.ibm.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-10-13 13:33:45 +02:00
Vladimir Sementsov-Ogievskiy	99d72dba1c	block/nbd: nbd_co_reconnect_loop(): don't connect if drained In a recent commit `12c75e20a2` we've improved nbd_co_reconnect_loop() to not make drain wait for additional sleep. Similarly, we shouldn't try to connect, if previous sleep was interrupted by drain begin, otherwise drain_begin will have to wait for the whole connection attempt. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy	46f56631b5	block/nbd: fix reconnect-delay reconnect-delay has a design flaw: we handle it in the same loop where we do connection attempt. So, reconnect-delay may be exceeded by unpredictable time of connection attempt. Let's instead use separate timer. How to reproduce the bug: 1. Create an image on node1: qemu-img create -f qcow2 xx 100M 2. Start NBD server on node1: qemu-nbd xx 3. On node2 start qemu-io: ./build/qemu-io --image-opts \ driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=15 4. Type 'read 0 512' in qemu-io interface to check that connection works Be careful: you should make steps 5-7 in a short time, less than 15 seconds. 5. Kill nbd server on node1 6. Run 'read 0 512' in qemu-io interface again, to be sure that nbd client goes to reconnect loop. 7. On node1 run the following command sudo iptables -A INPUT -p tcp --dport 10809 -j DROP This will make the connect() call of qemu-io at node2 take a long time. And you'll see that read command in qemu-io will hang for a long time, more than 15 seconds specified by reconnect-delay parameter. It's the bug. 8. Don't forget to drop iptables rule on node1: sudo iptables -D INPUT -p tcp --dport 10809 -j DROP Important note: Step [5] is necessary to reproduce _this_ bug. If we miss step [5], the read command (step 6) will hang for a long time and this commit doesn't help, because there will be not long connect() to unreachable host, but long sendmsg() to unreachable host, which should be fixed by enabling and adjusting keep-alive on the socket, which is a thing for further patch set. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy	8a509afd72	block/nbd: correctly use qio_channel_detach_aio_context when needed Don't use nbd_client_detach_aio_context() driver handler where we want to finalize the connection. We should directly use qio_channel_detach_aio_context() in such cases. Driver handler may (and will) contain another things, unrelated to the qio channel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy	8c517de24a	block/nbd: fix drain dead-lock because of nbd reconnect-delay We pause reconnect process during drained section. So, if we have some requests, waiting for reconnect we should cancel them, otherwise they deadlock the drained section. How to reproduce: 1. Create an image: qemu-img create -f qcow2 xx 100M 2. Start NBD server: qemu-nbd xx 3. Start vm with second nbd disk on node2, like this: ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \ file=/work/images/cent7.qcow2 -drive \ driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \ -vnc :0 -m 2G -enable-kvm -vga std 4. Access the vm through vnc (or some other way?), and check that NBD drive works: dd if=/dev/sdb of=/dev/null bs=1M count=10 - the command should succeed. 5. Now, kill the nbd server, and run dd in the guest again: dd if=/dev/sdb of=/dev/null bs=1M count=10 Now Qemu is trying to reconnect, and dd-generated requests are waiting for the connection (they will wait up to 60 seconds (see reconnect-delay option above) and then fail). But suddenly, vm may totally hang in the deadlock. You may need to increase reconnect-delay period to catch the dead-lock. VM doesn't respond because drain dead-lock happens in cpu thread with global mutex taken. That's not a good thing by itself and is not fixed by this commit (true way is using iothreads). Still this commit fixes drain dead-lock itself. Note: probably, we can instead continue to reconnect during drained section. To achieve this, we may move negotiation to the connect thread to make it independent of bs aio context. But expanding drained section doesn't seem good anyway. So, let's now fix the bug the simplest way. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Peter Maydell	f2687fdb75	* Reverse debugging (Pavel) * CFLAGS cleanup (Paolo) * ASLR fix (Mark) * cpus.c refactoring (Claudio) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl98EB0UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOCsQf9G7EUAK1zcEOx20LtDdXFrk4tjsRp S83OGdihWe8SM+XiY9BfqsBbXdByqF+SitePOV3feGK0mOP5vtJIL7/2DLrtFTeF wOeARRA9ePVb7hcL5oXAQeE3bXrX8wq8Qtw9xAoHdw5JAEVmKIEJS6AL5Eu3M2Fh pvdBoV84pOm2/ARS3eRstRyW8gCC8rdLDlNsVDtCbYdNVq+VdkzR0l5Phc8JDx1M Qjdl1KpN6ZkuN8M6tnaQNTb9IUVu5c1tu5jdR6JdLUqAWp1wYZJ6r2jSatZWfLR3 H+gzFsDoLPfCjZ3IhfZyvzF5leSZmdbFfzI0tHS1UJ/ZZYjutDvlPlbyYA== =Jys5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * Reverse debugging (Pavel) * CFLAGS cleanup (Paolo) * ASLR fix (Mark) * cpus.c refactoring (Claudio) # gpg: Signature made Tue 06 Oct 2020 07:35:09 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (37 commits) tests/acceptance: add reverse debugging test replay: create temporary snapshot at debugger connection replay: describe reverse debugging in docs/replay.txt gdbstub: add reverse continue support in replay mode gdbstub: add reverse step support in replay mode replay: flush rr queue before loading the vmstate replay: implement replay-seek command replay: introduce breakpoint at the specified step replay: introduce info hmp/qmp command qapi: introduce replay.json for record/replay-related stuff migration: introduce icount field for snapshots qcow2: introduce icount field for snapshots replay: provide an accessor for rr filename replay: don't record interrupt poll configure: don't enable ASLR for --enable-debug Windows builds configure: consistently pass CFLAGS/CXXFLAGS/LDFLAGS to meson configure: do not clobber environment CFLAGS/CXXFLAGS/LDFLAGS dtc: Convert Makefile bits to meson bits slirp: Convert Makefile bits to meson bits accel/tcg: use current_machine as it is always set for softmmu ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-06 15:04:10 +01:00
Pavel Dovgalyuk	b39847a505	migration: introduce icount field for snapshots Saving icount as a parameter of the snapshot allows navigation between them in the execution replay scenario. This information can be used for finding a specific snapshot for proceeding the recorded execution to the specific moment of the time. E.g., 'reverse step' action (introduced in one of the following patches) needs to load the nearest snapshot which is prior to the current moment of time. This patch also updates snapshot test which verifies qemu monitor output. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> -- v4 changes: - squashed format update with test output update v7 changes: - introduced the spaces between the fields in snapshot info output - updated the test to match new field widths Message-Id: <160174518865.12451.14327573383978752463.stgit@pasha-ThinkPad-X280> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-06 08:34:49 +02:00
Pavel Dovgalyuk	bbacffc5f7	qcow2: introduce icount field for snapshots This patch introduces the icount field for saving within the snapshot. It is required for navigation between the snapshots in record/replay mode. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> -- v7 changes: - also fix the test which checks qcow2 snapshot extra data Message-Id: <160174518284.12451.2301137308458777398.stgit@pasha-ThinkPad-X280> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-06 08:34:49 +02:00
Vladimir Sementsov-Ogievskiy	b33b354f3a	block/io: refactor save/load vmstate Like for read/write in a previous commit, drop extra indirection layer, generate directly bdrv_readv_vmstate() and bdrv_writev_vmstate(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-8-vsementsov@virtuozzo.com>	2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy	fae2681add	block: drop bdrv_prwv Now that we are not maintaining boilerplate code for coroutine wrappers, there is no more sense in keeping the extra indirection layer of bdrv_prwv(). Let's drop it and instead generate pure bdrv_preadv() and bdrv_pwritev(). Currently, bdrv_pwritev() and bdrv_preadv() are returning bytes on success, auto generated functions will instead return zero, as their _co_ prototype. Still, it's simple to make the conversion safe: the only external user of bdrv_pwritev() is test-bdrv-drain, and it is comfortable enough with bdrv_co_pwritev() instead. So prototypes are moved to local block/coroutines.h. Next, the only internal use is bdrv_pread() and bdrv_pwrite(), which are modified to return bytes on success. Of course, it would be great to convert bdrv_pread() and bdrv_pwrite() to return 0 on success. But this requires audit (and probably conversion) of all their users, let's leave it for another day refactoring. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-7-vsementsov@virtuozzo.com>	2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy	9bb4b066cc	block: generate coroutine-wrapper code Use code generation implemented in previous commit to generate coroutine wrappers in block.c and block/io.c Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-6-vsementsov@virtuozzo.com>	2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy	aaaa20b69b	scripts: add block-coroutine-wrapper.py We have a very frequent pattern of creating a coroutine from a function with several arguments: - create a structure to pack parameters - create _entry function to call original function taking parameters from struct - do different magic to handle completion: set ret to NOT_DONE or EINPROGRESS or use separate bool field - fill the struct and create coroutine from _entry function with this struct as a parameter - do coroutine enter and BDRV_POLL_WHILE loop Let's reduce code duplication by generating coroutine wrappers. This patch adds scripts/block-coroutine-wrapper.py together with some friends, which will generate functions with declared prototypes marked by the 'generated_co_wrapper' specifier. The usage of new code generation is as follows: 1. define the coroutine function somewhere int coroutine_fn bdrv_co_NAME(...) {...} 2. declare in some header file int generated_co_wrapper bdrv_NAME(...); with same list of parameters (generated_co_wrapper is defined in "include/block/block.h"). 3. Make sure the block_gen_c declaration in block/meson.build mentions the file with your marker function. Still, no function is now marked, this work is for the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924185414.28642-5-vsementsov@virtuozzo.com> [Added encoding='utf-8' to open() calls as requested by Vladimir. Fixed typo and grammar issues pointed out by Eric Blake. Removed clang-format dependency that caused build test issues. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-05 10:59:06 +01:00
Vladimir Sementsov-Ogievskiy	21c2283ebc	block: declare some coroutine functions in block/coroutines.h We are going to keep coroutine-wrappers code (structure-packing parameters, BDRV_POLL wrapper functions) in separate auto-generated files. So, we'll need a header with declaration of original _co_ functions, for those which are static now. As well, we'll need declarations for wrapper functions. Do these declarations now, as a preparation step. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-4-vsementsov@virtuozzo.com>	2020-10-05 09:35:52 +01:00
Vladimir Sementsov-Ogievskiy	f9e694cb32	block/io: refactor coroutine wrappers Most of our coroutine wrappers already follow this convention: We have 'coroutine_fn bdrv_co_<something>(<normal argument list>)' as the core function, and a wrapper 'bdrv_<something>(<same argument list>)' which does parameter packing and calls bdrv_run_co(). The only outsiders are the bdrv_prwv_co and bdrv_common_block_status_above wrappers. Let's refactor them to behave as the others, it simplifies further conversion of coroutine wrappers. This patch adds an indirection layer, but it will be compensated by a further commit, which will drop bdrv_co_prwv together with the is_write logic, to keep the read and write paths separate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-3-vsementsov@virtuozzo.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	eefffb0244	block/nvme: Replace magic value by SCALE_MS definition Use self-explicit SCALE_MS definition instead of magic value (missed in similar commit `e4f310fe7f`). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-7-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	fad1eb6886	block/nvme: Use register definitions from 'block/nvme.h' Use the NVMe register definitions from "block/nvme.h" which ease a bit reviewing the code while matching the datasheet. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-6-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	9406e0d97e	block/nvme: Drop NVMeRegs structure, directly use NvmeBar NVMeRegs only contains NvmeBar. Simplify the code by using NvmeBar directly. This triggers a checkpatch.pl error: ERROR: Use of volatile is usually wrong, please add a comment #30: FILE: block/nvme.c:691: + volatile NvmeBar *regs; This is a false positive as in our case we are using I/O registers, so the 'volatile' use is justified. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-5-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	37d7a45abd	block/nvme: Reduce I/O registers scope We only access the I/O register in nvme_init(). Remove the reference in BDRVNVMeState and reduce its scope. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-4-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	f68453237b	block/nvme: Map doorbells pages write-only Per the datasheet sections 3.1.13/3.1.14: "The host should not read the doorbell registers." As we don't need read access, map the doorbells with write-only permission. We keep a reference to this mapped address in the BDRVNVMeState structure. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-3-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	b02c01a513	util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar() Pages are currently mapped READ/WRITE. To be able to use different protections, add a new argument to qemu_vfio_pci_map_bar(). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-2-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Alberto Garcia	c508c73dca	qcow2: Use L1E_SIZE in qcow2_write_l1_entry() We overlooked these in `02b1ecfa10` Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200928162333.14998-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	30dbc81d31	block/export: Move writable to BlockExportOptions The 'writable' option is a basic option that will probably be applicable to most if not all export types that we will implement. Move it from NBD to the generic BlockExport layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-26-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	8cade320c8	block/export: Add query-block-exports This adds a simple QMP command to query the list of block exports. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-25-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	331170e073	block/export: Create BlockBackend in blk_exp_add() Every export type will need a BlockBackend, so creating it centrally in blk_exp_add() instead of the .create driver callback avoids duplication. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-24-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	37a4f70cea	block/export: Move blk to BlockExport Every block export has a BlockBackend representing the disk that is exported. It should live in BlockExport therefore. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-23-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	1a9f7a804f	block/export: Add BLOCK_EXPORT_DELETED event Clients may want to know when an export has finally disappeared (block-export-del returns earlier than that in the general case), so add a QAPI event for it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200924152717.287415-22-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	3c3bc462ad	block/export: Add block-export-del Implement a new QMP command block-export-del and make nbd-server-remove a wrapper around it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-21-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	3859ad36f0	block/export: Move strong user reference to block_exports The reference owned by the user/monitor that is created when adding the export and dropped when removing it was tied to the 'exports' list in nbd/server.c. Every block export will have a user reference, so move it to the block export level and tie it to the 'block_exports' list in block/export/export.c instead. This is necessary for introducing a QMP command for removing exports. Note that exports are present in block_exports even after the user has requested shutdown. This is different from NBD's exports where exports are immediately removed on a shutdown request, even if they are still in the process of shutting down. In order to avoid that the user still interacts with an export that is shutting down (and possibly removes it a second time), we need to remember if the user actually still owns it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-20-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	d53be9ce55	block/export: Add 'id' option to block-export-add We'll need an id to identify block exports in monitor commands. This adds one. Note that this is different from the 'name' option in the NBD server, which is the externally visible export name. While block export ids need to be unique in the whole process, export names must be unique only for the same server. Different export types or (potentially in the future) multiple NBD servers can have the same export name externally, but still need different block export ids internally. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-19-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	bc4ee65b8c	block/export: Add blk_exp_close_all(_type) This adds a function to shut down all block exports, and another one to shut down the block exports of a single type. The latter is used for now when stopping the NBD server. As soon as we implement support for multiple NBD servers, we'll need a per-server list of exports and it will be replaced by a function using that. As a side effect, the BlockExport layer has a list tracking all existing exports now. closed_exports loses its only user and can go away. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-18-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	a6ff798966	block/export: Allocate BlockExport in blk_exp_add() Instead of letting the driver allocate and return the BlockExport object, allocate it already in blk_exp_add() and pass it. This allows us to initialise the generic part before calling into the driver so that the driver can just use these values instead of having to parse the options a second time. For symmetry, move freeing the BlockExport to blk_exp_unref(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-17-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	b6076afcab	block/export: Add node-name to BlockExportOptions Every block export needs a block node to export, so add a 'node-name' option to BlockExportOptions and remove the replaced option 'device' from BlockExportOptionsNbd. To maintain compatibility in nbd-server-add, BlockExportOptionsNbd needs to be wrapped by a new type NbdServerAddOptions that adds 'device' back because nbd-server-add doesn't use the BlockExportOptions base type at all (so even without changing it to a 'node-name' option in block-export-add, this compatibility code would be necessary). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-16-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	8612c68673	block/export: Move AioContext from NBDExport to BlockExport Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-15-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	c69de1bef5	block/export: Move refcount from NBDExport to BlockExport Having a refcount makes sense for all types of block exports. It is also a prerequisite for keeping a list of all exports at the BlockExport level. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-14-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	1c8222b014	nbd: Add max-connections to nbd-server-start This is a QMP equivalent of qemu-nbd's --shared option, limiting the maximum number of clients that can attach at the same time. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-9-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	9b562c646b	block/export: Remove magic from block-export-add nbd-server-add tries to be convenient and adds two questionable features that we don't want to share in block-export-add, even for NBD exports: 1. When requesting a writable export of a read-only device, the export is silently downgraded to read-only. This should be an error in the context of block-export-add. 2. When using a BlockBackend name, unplugging the device from the guest will automatically stop the NBD server, too. This may sometimes be what you want, but it could also be very surprising. Let's keep things explicit with block-export-add. If the user wants to stop the export, they should tell us so. Move these things into the nbd-server-add QMP command handler so that they apply only there. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-8-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	56ee86261e	block/export: Add BlockExport infrastructure and block-export-add We want to have a common set of commands for all types of block exports. Currently, this is only NBD, but we're going to add more types. This patch adds the basic BlockExport and BlockExportDriver structs and a QMP command block-export-add that creates a new export based on the given BlockExportOptions. qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-5-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	143ea7670c	qapi: Rename BlockExport to BlockExportOptions The name BlockExport will be used for the struct containing the runtime state of block exports, so change the name of export creation options. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-4-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	5daa6bfd8e	qapi: Create block-export module Move all block export related types and commands from block-core to the new QAPI module block-export. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-3-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé	74f2e02766	block/sheepdog: Replace magic val by NANOSECONDS_PER_SECOND definition Use self-explicit NANOSECONDS_PER_SECOND definition instead of magic value. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200921110145.520944-1-philmd@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé	f68c01470b	qapi: Restrict query-uuid command to machine code Only qemu-system-FOO and qemu-storage-daemon provide QMP monitors, therefore such declarations and definitions are irrelevant for user-mode emulation. Restricting the query-uuid command to machine.json pulls less QAPI-generated code into user-mode. Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200913195348.1064154-6-philmd@redhat.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-09-29 15:41:35 +02:00
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_*() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	2020-09-23 16:07:44 +01:00
Daniel P. Berrangé	b18a24a9f8	block/file: switch to use qemu_open/qemu_create for improved errors Currently at startup if using cache=none on a filesystem lacking O_DIRECT such as tmpfs, at startup QEMU prints qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: file system may not support O_DIRECT qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Could not open '/tmp/foo.img': Invalid argument while at QMP level the hint is missing, so QEMU reports just "error": { "class": "GenericError", "desc": "Could not open '/tmp/foo.img': Invalid argument" } which is close to useless for the end user trying to figure out what they did wrong. With this change at startup QEMU prints qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT while at the QMP level QEMU reports a massively more informative "error": { "class": "GenericError", "desc": "Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT" } Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-16 10:33:48 +01:00
Daniel P. Berrangé	448058aa99	util: rename qemu_open() to qemu_open_old() We want to introduce a new version of qemu_open() that uses an Error object for reporting problems and make it the preferred interface. Rename the existing method to release the namespace for the new impl. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-16 10:33:48 +01:00
Stefano Garzarella	7bae7c805d	block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[] Commit `19ae9ae014` ("block/rbd: Add support for ceph namespaces") introduced namespace support for RBD, but we forgot to add the new 'namespace' option to qemu_rbd_strong_runtime_opts[]. The 'namespace' is used to identify the image, so it is a strong option since it can change the data of a BDS. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1821528 Fixes: `19ae9ae014` ("block/rbd: Add support for ceph namespaces") Cc: Florian Florensa <fflorensa@online.net> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20200914190553.74871-1-sgarzare@redhat.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:31:10 +02:00
Alberto Garcia	bfd0989acf	qcow2: Convert qcow2_alloc_cluster_offset() into qcow2_alloc_host_offset() qcow2_alloc_cluster_offset() takes an (unaligned) guest offset and returns the (aligned) offset of the corresponding cluster in the qcow2 image. In practice none of the callers need to know where the cluster starts so this patch makes the function calculate and return the final host offset directly. The function is also renamed accordingly. See `388e581615` for a similar change to qcow2_get_cluster_offset(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <9bfef50ec9200d752413be4fc2aeb22a28378817.1599833007.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:31:10 +02:00
Alberto Garcia	8e958260c5	qcow2: Make preallocate_co() resize the image to the correct size This function preallocates metadata structures and then extends the image to its new size, but that new size calculation is wrong because it doesn't take into account that the host_offset variable is always cluster-aligned. This problem can be reproduced with preallocation=metadata when the original size is not cluster-aligned but the new size is. In this case the final image size will be shorter than expected. qemu-img create -f qcow2 img.qcow2 31k qemu-img resize --preallocation=metadata img.qcow2 128k Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <adeb8b059917b141d5f5b3bd2a016262d3052c79.1599833007.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> [mreitz: Mark compat=0.10 unsupported for iotest 125] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:30:36 +02:00
John Snow	c1dadda02c	block/qcow: remove runtime opts Introduced by `d85f4222b4`, these were seemingly never used at all. Signed-off-by: John Snow <jsnow@redhat.com> Message-Id: <20200806211345.2925343-3-jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
John Snow	30b70f070f	block/rbd: remove runtime_opts This saw its last use in `4bfb274165`. Signed-off-by: John Snow <jsnow@redhat.com> Message-Id: <20200806211345.2925343-2-jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	580384d637	qcow2: Return the original error code in qcow2_co_pwrite_zeroes() This function checks the current status of a (sub)cluster in order to see if an unaligned 'write zeroes' request can be done efficiently by simply updating the L2 metadata and without having to write actual zeroes to disk. If the situation does not allow using the fast path then the function returns -ENOTSUP and the caller falls back to writing zeroes. It can happen, however, that the aforementioned check returns an actual error code, so in this case we should pass it to the caller. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200909123739.719-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	3fec237fca	qcow2: Make qcow2_free_any_clusters() free only one cluster This function takes an L2 entry and a number of clusters to free. Although in principle it can free any type of cluster (using the L2 entry to determine its type) in practice the API is broken because compressed clusters have a variable size and there is no way to free more than one without having the L2 entry of each one of them. The good news is that all callers are passing nb_clusters=1, so we can simply get rid of that parameter. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <77cea0f4616f921d37e971b3c5b18a2faa24b173.1599573989.git.berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	1a52b73dba	qcow2: Handle QCowL2Meta on error in preallocate_co() If qcow2_alloc_cluster_offset() or qcow2_alloc_cluster_link_l2() fail then this function simply returns the error code, potentially leaking the QCowL2Meta structure and leaving stale items in s->cluster_allocs. A second problem is that this function calls qcow2_free_any_clusters() on failure but passing a host cluster offset instead of an L2 entry. Luckily for normal uncompressed clusters a raw offset also works like a valid L2 entry so it works just the same, but we should be using qcow2_free_clusters() instead. This patch fixes both problems by using qcow2_handle_l2meta(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <cd3a6b9abd43f9c0b60be413d760f0cacc67eb66.1599573989.git.berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Swapnil Ingle	83a6a90009	block/vhdx: Support vhdx image only with 512 bytes logical sector size block/vhdx uses the qemu block layer, where the sector size is always 512 bytes. This may cause issues with vhdx images that have a 4K logical sector size. For example, qemu-img convert on such images fails with the following assert: $qemu-img convert -f vhdx -O raw 4KTest1.vhdx test.raw qemu-img: util/iov.c:388: qiov_slice: Assertion `offset + len <= qiov->size' failed. Aborted This patch adds a check to return ENOTSUP for vhdx images which have a logical sector size other than 512 bytes. Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com> Message-Id: <1596794594-44531-1-git-send-email-swapnil.ingle@nutanix.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	2b60c5b996	qcow2: Rewrite the documentation of qcow2_alloc_cluster_offset() The current text corresponds to an earlier, simpler version of this function and it does not explain how it works now. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <bb5bd06f07c5a05b0818611de0d06ec5b66c8df3.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	f7bd5bba1b	qcow2: Don't check nb_clusters when removing l2meta from the list In the past, when a new cluster was allocated the l2meta structure was a variable in the stack so it was necessary to have a way to tell whether it had been initialized and contained valid data or not. The nb_clusters field was used for this purpose. Since commit `f50f88b9fe` this is no longer the case, l2meta (nowadays a pointer to a list) is only allocated when needed and nb_clusters is guaranteed to be > 0 so this check is unnecessary. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <ab0b67c29c7ba26e598db35f12aa5ab5982539c1.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	184581fa4d	qcow2: Fix removal of list members from BDRVQcow2State.cluster_allocs When a write request needs to allocate new clusters (or change the L2 bitmap of existing ones) a QCowL2Meta structure is created so the L2 metadata can be later updated and any copy-on-write can be performed if necessary. A write request can span a region consisting of an arbitrary combination of previously unallocated and allocated clusters, and if the unallocated ones can be put contiguous to the existing ones then QEMU will do so in order to minimize the number of write operations. In practice this means that a write request has not just one but a number of QCowL2Meta structures. All of them are added to the cluster_allocs list that is stored in BDRVQcow2State and is used to detect overlapping requests. After the write request finishes all its associated QCowL2Meta are removed from that list. calculate_l2_meta() takes care of creating and putting those structures in the list, and qcow2_handle_l2meta() takes care of removing them. The problem is that the error path in handle_alloc() also tries to remove an item in that list, a remnant from the time when this was handled there (that code would not even be correct anymore because it only removes one struct and not all the ones from the same write request). This can trigger a double removal of the same item from the list, causing a crash. This is not easy to reproduce in practice because it requires that do_alloc_cluster_offset() fails after a successful previous allocation during the same write request, but it can be reproduced with the included test case. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <3440a1c4d53c4fe48312b478c96accb338cbef7c.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	02b1ecfa10	qcow2: Use macros for the L1, refcount and bitmap table entry sizes This patch replaces instances of sizeof(uint64_t) in the qcow2 driver with macros that indicate what those sizes are actually referring to. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200828110828.13833-1-berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:12 +02:00
Lukas Straub	5eb9a3c7b0	block/quorum.c: stable children names If we remove the child with the highest index from the quorum, decrement s->next_child_index. This way we get stable children names as long as we only remove the last child. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Fixes: https://bugs.launchpad.net/bugs/1881231 Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <5d5f930424c1c770754041aa8ad6421dc4e2b58e.1596536719.git.lukasstraub2@web.de> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:12 +02:00
Peter Maydell	2499453eb1	Block layer patches: - qemu-img create: Fail gracefully when backing file is an empty string - Fixes related to filter block nodes ("Deal with filters" series) - block/nvme: Various cleanups required to use multiple queues - block/nvme: Use NvmeBar structure from "block/nvme.h" - file-win32: Fix "locking" option - iotests: Allow running from different directory Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging # gpg: Signature made Thu 10 Sep 2020 10:11:19 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (65 commits) block/qcow2-cluster: Add missing "fallthrough" annotation block/nvme: Pair doorbell registers block/nvme: Use generic NvmeBar structure block/nvme: Group controller registers in NVMeRegs structure file-win32: Fix "locking" option iotests: Allow running from different directory iotests: Test committing to overridden backing iotests: Add test for commit in sub directory iotests: Add filter mirror test cases iotests: Add filter commit test cases iotests: Let complete_and_wait() work with commit iotests: Test that qcow2's data-file is flushed block: Leave BDS.backing_{file,format} constant block: Inline bdrv_co_block_status_from_*() blockdev: Fix active commit choice block: Drop backing_bs() qemu-img: Use child access functions nbd: Use CAF when looking for dirty bitmap commit: Deal with filters backup: Deal with filters ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-11 14:47:49 +01:00
Thomas Huth	b9be6faed1	block/qcow2-cluster: Add missing "fallthrough" annotation When compiling with -Werror=implicit-fallthrough, the compiler currently complains: ../../devel/qemu/block/qcow2-cluster.c: In function ‘cluster_needs_new_alloc’: ../../devel/qemu/block/qcow2-cluster.c:1320:12: error: this statement may fall through [-Werror=implicit-fallthrough=] if (l2_entry & QCOW_OFLAG_COPIED) { ^ ../../devel/qemu/block/qcow2-cluster.c:1323:5: note: here case QCOW2_CLUSTER_UNALLOCATED: ^~~~ It's quite obvious that the fallthrough is intended here, so let's add a comment to silence the compiler warning. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20200908070028.193298-1-thuth@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	e5ff22ba9f	block/nvme: Pair doorbell registers For each queue doorbell registers are paired as: - Submission Queue Tail Doorbell - Completion Queue Head Doorbell Reflect that in the NVMeRegs structure, and adapt nvme_create_queue_pair() accordingly. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-4-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	c7100f0a0b	block/nvme: Use generic NvmeBar structure Commit `f3c507adcd` ("NVMe: Initial commit for new storage interface") introduced the NvmeBar structure. Unfortunately in commit `bdd6a90a9e` ("block: Add VFIO based NVMe driver") we duplicated it. Apparently in commit `a3d9a352d4` ("block: Move NVMe constants to a separate header") we tried to unify headers but forgot to remove the structure declared in the block/nvme.c source file. Do it now, and remove the structure size check which is redundant with the header check added in commit `74e18435c0` ("hw/block/nvme: Align I/O BAR to 4 KiB"). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-3-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	0ea32f34ce	block/nvme: Group controller registers in NVMeRegs structure We want to use the NvmeBar structure from "block/nvme.h" in the next commit. As a preliminary step, group all the NVMe controller registers in the 'ctrl' field, keeping the doorbells registers out of it. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-2-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:12 +02:00
Kevin Wolf	3b079ac0ff	file-win32: Fix "locking" option The intended behaviour was that locking=off/auto work and have no effect (to remain compatible with file-posix), whereas locking=on would return an error. Unfortunately, the code forgot to remove "locking" from the options QDict, so any attempt to use the option would fail. Replace the option parsing code for "locking" with something that is part of the raw_runtime_opts QemuOptsList (so it is properly removed from the QDict) and looks more like file-posix. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200907092739.9988-1-kwolf@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:12 +02:00
Markus Armbruster	b15e402fc8	trace-events: Fix attribution of trace points to source Some trace points are attributed to the wrong source file. Happens when we neglect to update trace-events for code motion, or add events in the wrong place, or misspell the file name. Clean up with help of scripts/cleanup-trace-events.pl. Funnies requiring manual post-processing: * accel/tcg/cputlb.c trace points are in trace-events. * block.c and blockdev.c trace points are in block/trace-events. * hw/block/nvme.c uses the preprocessor to hide its trace point use from cleanup-trace-events.pl. * hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to guard debug code. * include/hw/xen/xen_common.h trace points are in hw/xen/trace-events. * linux-user/trace-events abbreviates a tedious list of filenames to /signal.c. net/colo-compare and net/filter-rewriter.c use pseudo trace points colo_compare_miscompare and colo_filter_rewriter_debug to guard debug code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200806141334.3646302-5-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-09-09 17:17:58 +01:00
Markus Armbruster	6ec9379870	trace-events: Delete unused trace points Tracked down with the help of scripts/cleanup-trace-events.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20200806141334.3646302-4-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-09-09 17:17:02 +01:00
Max Reitz	0b877d09df	block: Leave BDS.backing_{file,format} constant Parts of the block layer treat BDS.backing_file as if it were whatever the image header says (i.e., if it is a relative path, it is relative to the overlay), other parts treat it like a cache for bs->backing->bs->filename (relative paths are relative to the CWD). Considering bs->backing->bs->filename exists, let us make it mean the former. Among other things, this now allows the user to specify a base when using qemu-img to commit an image file in a directory that is not the CWD (assuming, everything uses relative filenames). Before this patch: $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2 $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2 $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' After this patch: $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2 Image committed. $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2 Image committed. With this change, bdrv_find_backing_image() must look at whether the user has overridden a BDS's backing file. If so, it can no longer use bs->backing_file, but must instead compare the given filename against the backing node's filename directly. Note that this changes the QAPI output for a node's backing_file. We had very inconsistent output there (sometimes what the image header said, sometimes the actual filename of the backing image). This inconsistent output was effectively useless, so we have to decide one way or the other. 
Considering that bs->backing_file usually at runtime contained the path to the image relative to qemu's CWD (or absolute), this patch changes QAPI's backing_file to always report the bs->backing->bs->filename from now on. If you want to receive the image header information, you have to refer to full-backing-filename. This necessitates a change to iotest 228. The interesting information it really wanted is the image header, and it can get that now, but it has to use full-backing-filename instead of backing_file. Because of this patch's changes to bs->backing_file's behavior, we also need some reference output changes. Along with the changes to bs->backing_file, stop updating BDS.backing_format in bdrv_backing_attach() as well. This way, ImageInfo's backing-filename and backing-filename-format fields will represent what the image header says and nothing else. iotest 245 changes in behavior: With the backing node no longer overriding the parent node's backing_file string, you can now omit the @backing option when reopening a node with neither a default nor a current backing file even if it used to have a backing node at some point. 273 also changes: The base image is opened without a format layer, so ImageInfo.backing-filename-format used to report "file" for the base image's overlay after blockdev-snapshot. However, the image header never says "file" anywhere, so it now reports $IMGFMT. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	549ec0d978	block: Inline bdrv_co_block_status_from_*() With bdrv_filter_bs(), we can easily handle this default filter behavior in bdrv_co_block_status(). blkdebug wants to have an additional assertion, so it keeps its own implementation, except bdrv_co_block_status_from_file() needs to be inlined there. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	9a71b9de3f	commit: Deal with filters This includes some permission limiting (for example, we only need to take the RESIZE permission if the base is smaller than the top). Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	2b088c60bb	backup: Deal with filters Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	3f072a7fb7	mirror: Deal with filters This includes some permission limiting (for example, we only need to take the RESIZE permission for active commits where the base is smaller than the top). base_overlay is introduced so we can query bdrv_is_allocated_above() on it - we cannot do that with base itself, because a filter's block_status is the same as its child node, so if there are filters on base, bdrv_is_allocated_above() on base would return information including base. Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to "target_backing_bs", because that is what it really refers to. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c6f6d8462c	block-copy: Use CAF to find sync=top base Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	0a7585dbba	block: Use child access functions for QAPI queries query-block, query-named-block-nodes, and query-blockstats now return any filtered child under "backing", not just bs->backing or COW children. This is so that filters do not interrupt the reported backing chain. This changes the output for iotest 184, as the throttled node now appears as a backing child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	3f26191c73	block: Report data child for query-blockstats It makes no sense to report the block stats of a purely metadata-storing child in query-blockstats. So if the primary child does not have any data, try to find a unique data-storing child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	07cd7b659a	block/null: Implement bdrv_get_allocated_file_size It is trivial, so we might as well do it. Remove _filter_actual_image_size from iotest 184, so we get to see the result in its reference output. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c8af87573f	block/snapshot: Fix fallback If the top node's driver does not provide snapshot functionality and we want to fall back to a node down the chain, we need to snapshot all non-COW children. For simplicity's sake, just do not fall back if there is more than one such child. Furthermore, we really only can fall back to bs->file and bs->backing, because bdrv_snapshot_goto() has to modify the child link (notably, set it to NULL). Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c4db2e25df	block: Use CAF in bdrv_co_rw_vmstate() If a node whose driver does not provide VM state functions has a metadata child, the VM state should probably go there; if it is a filter, the VM state should probably go there. It follows that we should generally go down to the primary child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	66b129ac5e	block: Iterate over children in refresh_limits Instead of looking at just bs->file and bs->backing, we should look at all children that could end up receiving forwarded requests. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	fb787f02a6	vmdk: Drop vmdk_co_flush() Before HEAD^, we needed this because bdrv_co_flush() by itself would only flush bs->file. With HEAD^, bdrv_co_flush() will flush all children on which a WRITE or WRITE_UNCHANGED permission has been taken. Thus, vmdk no longer needs to do it itself. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	883833e29c	block: Flush all children in generic code If the driver does not support .bdrv_co_flush() so bdrv_co_flush() itself has to flush the children of the given node, it should not flush just bs->file->bs, but in fact all children that might have been written to (judging from the permissions taken on them). This is a bug fix for qcow2 images with an external data file, as they so far did not flush that data_file node. In any case, the BLKDBG_EVENT() should be emitted on the primary child, because that is where a blkdebug node would be if there is any. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	23b93525a2	block: Use bdrv_cow_child() in bdrv_co_truncate() The condition modified here is not about potentially filtered children, but only about COW sources (i.e. traditional backing files). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	67acfd2188	stream: Deal with filters Because of the (not so recent anymore) changes that make the stream job independent of the base node and instead track the node above it, we have to split that "bottom" node into two cases: The bottom COW node, and the node directly above the base node (which may be an R/W filter or the bottom COW node). Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	cb8503159a	block: Use CAFs in block status functions Use the child access functions in the block status inquiry functions as appropriate. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	93393e698c	block: Use bdrv_filter_(bs\|child) where obvious Places that use patterns like if (bs->drv->is_filter && bs->file) { ... something about bs->file->bs ... } should be BlockDriverState *filtered = bdrv_filter_bs(bs); if (filtered) { ... something about @filtered ... } instead. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	4935e8be22	copy-on-read: Support compressed writes Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	e7e754aec3	throttle: Support compressed writes Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	8b8277cdb0	block: Drop bdrv_is_encrypted() The original purpose of bdrv_is_encrypted() was to inquire whether a BDS can be used without the user entering a password or not. It has not been used for that purpose for quite some time. Actually, it is not even fit for that purpose, because to answer that question, it would have to recursively query all of the given node's children. So now we have to decide in which direction we want to fix bdrv_is_encrypted(): Recursively query all children, or drop it and just use bs->encrypted to get the current node's status? Nowadays, its only purpose is to report through bdrv_query_image_info() whether the given image is encrypted or not. For this purpose, it is probably more interesting to see whether a given node itself is encrypted or not (otherwise, a management application cannot discern for certain which nodes are really encrypted and which just have encrypted children). Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	b111b3fcde	block/nvme: Use an array of EventNotifier In preparation of using multiple IRQ (thus multiple eventfds) make BDRVNVMeState::irq_notifier an array (for now of a single element, the admin queue notifier). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-16-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	7a1fb2ef40	block/nvme: Extract nvme_poll_queue() As we want to do per-queue polling, extract the nvme_poll_queue() method which operates on a single queue. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-15-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	0a28b02ef9	block/nvme: Simplify nvme_create_queue_pair() arguments nvme_create_queue_pair() doesn't require BlockDriverState anymore. Replace it by BDRVNVMeState and AioContext to simplify. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-14-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	073a06978c	block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE BDRV_POLL_WHILE() is defined as: #define BDRV_POLL_WHILE(bs, cond) ({ \ BlockDriverState *bs_ = (bs); \ AIO_WAIT_WHILE(bdrv_get_aio_context(bs_), \ cond); }) As we will remove the BlockDriverState use in the next commit, start by using the exploded version of BDRV_POLL_WHILE(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-13-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	3a6d34d066	block/nvme: Simplify nvme_init_queue() arguments nvme_init_queue() doesn't require BlockDriverState anymore. Replace it by BDRVNVMeState to simplify. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-12-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	38e1f8186f	block/nvme: Replace qemu_try_blockalign(bs) by qemu_try_memalign(pg_sz) qemu_try_blockalign() is a generic API that calls back to the block driver to return its page alignment. As we call from within the very same driver, we already know the page alignment stored in our state. Remove indirections and use the value from BDRVNVMeState. This change is required to later remove the BlockDriverState argument, to make nvme_init_queue() per hardware, and not per block driver. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-11-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	2ed846930d	block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset In the next commit we'll get rid of qemu_try_blockalign(). To ease review, first replace qemu_try_blockalign0() by explicit calls to qemu_try_blockalign() and memset(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-10-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	7d3b214ae4	block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures We allocate a unique chunk of memory then use it for two different structures. By using a union, we make it clear the data is overlapping (and we can remove the casts). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-9-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:24:53 +02:00
Philippe Mathieu-Daudé	4d98093937	block/nvme: Rename local variable We are going to modify the code in the next commit. Renaming the 'resp' variable to 'id' first makes the next commit easier to review. No logical changes. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-8-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	c8edbfb2cc	block/nvme: Use common error path in nvme_add_io_queue() Rearrange nvme_add_io_queue() by using a common error path. This will prove useful in a few commits where we add IRQ notification to the IO queues. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-7-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	bf6ce5ec6d	block/nvme: Improve error message when IO queue creation failed Do not use the same error message for different failures. Display a different error whether it is the CQ or the SQ. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-6-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	73159e52e6	block/nvme: Define INDEX macros to ease code review Use definitions instead of '0' or '1' indexes. Also this will be useful when using multi-queues later. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-5-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	0ea45f76eb	block/nvme: Let nvme_create_queue_pair() fail gracefully As nvme_create_queue_pair() is allowed to fail, replace the alloc() calls by try_alloc() to avoid aborting QEMU. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-4-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	e266f52cfb	block/nvme: Avoid further processing if trace event not enabled Avoid further processing if TRACE_NVME_SUBMIT_COMMAND_RAW is not enabled. This is intended as a performance optimization, but it is untested. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-3-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	e4f310fe7f	block/nvme: Replace magic value by SCALE_MS definition Use self-explicit SCALE_MS definition instead of magic value. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-2-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Peter Maydell	df8176274a	nbd patches for 2020-09-02 - fix a few iotests affected by earlier nbd changes - avoid blocking qemu by nbd client in connect() - build qemu-nbd for mingw -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl9QFB8ACgkQp6FrSiUn Q2pTwggArPN7dRwm1KD9jL8X6PV01uhLuLRFzrrofFX22Blroj5XPbR24BdKeN8V uDSFGzzoz2Vx3vKPHyZdx7bbeEL0pF9dzjZX6JwzT0McbVuge5aG2zC/ARdfc4PN 1Yf2FB/nY8Xt5G12usu3FIz7JBoQNm4mlPnqVqf7t0LQxgUFvO7F2LernyqEOYKS uSpXHNFqddZcax7etXeldJOlSGJBQTaCmplrJbw2ilVhLZJD+0OglY4SAsrrenN+ gb/KD4REjhtQsmoTaqthGCnGXipEoJYDfEOMhbkl0UK+9Mx0t+3cj3tflG0eGldM ERk1d4d7VSlyqIy7w43v3+IB8M99ag== =7Cn5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-09-02' into staging nbd patches for 2020-09-02 - fix a few iotests affected by earlier nbd changes - avoid blocking qemu by nbd client in connect() - build qemu-nbd for mingw # gpg: Signature made Wed 02 Sep 2020 22:52:31 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2020-09-02: nbd: disable signals and forking on Windows builds nbd: skip SIGTERM handler if NBD device support is not built block: add missing socket_init() calls to tools block/nbd: use non-blocking connect: fix vm hang on connect() iotests/259: Fix reference output iotests/059: Fix reference output Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-03 21:35:01 +01:00
Vladimir Sementsov-Ogievskiy	1dc4718d84	block/nbd: use non-blocking connect: fix vm hang on connect() This makes nbd's connection_co yield during reconnects, so that reconnect doesn't block the main thread. This is very important in case of an unavailable nbd server host: connect() call may take a long time, blocking the main thread (and due to reconnect, it will hang again and again with small gaps of working time during pauses between connection attempts). Realization notes: - We don't want to implement non-blocking connect() over non-blocking socket, because getaddrinfo() doesn't have portable non-blocking realization anyway, so let's just use a thread for both getaddrinfo() and connect(). - We can't use qio_channel_socket_connect_async (which behaves similarly and starts a thread to execute connect() call), as it's relying on someone iterating main loop (g_main_loop_run() or something like this), which is not always the case. - We can't use thread_pool_submit_co API, as thread pool waits for all threads to finish (but we don't want to wait for blocking reconnect attempt on shutdown). So, we just create the thread by hand. Some additional difficulties are: - We want our connect to avoid blocking drained sections and aio context switches. To achieve this, we make it possible to "cancel" synchronous wait for the connect (which is a coroutine yield actually), still, the thread continues in background, and if successful, its result may be reused on next reconnect attempt. - We don't want to wait for reconnect on shutdown, so there is CONNECT_THREAD_RUNNING_DETACHED thread state, which means that the block layer is no longer interested in a result, and thread should close new connected socket on finish and free the state. How to reproduce the bug, fixed with this commit: 1. Create an image on node1: qemu-img create -f qcow2 xx 100M 2. Start NBD server on node1: qemu-nbd xx 3. Start vm with second nbd disk on node2, like this: ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \ file=/work/images/cent7.qcow2 -drive file=nbd+tcp://192.168.100.2 \ -vnc :0 -qmp stdio -m 2G -enable-kvm -vga std 4. Access the vm through vnc (or some other way?), and check that NBD drive works: dd if=/dev/sdb of=/dev/null bs=1M count=10 - the command should succeed. 5. Now, let's trigger nbd-reconnect loop in Qemu process. For this: 5.1 Kill NBD server on node1 5.2 run "dd if=/dev/sdb of=/dev/null bs=1M count=10" in the guest again. The command should fail and a lot of error messages about failing disk may appear as well. Now NBD client driver in Qemu tries to reconnect. Still, VM works well. 6. Make node1 unavailable on NBD port, so connect() from node2 will last for a long time: On node1 (Note, that 10809 is just a default NBD port): sudo iptables -A INPUT -p tcp --dport 10809 -j DROP After some time the guest hangs, and you may check in gdb that Qemu hangs in connect() call, issued from the main thread. This is the BUG. 7. Don't forget to drop iptables rule from your node1: sudo iptables -D INPUT -p tcp --dport 10809 -j DROP Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200812145237.4396-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: minor wording and formatting tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-09-02 16:47:23 -05:00
Peter Maydell	e4d8b7c1a9	qemu-nvme -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE28EdLTc7SjdV9QLsYlFWYQpPbMAFAl9Pro4ACgkQYlFWYQpP bMC1CRAAkawTI4mMcOfI3smFoMeiY8kZJWXJUBXfHbMJ4asaoIjTkH/lXRXBw7KQ sH5tB9CuOums3VjagkZ0Sw6R/kP1LbyJTAwq/pwOXwYRDc/E3zQpMblkIHH1boIM Bxl814hw3hBqV+D0wgeKpl83lbiOpd10Cbpdb/xNKat6qVquLGurSGKgA7jNuF4s oPTPtfZpyH9LUr4DV+sL+fGX6vaCdSFZPZUhJqwFfx79+r3+YiHGLAE6fgsdGDJt 2RSSKMqBe2gg0BY5ToW9L55BsLnwMMrAZnGzEkeZvRKqm0JZBXQsERa61p4VEAJf uYkSEqOwsKjXQNTdDEekyH67AkgXaoqG0hiiOcgoLsla7C0zROtoKcfVM/+WC0LT T0/bfgubmoDV8kLzPuOV8xOGxjfbu4Qlxy1JsIC6BU4zBQvpDwOeTx3MUWaCUfvk YmDMEhZWGcZ3RBLrgQmzm4ZKMtGdYXnGQz5dwVkRRfghQs2fl5ZmUjGR7MqKe18n 4K0nzhPiXbOTlqvLVvzVlrBzdc8ECAs1kVoJF7C3LwRmXbT2N/fUhZP/nYpeM2Hj DQNmA8KpXMKae2+2iDnQNWbvdpz3SiHD6dK7A1bEsdoG0L60xfyeAF+JuPiESUnd OAhf+muxKiInv2k5GNh7mDZPWM6nDepf/PZP6ohc7dKxVam7N2M= =Y23H -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/nvme/tags/pull-nvme-20200902' into staging qemu-nvme # gpg: Signature made Wed 02 Sep 2020 15:39:10 BST # gpg: using RSA key DBC11D2D373B4A3755F502EC625156610A4F6CC0 # gpg: Good signature from "Keith Busch <kbusch@kernel.org>" [unknown] # gpg: aka "Keith Busch <keith.busch@gmail.com>" [unknown] # gpg: aka "Keith Busch <keith.busch@intel.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DBC1 1D2D 373B 4A37 55F5 02EC 6251 5661 0A4F 6CC0 * remotes/nvme/tags/pull-nvme-20200902: (39 commits) hw/block/nvme: remove explicit qsg/iov parameters hw/block/nvme: use preallocated qsg/iov in nvme_dma_prp hw/block/nvme: consolidate qsg/iov clearing hw/block/nvme: add ns/cmd references in NvmeRequest hw/block/nvme: be consistent about zeros vs zeroes hw/block/nvme: add check for mdts hw/block/nvme: refactor request bounds checking hw/block/nvme: verify validity of prp lists in the cmb hw/block/nvme: add request mapping helper hw/block/nvme: add tracing to nvme_map_prp hw/block/nvme: refactor dma read/write hw/block/nvme: destroy request iov before reuse hw/block/nvme: remove redundant has_sg member hw/block/nvme: replace dma_acct with blk_acct equivalent hw/block/nvme: add mapping helpers hw/block/nvme: memset preallocated requests structures hw/block/nvme: bump supported version to v1.3 hw/block/nvme: provide the mandatory subnqn field hw/block/nvme: enforce valid queue creation sequence hw/block/nvme: reject invalid nsid values in active namespace id list ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-02 21:20:20 +01:00
Klaus Jensen	69265150aa	hw/block/nvme: be consistent about zeros vs zeroes The NVM Express specification generally uses 'zeroes' and not 'zeros', so let us align with it. Cc: Fam Zheng <fam@euphon.net> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>	2020-09-02 08:48:50 +02:00
Klaus Jensen	c26f217370	hw/block/nvme: bump spec data structures to v1.3 Add missing fields in the Identify Controller and Identify Namespace data structures to bring them in line with NVMe v1.3. This also adds data structures and defines for SGL support which requires a couple of trivial changes to the nvme block driver as well. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Fam Zheng <fam@euphon.net> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Message-Id: <20200706061303.246057-2-its@irrelevant.dk>	2020-09-02 08:48:50 +02:00
Peter Maydell	887adde81d	meson fixes: * bump submodule to 0.55.1 * SDL, pixman and zlib fixes * firmwarepath fix * fix firmware builds meson related: * move install to Meson * move NSIS to Meson * do not make meson use cmake * add description to options -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl9OcpcUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroM9+Af+InfjEFtsoubgQA2L7B1sHeksINOI nMAw9plmJSX0Qabp0PJDrMcBiLZlbdFGCm/88heTyDGRtbhIXLUrE2J0dyV4R4nR OSyuTgDna75QLxy6k1dIh6qVAtcj2hg+CxaJrUf2Ix6A1d1PAfWoweNXSF1LBmJf pyQ39eZXStI7/bkmwgTY3qK1gwjEaskvf68fTp1hgTN0VHwUb23/nucPaizbVF+a A0nWprPQaTjWhAFQ2jJesBbhN6FtD3EnOZs56JOSQ9J7W6uDnw8a+tOdEDbSYKe2 DdEUEjcz2cqAMCiWrzOMgxB8T885H+8yE6jgEmPHWaUUORQADMcBDMihNA== =NGLI -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging meson fixes: * bump submodule to 0.55.1 * SDL, pixman and zlib fixes * firmwarepath fix * fix firmware builds meson related: * move install to Meson * move NSIS to Meson * do not make meson use cmake * add description to options # gpg: Signature made Tue 01 Sep 2020 17:11:03 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (26 commits) Makefile: Fix in-tree clean/distclean Makefile: Add back TAGS/ctags/cscope rules meson: add description to options build: fix recurse-all target meson: use pkg-config method to find dependencies configure: do not include ${prefix} in firmwarepath meson: add pixman dependency to UI modules meson: add pixman dependency to chardev/baum module meson: add NSIS building meson: use meson mandir instead of qemu_mandir meson: pass docdir option meson: use meson datadir instead of qemu_datadir meson: pass qemu_suffix option configure: build docdir like other suffixed directories configure: always /-seperate directory from qemu_suffix configure: rename confsuffix option meson: move zlib detection to meson build-sys: remove install target from Makefile meson: install $localstatedir/run for qga meson: install desktop file ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-01 22:50:23 +01:00
Liao Pingfang	f181ab4ba5	block/vmdk: Remove superfluous breaks Remove superfluous breaks, as there is a "return" before them. Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn> Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <1594631107-36574-1-git-send-email-wang.yi59@zte.com.cn> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-09-01 08:37:28 +02:00
Paolo Bonzini	a10c8516ed	block: always link with zlib The qcow2 driver needs the zlib dependency. While emulators provided it through the migration code, this is not true of the tools. Move the dependency from the qcow1 rule directly into block_ss so that it is included unconditionally. Fixes build with --disable-qcow1. Reported-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-01 01:51:51 -04:00
Eduardo Habkost	7c9dcd6cab	throttle-groups: Move ThrottleGroup typedef to header Move typedef closer to the type check macros, to make it easier to convert the code to OBJECT_DEFINE_TYPE() in the future. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Tested-By: Roman Bolshakov <r.bolshakov@yadro.com> Message-Id: <20200825192110.3528606-17-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-08-27 14:04:54 -04:00
Alberto Garcia	7bbb59202a	qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters This function is only used by qcow2_expand_zero_clusters() to downgrade a qcow2 image to a previous version. This would require transforming all extended L2 entries into normal L2 entries but this is not a simple task and there are no plans to implement this at the moment. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <15e65112b4144381b4d8c0bdf8fb76b0d813e3d1.1594396418.git.berto@igalia.com> [mreitz: Fixed comment style] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 10:20:15 +02:00
Alberto Garcia	2118771ddf	qcow2: Allow preallocation and backing files if extended_l2 is set Traditional qcow2 images don't allow preallocation if a backing file is set. This is because once a cluster is allocated there is no way to tell that its data should be read from the backing file. Extended L2 entries have individual allocation bits for each subcluster, and therefore it is perfectly possible to have an allocated cluster with all its subclusters unallocated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6d5b0f38e7dc5f2f31d8cab1cb92044e9909aece.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 09:20:04 +02:00
Alberto Garcia	7be2025258	qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Now that the implementation of subclusters is complete we can finally add the necessary options to create and read images with this feature, which we call "extended L2 entries". Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6476caaa73216bd05b7bb2d504a20415e1665176.1594396418.git.berto@igalia.com> [mreitz: %s/5\.1/5.2/; fixed 302's and 303's reference output] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 09:19:55 +02:00
Alberto Garcia	40dee94320	qcow2: Add prealloc field to QCowL2Meta This field allows us to indicate that the L2 metadata update does not come from a write request with actual data but from a preallocation request. For traditional images this does not make any difference, but for images with extended L2 entries this means that the clusters are allocated normally in the L2 table but individual subclusters are marked as unallocated. This will allow preallocating images that have a backing file. There is one special case: when we resize an existing image we can also request that the new clusters are preallocated. If the image already had a backing file then we have to hide any possible stale data and zero out the new clusters (see commit `955c7d6687` for more details). In this case the subclusters cannot be left as unallocated so the L2 bitmap must be updated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <960d4c444a4f5a870e2b47e5da322a73cd9a2f5a.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	0dd07b298f	qcow2: Add subcluster support to qcow2_measure() Extended L2 entries are bigger than normal L2 entries so this has an impact on the amount of metadata needed for a qcow2 file. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <7efae2efd5e36b42d2570743a12576d68ce53685.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a6841a2de6	qcow2: Add subcluster support to qcow2_co_pwrite_zeroes() This works now at the subcluster level and pwrite_zeroes_alignment is updated accordingly. qcow2_cluster_zeroize() is turned into qcow2_subcluster_zeroize() with the following changes: - The request can now be subcluster-aligned. - The cluster-aligned body of the request is still zeroized using zero_in_l2_slice() as before. - The subcluster-aligned head and tail of the request are zeroized with the new zero_l2_subclusters() function. There is just one thing to take into account for a possible future improvement: compressed clusters cannot be partially zeroized so zero_l2_subclusters() on the head or the tail can return -ENOTSUP. This makes the caller repeat the complete request and write actual zeroes to disk. This is sub-optimal because 1) if the head area was compressed we would still be able to use the fast path for the body and possibly the tail. 2) if the tail area was compressed we are writing zeroes to the head and the body areas, which are already zeroized. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <17e05e2ee7e12f10dcf012da81e83ebe27eb3bef.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	bf4a66eed4	qcow2: Add subcluster support to handle_alloc_space() The bdrv_co_pwrite_zeroes() call here fills complete clusters with zeroes, but it can happen that some subclusters are not part of the write request or the copy-on-write. This patch makes sure that only the affected subclusters are overwritten. A potential improvement would be to also fill with zeroes the other subclusters if we can guarantee that we are not overwriting existing data. However this would waste more disk space, so we should first evaluate if it's really worth doing. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <b3dc97e8e2240ddb5191a4f930e8fc9653f94621.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	ff4cdec7f6	qcow2: Clear the L2 bitmap when allocating a compressed cluster Compressed clusters always have the bitmap part of the extended L2 entry set to 0. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <04455b3de5dfeb9d1cfe1fc7b02d7060a6e09710.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	aca00cd971	qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() The L2 bitmap needs to be updated after each write to indicate what new subclusters are now allocated. This needs to happen even if the cluster was already allocated and the L2 entry was otherwise valid. In some cases however a write operation doesn't need to change the L2 bitmap (because all affected subclusters were already allocated). This is detected in calculate_l2_meta(), and qcow2_alloc_cluster_link_l2() is never called in those cases. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <0875620d49f44320334b6a91c73b3f301f975f38.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	fc2e6528d5	qcow2: Add subcluster support to check_refcounts_l2() The offset field of an uncompressed cluster's L2 entry must be aligned to the cluster size, otherwise it is invalid. If the cluster has no data then it means that the offset points to a preallocation, so we can clear the offset field without affecting the guest-visible data. This is what 'qemu-img check' does when run in repair mode. On traditional qcow2 images this can only happen when QCOW_OFLAG_ZERO is set, and repairing such entries turns the clusters from ZERO_ALLOC into ZERO_PLAIN. Extended L2 entries have no ZERO_ALLOC clusters and no QCOW_OFLAG_ZERO but the idea is the same: if none of the subclusters are allocated then we can clear the offset field and leave the bitmap untouched. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <9f4ed1d0a34b0a545b032c31ecd8c14734065342.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a68cd70326	qcow2: Add subcluster support to discard_in_l2_slice() Two things need to be taken into account here: 1) With full_discard == true the L2 entry must be cleared completely. This also includes the L2 bitmap if the image has extended L2 entries. 2) With full_discard == false we have to make the discarded cluster read back as zeroes. With normal L2 entries this is done with the QCOW_OFLAG_ZERO bit, whereas with extended L2 entries this is done with the individual 'all zeroes' bits for each subcluster. Note however that QCOW_OFLAG_ZERO is not supported in v2 qcow2 images so, if there is a backing file, discard cannot guarantee that the image will read back as zeroes. If this is important for the caller it should forbid it as qcow2_co_pdiscard() does (see `80f5c01183` for more details). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <5ef8274e628aa3ab559bfac467abf488534f2b76.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	205fa50750	qcow2: Add subcluster support to zero_in_l2_slice() The QCOW_OFLAG_ZERO bit that indicates that a cluster reads as zeroes is only used in standard L2 entries. Extended L2 entries use individual 'all zeroes' bits for each subcluster. This must be taken into account when updating the L2 entry and also when deciding that an existing entry does not need to be updated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <b61d61606d8c9b367bd641ab37351ddb9172799a.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	3f9c6b3b1f	qcow2: Add subcluster support to qcow2_get_host_offset() The logic of this function remains pretty much the same, except that it uses count_contiguous_subclusters(), which combines the logic of count_contiguous_clusters() / count_contiguous_clusters_unallocated() and checks individual subclusters. qcow2_cluster_to_subcluster_type() is not necessary as a separate function anymore so it's inlined into its caller. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <d2193fd48653a350d80f0eca1c67b1d9053fb2f3.1594396418.git.berto@igalia.com> [mreitz: Initialize expected_type to anything] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	d53ec3d8d8	qcow2: Add subcluster support to calculate_l2_meta() If an image has subclusters then there are more copy-on-write scenarios that we need to consider. Let's say we have a write request from the middle of subcluster #3 until the end of the cluster: 1) If we are writing to a newly allocated cluster then we need copy-on-write. The previous contents of subclusters #0 to #3 must be copied to the new cluster. We can optimize this process by skipping all leading unallocated or zero subclusters (the status of those skipped subclusters will be reflected in the new L2 bitmap). 2) If we are overwriting an existing cluster: 2.1) If subcluster #3 is unallocated or has the all-zeroes bit set then we need copy-on-write (on subcluster #3 only). 2.2) If subcluster #3 was already allocated then there is no need for any copy-on-write. However we still need to update the L2 bitmap to reflect possible changes in the allocation status of subclusters #4 to #31. Because of this, this function checks if all the overwritten subclusters are already allocated and in this case it returns without creating a new QCowL2Meta structure. After all these changes l2meta_cow_start() and l2meta_cow_end() are not necessarily cluster-aligned anymore. We need to update the calculation of old_start and old_end in handle_dependencies() to guarantee that no two requests try to write on the same cluster. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <4292dd56e4446d386a2fe307311737a711c00708.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	97490a143e	qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC When dealing with subcluster types there is a new value called QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in QCow2ClusterType. This patch handles that value in all places where subcluster types are processed. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <bf09e2e2439a468a901bb96ace411eed9ee50295.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	10dabdc596	qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* In order to support extended L2 entries some functions of the qcow2 driver need to start dealing with subclusters instead of clusters. qcow2_get_host_offset() is modified to return the subcluster type instead of the cluster type, and all callers are updated to replace all values of QCow2ClusterType with their QCow2SubclusterType equivalents. This patch only changes the data types, there are no semantic changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <f6c29737c295f32cbee74c903c30b01820363b34.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	ca4a0bb81b	qcow2: Add cluster type parameter to qcow2_get_host_offset() This function returns an integer that can be either an error code or a cluster type (a value from the QCow2ClusterType enum). We are going to start using subcluster types instead of cluster types in some functions so it's better to use the exact data types instead of integers for clarity and in order to detect errors more easily. This patch makes qcow2_get_host_offset() return 0 on success and puts the returned cluster type in a separate parameter. There are no semantic changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <396b6eab1859a271551dcd7dcba77f8934aa3c3f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c94d037825	qcow2: Add qcow2_cluster_is_allocated() This helper function tells us if a cluster is allocated (that is, there is an associated host offset for it). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6d8771c5c79cbdc6c519875a5078e1cc85856d63.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	70d1cbae03	qcow2: Add qcow2_get_subcluster_range_type() There are situations in which we want to know how many contiguous subclusters of the same type there are in a given cluster. This can be done by simply iterating over the subclusters and repeatedly calling qcow2_get_subcluster_type() for each one of them. However once we determined the type of a subcluster we can check the rest efficiently by counting the number of adjacent ones (or zeroes) in the bitmap. This is what this function does. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <db917263d568ec6ffb4a41cac3c9100f96bf6c18.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
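The bitmap-counting idea in this message can be illustrated with a minimal sketch. It assumes a single 32-bit mask with one bit per subcluster; the names `count_trailing_ones32` and `same_bit_run` are hypothetical and are not the functions added by the patch, which also has to map bits to subcluster types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of consecutive 1 bits starting at bit 0. */
static int count_trailing_ones32(uint32_t val)
{
    int n = 0;
    while (val & 1) {
        val >>= 1;
        n++;
    }
    return n;
}

/*
 * Length of the run of identical bits that starts at subcluster sc_from.
 * The bit at sc_from decides whether ones or zeroes are counted.
 */
static int same_bit_run(uint32_t bitmap, unsigned sc_from)
{
    assert(sc_from < 32);
    uint32_t shifted = bitmap >> sc_from;
    int limit = 32 - (int)sc_from;   /* bits left in the cluster */
    int run = (shifted & 1) ? count_trailing_ones32(shifted)
                            : count_trailing_ones32(~shifted);
    /* Zeroes shifted in from the top must not extend a run of zeroes. */
    return run < limit ? run : limit;
}
```

In real code the loop would normally be replaced by a count-trailing-zeroes/ones primitive, which is what makes this cheaper than querying every subcluster one by one.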
Alberto Garcia	34905d8eb1	qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() This patch adds QCow2SubclusterType, which is the subcluster-level version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the same meaning as their QCOW2_CLUSTER_* equivalents (when they exist). See below for details and caveats. In images without extended L2 entries clusters are treated as having exactly one subcluster so it is possible to replace one data type with the other while keeping the exact same semantics. With extended L2 entries there are new possible values, and every subcluster in the same cluster can obviously have a different QCow2SubclusterType so functions need to be adapted to work on the subcluster level. There are several things that have to be taken into account: a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is compressed. We do not support compression at the subcluster level. b) There are two different values for unallocated subclusters: QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC which means that the cluster is allocated but the subcluster is not. The latter can only happen in images with extended L2 entries. c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2 entry has a value that violates the specification. The caller is responsible for handling these situations. To prevent compatibility problems with images that have invalid values but are currently being read by QEMU without causing side effects, QCOW2_SUBCLUSTER_INVALID is only returned for images with extended L2 entries. qcow2_cluster_to_subcluster_type() is added as a separate function from qcow2_get_subcluster_type(), but this is only temporary and both will be merged in a subsequent patch. 
Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <26ef38e270f25851c98b51278852b4c4a7f97e69.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	39a9f0a50e	qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Extended L2 entries are 128-bit wide: 64 bits for the entry itself and 64 bits for the subcluster allocation bitmap. In order to support them correctly get/set_l2_entry() need to be updated so they take the entry width into account in order to calculate the correct offset. This patch also adds the get/set_l2_bitmap() functions that are used to access the bitmaps. For convenience we allow calling get_l2_bitmap() on images without subclusters. In this case the returned value is always 0 and has no meaning. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6ee0f81ae3329c991de125618b3675e1e46acdbb.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c8fd8554d9	qcow2: Add l2_entry_size() qcow2 images with subclusters have 128-bit L2 entries. The first 64 bits contain the same information as traditional images and the last 64 bits form a bitmap with the status of each individual subcluster. Because of that we cannot assume that L2 entries are sizeof(uint64_t) anymore. This function returns the proper value for the image. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <d34d578bd0380e739e2dde3e8dd6187d3d249fa9.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
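A minimal sketch of the size rule described in this message: 64-bit entries for traditional images and 128-bit entries (entry plus bitmap) for images with subclusters. The function names, the `slice_entries` parameter and the boolean flag are assumptions made for illustration; the actual helper in the patch takes the driver state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Entry size in bytes: 8 for traditional images, 16 with subclusters. */
static size_t l2_entry_size_sketch(bool has_subclusters)
{
    return has_subclusters ? 2 * sizeof(uint64_t) : sizeof(uint64_t);
}

/* Byte offset of entry idx inside an L2 slice holding slice_entries entries. */
static size_t l2_entry_byte_offset(bool has_subclusters, size_t slice_entries,
                                   size_t idx)
{
    assert(idx < slice_entries);
    return idx * l2_entry_size_sketch(has_subclusters);
}
```

Code that indexed a `uint64_t` array directly would compute the wrong offset for every entry after the first once entries are 16 bytes wide, which is why the accessors have to go through a size helper.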
Alberto Garcia	3e71981592	qcow2: Add offset_into_subcluster() and size_to_subclusters() Like offset_into_cluster() and size_to_clusters(), but for subclusters. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <3cc2390dcdef3d234d47c741b708bd8734490862.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a53e8b7202	qcow2: Add offset_to_sc_index() For a given offset, return the subcluster number within its cluster (i.e. with 32 subclusters per cluster it returns a number between 0 and 31). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <56e3e4ac0d827c6a2f5f259106c5ddb7c4ca2653.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
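The index calculation can be sketched as follows, assuming power-of-two cluster and subcluster sizes expressed as bit counts. `sc_index_sketch` is a hypothetical name; the real function reads these values from the image state.

```c
#include <assert.h>
#include <stdint.h>

/* Subcluster number, within its cluster, that contains the given offset. */
static unsigned sc_index_sketch(uint64_t offset, unsigned cluster_bits,
                                unsigned subcluster_bits)
{
    assert(subcluster_bits <= cluster_bits && cluster_bits < 64);
    /* Keep only the part of the offset that lies inside the cluster. */
    uint64_t in_cluster = offset & ((UINT64_C(1) << cluster_bits) - 1);
    return (unsigned)(in_cluster >> subcluster_bits);
}
```

With 64 KiB clusters (`cluster_bits` = 16) and 32 subclusters of 2 KiB each (`subcluster_bits` = 11), the last byte of a cluster maps to index 31 and the first byte of the next cluster wraps back to index 0.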
Alberto Garcia	d0346b5591	qcow2: Add subcluster-related fields to BDRVQcow2State This patch adds the following new fields to BDRVQcow2State: - subclusters_per_cluster: Number of subclusters in a cluster - subcluster_size: The size of each subcluster, in bytes - subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size Images without subclusters are treated as if they had exactly one subcluster per cluster (i.e. subcluster_size = cluster_size). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <55bfeac86b092fa2c9d182a95cbeb479ff7eca4f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
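How the three fields relate to each other can be shown with a small sketch. The struct and function names are hypothetical, and the figure of 32 subclusters per cluster is an assumption taken from the example in the offset_to_sc_index() message above, not from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three new fields. */
struct sc_geometry {
    unsigned subclusters_per_cluster;
    uint64_t subcluster_size;   /* bytes */
    unsigned subcluster_bits;   /* 1 << subcluster_bits == subcluster_size */
};

static struct sc_geometry sc_geometry_sketch(unsigned cluster_bits,
                                             bool has_subclusters)
{
    assert(cluster_bits >= 5 && cluster_bits < 64);
    struct sc_geometry g;
    uint64_t cluster_size = UINT64_C(1) << cluster_bits;

    /* Without subclusters the whole cluster counts as one subcluster. */
    g.subclusters_per_cluster = has_subclusters ? 32 : 1;
    g.subcluster_size = cluster_size / g.subclusters_per_cluster;
    g.subcluster_bits = 0;
    while ((UINT64_C(1) << g.subcluster_bits) < g.subcluster_size) {
        g.subcluster_bits++;
    }
    return g;
}
```

For a 64 KiB cluster this yields 2 KiB subclusters with `subcluster_bits` = 11, or a single 64 KiB subcluster with `subcluster_bits` = 16 when subclusters are disabled.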
Alberto Garcia	a3c7d91625	qcow2: Add dummy has_subclusters() function This function will be used by the qcow2 code to check if an image has subclusters or not. At the moment this simply returns false. Once all patches needed for subcluster support are ready then QEMU will be able to create and read images with subclusters and this function will return the actual value. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <905526221083581a1b7057bca1585487661c5c13.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	12c6aebedf	qcow2: Add get_l2_entry() and set_l2_entry() The size of an L2 entry is 64 bits, but if we want to have subclusters we need extended L2 entries. This means that we have to access L2 tables and slices differently depending on whether an image has extended L2 entries or not. This patch replaces all l2_slice[] accesses with calls to get_l2_entry() and set_l2_entry(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <9586363531fec125ba1386e561762d3e4224e9fc.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	57538c864f	qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() When writing to a qcow2 file there are two functions that take a virtual offset and return a host offset, possibly allocating new clusters if necessary: - handle_copied() looks for normal data clusters that are already allocated and have a reference count of 1. In those clusters we can simply write the data and there is no need to perform any copy-on-write. - handle_alloc() looks for clusters that do need copy-on-write, either because they haven't been allocated yet, because their reference count is != 1 or because they are ZERO_ALLOC clusters. The ZERO_ALLOC case is a bit special because those are clusters that are already allocated and they could perfectly be dealt with in handle_copied() (as long as copy-on-write is performed when required). In fact, there is extra code specifically for them in handle_alloc() that tries to reuse the existing allocation if possible and frees them otherwise. This patch changes the handling of ZERO_ALLOC clusters so the semantics of these two functions are now like this: - handle_copied() looks for clusters that are already allocated and which we can overwrite (NORMAL and ZERO_ALLOC clusters with a reference count of 1). - handle_alloc() looks for clusters for which we need a new allocation (all other cases). One important difference after this change is that clusters found in handle_copied() may now require copy-on-write, but this will be necessary anyway once we add support for subclusters. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <eb17fc938f6be7be2e8d8ff42763d2c19241f866.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c1587d877e	qcow2: Split cluster_needs_cow() out of count_cow_clusters() We are going to need it in other places. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <65e5d9627ca2ebe7e62deaeddf60949c33067d9d.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	8f91d6906c	qcow2: Add calculate_l2_meta() handle_alloc() creates a QCowL2Meta structure in order to update the image metadata and perform the necessary copy-on-write operations. This patch moves that code to a separate function so it can be used from other places. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <e5bc4a648dac31972bfa7a0e554be8064be78799.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	388e581615	qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset() qcow2_get_cluster_offset() takes an (unaligned) guest offset and returns the (aligned) offset of the corresponding cluster in the qcow2 image. In practice none of the callers need to know where the cluster starts so this patch makes the function calculate and return the final host offset directly. The function is also renamed accordingly. There is a pre-existing exception with compressed clusters: in this case the function returns the complete cluster descriptor (containing the offset and size of the compressed data). This does not change with this patch but it is now documented. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <ffae6cdc5ca8950e8280ac0f696dcc376cb07095.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	9c4269d54b	qcow2: Make Qcow2AioTask store the full host offset The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned host offset. In practice this is not very useful because all users(*) of this structure need the final host offset into the cluster, which they calculate using host_offset = file_cluster_offset + offset_into_cluster(s, offset) There is no reason why Qcow2AioTask cannot store host_offset directly and that is what this patch does. (*) compressed clusters are the exception: in this case what file_cluster_offset was storing was the full compressed cluster descriptor (offset + size). This does not change with this patch but it is documented now. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <07c4b15c644dcf06c9459f98846ac1c4ea96e26f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
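The host offset formula quoted in this message can be written out as a short sketch. Both function names are hypothetical, and the cluster size is passed as a bit count instead of being read from the driver state.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a guest address relative to the start of its cluster. */
static uint64_t offset_into_cluster_sketch(uint64_t offset,
                                           unsigned cluster_bits)
{
    assert(cluster_bits < 64);
    return offset & ((UINT64_C(1) << cluster_bits) - 1);
}

/* Final host offset: cluster-aligned host offset plus the in-cluster part. */
static uint64_t host_offset_sketch(uint64_t file_cluster_offset,
                                   uint64_t guest_offset,
                                   unsigned cluster_bits)
{
    return file_cluster_offset +
           offset_into_cluster_sketch(guest_offset, cluster_bits);
}
```

Storing the result once avoids repeating this addition at every use site. The sketch does not cover compressed clusters, where the stored value is a descriptor and not a plain offset.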
Marc-André Lureau	5e5733e599	meson: convert block Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:30:18 -04:00
Paolo Bonzini	243af0225a	trace: switch position of headers to what Meson requires Meson doesn't enjoy the same flexibility we have with Make in choosing the include path. In particular the tracing headers are using $(build_root)/$(<D). In order to keep the include directives unchanged, the simplest solution is to generate headers with patterns like "trace/trace-audio.h" and place forwarding headers in the source tree such that for example "audio/trace.h" includes "trace/trace-audio.h". This patch is too ugly to be applied to the Makefiles now. It's only a way to separate the changes to the tracing header files from the Meson rewrite of the tracing logic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:18:24 -04:00
Stefan Reiter	7661a886a1	block/block-copy: always align copied region to cluster size Since commit `42ac214406` (block/block-copy: refactor task creation) block_copy_task_create calculates the area to be copied via bdrv_dirty_bitmap_next_dirty_area, but that can return an unaligned byte count if the image's last cluster end is not aligned to the bitmap's granularity. Always ALIGN_UP the resulting bytes value to satisfy block_copy_do_copy, which requires the 'bytes' parameter to be aligned to cluster size. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Message-Id: <20200810095523.15071-1-s.reiter@proxmox.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-10 17:12:46 +02:00
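The rounding step can be sketched as below. `align_up_sketch` is a hypothetical stand-in for the alignment macro named in the message, written as a function so that the arithmetic is explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of alignment (n unchanged if aligned). */
static int64_t align_up_sketch(int64_t n, int64_t alignment)
{
    assert(n >= 0 && alignment > 0);
    return (n + alignment - 1) / alignment * alignment;
}
```

With a 64 KiB cluster size, a dirty area of 65537 bytes reported for the tail of an image is widened to 131072 bytes, so the copy routine always receives a cluster-aligned byte count.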
Tuguoyi	348fcc4f7a	qcow2-cluster: Fix integer left shift error in qcow2_alloc_cluster_link_l2() When calculating the offset, the result of the left shift operation is converted to a 64-bit type automatically because the left operand of the + operator is uint64_t. However, the shift itself is performed on a 32-bit signed integer, so the converted result may be wrong and trigger an assertion failure. For example, consider i=0x2000, cluster_bits=18: the result of the left shift operation is 0x80000000. Because argument i is of signed integer type, the result is sign-extended to 0xffffffff80000000, which is not what we expected. The way to trigger the assertion failure: qemu-img create -f qcow2 -o preallocation=full,cluster_size=256k tmpdisk 10G This patch fixes it by casting @i to uint64_t before doing the left shift operation. Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 81ba90fe0c014f269621c283269b42ad@h3c.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-08-05 14:56:11 +01:00
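The numbers in this message can be reproduced with a small sketch. Both function names are hypothetical. Because overflowing a signed left shift is undefined in ISO C, the buggy variant models the wrapped 32-bit result explicitly with an unsigned shift and a conversion back to `int32_t`, which is assumed to wrap in the usual two's-complement way.

```c
#include <assert.h>
#include <stdint.h>

/* Models the bug: the shift happens in 32 bits, then gets sign-extended. */
static uint64_t link_offset_buggy(uint64_t base, int i, int cluster_bits)
{
    assert(i >= 0 && cluster_bits >= 0 && cluster_bits < 32);
    int32_t shifted = (int32_t)((uint32_t)i << cluster_bits);
    /* A negative int32_t converts to uint64_t with the upper bits set. */
    return base + (uint64_t)shifted;
}

/* Models the fix: widen i to 64 bits before shifting. */
static uint64_t link_offset_fixed(uint64_t base, int i, int cluster_bits)
{
    assert(i >= 0 && cluster_bits >= 0 && cluster_bits < 32);
    return base + ((uint64_t)i << cluster_bits);
}
```

For `i` = 0x2000 and `cluster_bits` = 18 the buggy variant adds 0xffffffff80000000 to the base, while the fixed variant adds 0x80000000.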
Max Reitz	fe16c7ddf8	qcow2: Release read-only bitmaps when inactivated During migration, we release all bitmaps after storing them on disk, as long as they are (1) stored on disk, (2) not read-only, and (3) consistent. (2) seems arbitrary, though. The reason we do not release them is because we do not write them, as there is no need to; and then we just forget about all bitmaps that we have not written to the file. However, read-only persistent bitmaps are still in the file and in sync with their in-memory representation, so we may as well release them just like any R/W bitmap that we have updated. It leads to actual problems, too: After migration, letting the source continue may result in an error if there were any bitmaps on read-only nodes (such as backing images), because those have not been released by bdrv_inactivate_all(), but bdrv_invalidate_cache_all() attempts to reload them (which fails, because they are still present in memory). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200730120234.49288-2-mreitz@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-08-03 08:59:37 -05:00
Peter Maydell	5045be872d	Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-07-28' into staging nbd patches for 2020-07-28 - fix NBD handling of trim/zero requests larger than 2G - allow no-op resizes on NBD (in turn fixing qemu-img convert -c into NBD) - several deadlock fixes when using NBD reconnect # gpg: Signature made Tue 28 Jul 2020 15:59:42 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2020-07-28: block/nbd: nbd_co_reconnect_loop(): don't sleep if drained block/nbd: on shutdown terminate connection attempt block/nbd: allow drain during reconnect attempt block/nbd: split nbd_establish_connection out of nbd_client_connect iotests: Test convert to qcow2 compressed to NBD iotests: Add more qemu_img helpers iotests: Make qemu_nbd_popen() a contextmanager block: nbd: Fix convert qcow2 compressed to nbd nbd: Fix large trim/zero requests Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-28 20:43:03 +01:00
Peter Maydell	0c4fa5bc1a	Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-07-28' into staging Block patches for 5.1.0: - Fix block I/O for split transfers - Fix iotest 197 for non-qcow2 formats # gpg: Signature made Tue 28 Jul 2020 14:45:28 BST # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2020-07-28: iotests/197: Fix for non-qcow2 formats iotests/028: Add test for cross-base-EOF reads block: Fix bdrv_aligned_p*v() for qiov_offset != 0 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-28 18:00:21 +01:00
Vladimir Sementsov-Ogievskiy	12c75e20a2	block/nbd: nbd_co_reconnect_loop(): don't sleep if drained We try to go into a wakeable sleep so that, if drain begins, it will break the sleep. But what if nbd_client_co_drain_begin() was already called and s->drained is already true? We'll go to sleep, and drain will have to wait for the whole timeout. Let's improve it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200727184751.15704-5-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy	fbeb3e63b3	block/nbd: on shutdown terminate connection attempt On shutdown the nbd driver may be in a connecting state. We should shut it down as well, otherwise we may hang in nbd_teardown_connection, waiting for connection_co to finish in the BDRV_POLL_WHILE(bs, s->connection_co) loop if the remote server is down. How to reproduce the dead lock: 1. Create nbd-fault-injector.conf with the following contents: [inject-error "mega1"] event=data io=readwrite when=before 2. In one terminal run nbd-fault-injector in a loop, like this: n=1; while true; do echo $n; ((n++)); ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf; done 3. In another terminal run qemu-io in a loop, like this: n=1; while true; do echo $n; ((n++)); ./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000; done After some time, qemu-io will hang. Note that this hang may be triggered by another bug, so the whole case is fixed only together with commit "block/nbd: allow drain during reconnect attempt". Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200727184751.15704-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy	dd1ec1a4af	block/nbd: allow drain during reconnect attempt It should be safe to reenter qio_channel_yield() on io/channel read/write path, so it's safe to reduce in_flight and allow attaching new aio context. And no problem to allow drain itself: connection attempt is not a guest request. Moreover, if remote server is down, we can hang in negotiation, blocking drain section and provoking a dead lock. How to reproduce the dead lock: 1. Create nbd-fault-injector.conf with the following contents: [inject-error "mega1"] event=data io=readwrite when=before 2. In one terminal run nbd-fault-injector in a loop, like this: n=1; while true; do echo $n; ((n++)); ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf; done 3. In another terminal run qemu-io in a loop, like this: n=1; while true; do echo $n; ((n++)); ./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000; done After some time, qemu-io will hang trying to drain, for example, like this: #3 aio_poll (ctx=0x55f006bdd890, blocking=true) at util/aio-posix.c:600 #4 bdrv_do_drained_begin (bs=0x55f006bea710, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:427 #5 bdrv_drained_begin (bs=0x55f006bea710) at block/io.c:433 #6 blk_drain (blk=0x55f006befc80) at block/block-backend.c:1710 #7 blk_unref (blk=0x55f006befc80) at block/block-backend.c:498 #8 bdrv_open_inherit (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x55f006be86d0, flags=24578, parent=0x0, child_class=0x0, child_role=0, errp=0x7fffba154620) at block.c:3491 #9 bdrv_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at block.c:3513 #10 blk_new_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at block/block-backend.c:421 And connection_co stack like this: #0 qemu_coroutine_switch (from_=0x55f006bf2650, to_=0x7fe96e07d918, 
action=COROUTINE_YIELD) at util/coroutine-ucontext.c:302 #1 qemu_coroutine_yield () at util/qemu-coroutine.c:193 #2 qio_channel_yield (ioc=0x55f006bb3c20, condition=G_IO_IN) at io/channel.c:472 #3 qio_channel_readv_all_eof (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0, niov=1, errp=0x7fe96d729eb0) at io/channel.c:110 #4 qio_channel_readv_all (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0, niov=1, errp=0x7fe96d729eb0) at io/channel.c:143 #5 qio_channel_read_all (ioc=0x55f006bb3c20, buf=0x7fe96d729d28 "\300.\366\004\360U", buflen=8, errp=0x7fe96d729eb0) at io/channel.c:247 #6 nbd_read (ioc=0x55f006bb3c20, buffer=0x7fe96d729d28, size=8, desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at /work/src/qemu/master/include/block/nbd.h:365 #7 nbd_read64 (ioc=0x55f006bb3c20, val=0x7fe96d729d28, desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at /work/src/qemu/master/include/block/nbd.h:391 #8 nbd_start_negotiate (aio_context=0x55f006bdd890, ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0, outioc=0x55f006bf19f8, structured_reply=true, zeroes=0x7fe96d729dca, errp=0x7fe96d729eb0) at nbd/client.c:904 #9 nbd_receive_negotiate (aio_context=0x55f006bdd890, ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0, outioc=0x55f006bf19f8, info=0x55f006bf1a00, errp=0x7fe96d729eb0) at nbd/client.c:1032 #10 nbd_client_connect (bs=0x55f006bea710, errp=0x7fe96d729eb0) at block/nbd.c:1460 #11 nbd_reconnect_attempt (s=0x55f006bf19f0) at block/nbd.c:287 #12 nbd_co_reconnect_loop (s=0x55f006bf19f0) at block/nbd.c:309 #13 nbd_connection_entry (opaque=0x55f006bf19f0) at block/nbd.c:360 #14 coroutine_trampoline (i0=113190480, i1=22000) at util/coroutine-ucontext.c:173 Note, that the hang may be triggered by another bug, so the whole case is fixed only together with commit "block/nbd: on shutdown terminate connection attempt". 
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200727184751.15704-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy	fa35591b9c	block/nbd: split nbd_establish_connection out of nbd_client_connect We are going to implement non-blocking version of nbd_establish_connection, which for a while will be used only for nbd_reconnect_attempt, not for nbd_open, so we need to call it separately. Refactor nbd_reconnect_attempt in a way which makes next commit simpler. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200727184751.15704-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Nir Soffer	a2b333c018	block: nbd: Fix convert qcow2 compressed to nbd When converting to qcow2 compressed format, the last step is a special zero length compressed write, ending in a call to bdrv_co_truncate(). This call always fails for the nbd driver since it does not implement bdrv_co_truncate(). For block devices, which have the same limits, the call succeeds since the file driver implements bdrv_co_truncate(). If the caller asked to truncate to the same or smaller size with exact=false, the truncate succeeds. Implement the same logic for nbd. Example failing without this change: In one shell start qemu-nbd: $ truncate -s 1g test.tar $ qemu-nbd --socket=/tmp/nbd.sock --persistent --format=raw --offset 1536 test.tar In another shell convert an image to qcow2 compressed via NBD: $ echo "disk data" > disk.raw $ truncate -s 1g disk.raw $ qemu-img convert -f raw -O qcow2 -c disk.raw nbd+unix:///?socket=/tmp/nbd.sock; echo $? 1 qemu-img failed, but the conversion was successful: $ qemu-img info nbd+unix:///?socket=/tmp/nbd.sock image: nbd+unix://?socket=/tmp/nbd.sock file format: qcow2 virtual size: 1 GiB (1073741824 bytes) ... $ qemu-img check nbd+unix:///?socket=/tmp/nbd.sock No errors were found on the image. 1/16384 = 0.01% allocated, 100.00% fragmented, 100.00% compressed clusters Image end offset: 393216 $ qemu-img compare disk.raw nbd+unix:///?socket=/tmp/nbd.sock Images are identical. Fixes: https://bugzilla.redhat.com/1860627 Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-Id: <20200727215846.395443-2-nsoffer@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: typo fixes] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:19 -05:00
Peter Maydell	2649915121	Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2020-07-27' into staging bitmaps patches for 2020-07-27 - Improve handling of various post-copy bitmap migration scenarios. A lost bitmap should merely mean that the next backup must be full rather than incremental, rather than abruptly breaking the entire guest migration. 
- Associated iotest improvements # gpg: Signature made Mon 27 Jul 2020 21:46:17 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-bitmaps-2020-07-27: (24 commits) migration: Fix typos in bitmap migration comments iotests: Adjust which migration tests are quick qemu-iotests/199: add source-killed case to bitmaps postcopy qemu-iotests/199: add early shutdown case to bitmaps postcopy qemu-iotests/199: check persistent bitmaps qemu-iotests/199: prepare for new test-cases addition migration/savevm: don't worry if bitmap migration postcopy failed migration/block-dirty-bitmap: cancel migration on shutdown migration/block-dirty-bitmap: relax error handling in incoming part migration/block-dirty-bitmap: keep bitmap state for all bitmaps migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete migration/block-dirty-bitmap: rename finish_lock to just lock migration/block-dirty-bitmap: refactor state global variables migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup migration/block-dirty-bitmap: rename state structure types migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start qemu-iotests/199: increase postcopy period qemu-iotests/199: change discard patterns qemu-iotests/199: improve performance: set bitmap by discard ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-28 14:38:17 +01:00
Max Reitz	134b7dec6e	block: Fix bdrv_aligned_p*v() for qiov_offset != 0 Since these functions take a @qiov_offset, they must always take it into account when working with @qiov. There are a couple of places where they do not, but they should. Fixes: `65cd4424b9` ("block/io: bdrv_aligned_preadv: use and support qiov_offset") Fixes: `28c4da2869` ("block/io: bdrv_aligned_pwritev: use and support qiov_offset") Reported-by: Claudio Fontana <cfontana@suse.de> Reported-by: Bruce Rogers <brogers@suse.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200728120806.265916-2-mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Claudio Fontana <cfontana@suse.de> Tested-by: Bruce Rogers <brogers@suse.com>	2020-07-28 15:28:47 +02:00
Andrey Shinkevich	8098969cf2	qcow2: Fix capitalization of header extension constant. Make the capitalization of the hexadecimal numbers consistent for the QCOW2 header extension constants in docs/interop/qcow2.txt. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1594973699-781898-2-git-send-email-andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-27 15:39:58 -05:00
Max Reitz	984c367814	block/amend: Check whether the node exists We should check whether the user-specified node-name actually refers to a node. The simplest way to do that is to use bdrv_lookup_bs() instead of bdrv_find_node() (the former wraps the latter, and produces an error message if necessary). Reported-by: Coverity (CID 1430268) Fixes: `ced914d0ab` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200710095037.10885-1-mreitz@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>	2020-07-27 12:37:25 +02:00
Peter Maydell	0c1fd2f41f	Block layer patches: - file-posix: Handle `EINVAL` fallocate return value - qemu-img convert -n: Keep qcow2 v2 target sparse -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl8XDZgRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9bpFQ/9EnS4iV9w0KW/NuJ4FIVdBD/VZFzokDLi 1vXVVEjoxAxxiP8KlGM9HRi5NtvOMgzKhNGias0wOFiBorx8Ppfc+3sqwygc2dnw Vbl/od2D7xQZkddnp4Upo70m+eWRW6xaxX+lAcl6iS3gBPDwExLaYfBN8lFUyRrs T4C0miD+abEEyL3C5A4cEZJ7CIs0n7AqZkqgytWA7clwy79VgDSuMOgP6DOP1tGH 1uK4gMCB0xbn+PHk96lXPORcLwDBOP0PIluo/zBmffzsEZN1Lv5ddVmxMQWSivin UmAbpeEtSw9Py5lRVmLSBYvolVOUleE/Rlzad2iue2be5/G8VP8xiRYMp9mUVpLO +LPMUd9NRkPx7wjUJMPKF0G9FgVO7R0+9J6rC33aKBj2XAlxY6qQlqUN2Jo11/fK 2+9AkU7WVqx3vuW2Zz7wjq3Rjvpg/sK+V3P3Cm6HTwwwPbEwv8GcFe6eKdvJrZ9K hhwiFSUOd90OUAdKOQXKMFSZ/t1TrZhdX882Hvth11/AlQAUY4cxQbSKcc2nrvLu Axk0Va3haOD+ReRTs8W/iYNdrXGZmbr3MCkNiK3QSvnrdj602ompco7xyDTX1/qH 6Hu28q7jUG3p3cApLQIZVjmogfqcGU7SWIY4lp9HZqtGP0z+pmWg46UNqzlKLyv5 Y/fVHHshlRU= =QhOf -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - file-posix: Handle `EINVAL` fallocate return value - qemu-img convert -n: Keep qcow2 v2 target sparse # gpg: Signature made Tue 21 Jul 2020 16:45:28 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: iotests: Test sparseness for qemu-img convert -n qcow2: Implement v2 zero writes with discard if possible file-posix: Handle `EINVAL` fallocate return value Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-21 19:25:48 +01:00
Peter Maydell	b50dab9eca	QOM patches for 2020-07-21 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl8XDGsSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZTPTYP/11AHdREDvz+KDl7AXtndxE9wo6GPGUy GhMSXODgOm6a4/nk4xQQR7MU+57C2IMnYbAAdrNXA3YVHjUPMURVsAeTsl9ruchC 1JiNrtXKgaWv5P8WeNzuXcH9B7n3UZ/mV4FH8v0FhjZhsH6EHP0zOhgNScPyQurz AsmYN5hSoSz8jRFQ8xlNzBhNrYX5dG6fOs4M6yBRUgzaCuRcSn/+cgjngfAmcH4k F0KK+rBQA+KUAKYp0QUaUcbD5oj/6KaQMsfqsFROR8m+AAJrhkP2ffmcDCHfCW8A SQsp8k2SkdHXNgRoDJSea1TYMitCFrTBDv+MfBWOLH4ewQrGAzvsuG5o3FlnVeN7 CKHkkxqOEifJ/vnThVBTIsqxf8/HefDXi1B5BXSKYSvnPKnnh8HCZ7VxvAsNGZiR epr4gEGBCcb9/bXFUmVVPz6H+lUWORzhF4P0spNJwH5BT9FLybWgJt9o4KH+pxYA DL4GF5vOkBgIhgUR+vn535vik7M38u6gsB8m22s2FRkZmIxTpxp9eH2ehHGSoVYM Yl1kVzJmFMPakl0gG1dMmM4+DJGRTHLCfBFS4pzMs9DaCNHUF3CB8FjXQs3NIFCS XGnChbri/wF83DEueTIrUAqF2w0XgEy55aVBOZkmFT6DXPXbx+Y3Q/AomaktBLyv FFUe9SfMn63P =HTNj -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-qom-2020-07-21' into staging QOM patches for 2020-07-21 # gpg: Signature made Tue 21 Jul 2020 16:40:27 BST # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qom-2020-07-21: qom: Make info qom-tree sort children more efficiently qom: Document object_get_canonical_path() returns malloced string qom: Change object_get_canonical_path_component() not to malloc Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-21 18:31:52 +01:00
Kevin Wolf	61b3043965	qcow2: Implement v2 zero writes with discard if possible qcow2 version 2 images don't support the zero flag for clusters, so for write_zeroes requests, we return -ENOTSUP and get explicit zero buffer writes. If the image doesn't have a backing file, we can do better: Just discard the respective clusters. This is relevant for 'qemu-img convert -O qcow2 -n', where qemu-img has to assume that the existing target image may contain any data, so it has to write zeroes. Without this patch, this results in a fully allocated target image, even if the source image was empty. Reported-by: Nir Soffer <nsoffer@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200721135520.72355-2-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-21 16:28:57 +02:00
Antoine Damhet	bae127d4dc	file-posix: Handle `EINVAL` fallocate return value The `detect-zeroes=unmap` option may issue unaligned `FALLOC_FL_PUNCH_HOLE` requests, raw block devices can (and will) return `EINVAL`, qemu should then write the zeroes to the blockdev instead of issuing an `IO_ERROR`. The problem can be reproduced like this: $ qemu-io -c 'write -P 0 42 1234' --image-opts driver=host_device,filename=/dev/loop0,detect-zeroes=unmap write failed: Invalid argument Signed-off-by: Antoine Damhet <antoine.damhet@blade-group.com> Message-Id: <20200717135603.51180-1-antoine.damhet@blade-group.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-21 16:28:57 +02:00
Markus Armbruster	7a309cc95b	qom: Change object_get_canonical_path_component() not to malloc object_get_canonical_path_component() returns a malloced copy of a property name on success, null on failure. 19 of its 25 callers immediately free the returned copy. Change object_get_canonical_path_component() to return the property name directly. Since modifying the name would be wrong, adjust the return type to const char *. Drop the free from the 19 callers that become simpler, and add the g_strdup() to the other six. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200714160202.3121879-4-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com>	2020-07-21 16:23:43 +02:00
Stefan Hajnoczi	1d719ddc35	block: fix bdrv_aio_cancel() for ENOMEDIUM requests bdrv_aio_cancel() calls aio_poll() on the AioContext for the given I/O request until it has completed. ENOMEDIUM requests are special because there is no BlockDriverState when the drive has no medium! Define a .get_aio_context() function for BlkAioEmAIOCB requests so that bdrv_aio_cancel() can find the AioContext where the completion BH is pending. Without this function bdrv_aio_cancel() aborts on ENOMEDIUM requests! libFuzzer triggered the following assertion: cat << EOF | qemu-system-i386 -M pc-q35-5.0 \ -nographic -monitor none -serial none \ -qtest stdio -trace ide\* outl 0xcf8 0x8000fa24 outl 0xcfc 0xe106c000 outl 0xcf8 0x8000fa04 outw 0xcfc 0x7 outl 0xcf8 0x8000fb20 write 0x0 0x3 0x2780e7 write 0xe106c22c 0xd 0x1130c218021130c218021130c2 write 0xe106c218 0x15 0x110010110010110010110010110010110010110010 EOF ide_exec_cmd IDE exec cmd: bus 0x56170a77a2b8; state 0x56170a77a340; cmd 0xe7 ide_reset IDEstate 0x56170a77a340 Aborted (core dumped) (gdb) bt #1 0x00007ffff4f93895 in abort () at /lib64/libc.so.6 #2 0x0000555555dc6c00 in bdrv_aio_cancel (acb=0x555556765550) at block/io.c:2745 #3 0x0000555555dac202 in blk_aio_cancel (acb=0x555556765550) at block/block-backend.c:1546 #4 0x0000555555b1bd74 in ide_reset (s=0x555557213340) at hw/ide/core.c:1318 #5 0x0000555555b1e3a1 in ide_bus_reset (bus=0x5555572132b8) at hw/ide/core.c:2422 #6 0x0000555555b2aa27 in ahci_reset_port (s=0x55555720eb50, port=2) at hw/ide/ahci.c:650 #7 0x0000555555b29fd7 in ahci_port_write (s=0x55555720eb50, port=2, offset=44, val=16) at hw/ide/ahci.c:360 #8 0x0000555555b2a564 in ahci_mem_write (opaque=0x55555720eb50, addr=556, val=16, size=1) at hw/ide/ahci.c:513 #9 0x000055555598415b in memory_region_write_accessor (mr=0x55555720eb80, addr=556, value=0x7fffffffb838, size=1, shift=0, mask=255, attrs=...)
at softmmu/memory.c:483 Looking at bdrv_aio_cancel: 2728 /* async I/Os */ 2729 2730 void bdrv_aio_cancel(BlockAIOCB *acb) 2731 { 2732 qemu_aio_ref(acb); 2733 bdrv_aio_cancel_async(acb); 2734 while (acb->refcnt > 1) { 2735 if (acb->aiocb_info->get_aio_context) { 2736 aio_poll(acb->aiocb_info->get_aio_context(acb), true); 2737 } else if (acb->bs) { 2738 /* qemu_aio_ref and qemu_aio_unref are not thread-safe, so 2739 * assert that we're not using an I/O thread. Thread-safe 2740 * code should use bdrv_aio_cancel_async exclusively. 2741 */ 2742 assert(bdrv_get_aio_context(acb->bs) == qemu_get_aio_context()); 2743 aio_poll(bdrv_get_aio_context(acb->bs), true); 2744 } else { 2745 abort(); <=============== 2746 } 2747 } 2748 qemu_aio_unref(acb); 2749 } Fixes: `02c50efe08` ("block: Add bdrv_aio_cancel_async") Reported-by: Alexander Bulekov <alxndr@bu.edu> Buglink: https://bugs.launchpad.net/qemu/+bug/1878255 Originally-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200720100141.129739-1-stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-21 12:00:38 +02:00
Maxim Levitsky	662d0c5392	block/crypto: disallow write sharing by default My commit 'block/crypto: implement the encryption key management' accidentally allowed raw luks images to be shared between different qemu processes without share-rw=on explicit override. Fix that. Fixes: `bbfdae91fb` ("block/crypto: implement the encryption key management") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1857490 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200719122059.59843-2-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-21 10:49:02 +02:00
Kevin Wolf	a8c5cf27c9	file-posix: Fix leaked fd in raw_open_common() error path Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200717105426.51134-4-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Kevin Wolf	bca5283bd4	file-posix: Fix check_hdev_writable() with auto-read-only For Linux block devices, being able to open the device read-write doesn't necessarily mean that the device is actually writable (one example is a read-only LV, as you get with lvchange -pr <device>). We have check_hdev_writable() to check this condition and fail opening the image read-write if it's not actually writable. However, this check doesn't take auto-read-only into account, but results in a hard failure instead of downgrading to read-only where possible. Fix this and do the writable check not based on BDRV_O_RDWR, but only when this actually results in opening the file read-write. A second check is inserted in raw_reconfigure_getfd() to have the same check when dynamic auto-read-only upgrades an image file from read-only to read-write. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200717105426.51134-3-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Kevin Wolf	20eaf1bf6e	file-posix: Move check_hdev_writable() up We'll need to call it in raw_open_common(), so move the function to avoid a forward declaration. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200717105426.51134-2-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Kevin Wolf	5edc85571e	file-posix: Allow byte-aligned O_DIRECT with NFS Since commit `a6b257a08e` ('file-posix: Handle undetectable alignment'), we assume that if we open a file with O_DIRECT and alignment probing returns 1, we just couldn't find out the real alignment requirement because some filesystems make the requirement only for allocated blocks. In this case, a safe default of 4k is used. This is too strict for NFS, which does actually allow byte-aligned requests even with O_DIRECT. Because we can't distinguish both cases with generic code, let's just look at the file system magic and disable s->needs_alignment for NFS. This way, O_DIRECT can still be used on NFS for images that are not aligned to 4k. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200716142601.111237-3-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Marc-André Lureau	a08464521c	Remove VXHS block device The vxhs code doesn't compile since v2.12.0. There's no point in fixing and then adding CI for a config that our users have demonstrated that they do not use; better to just remove it. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200711065926.2204721-1-marcandre.lureau@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Peter Maydell	d2628b1eb7	Block layer patches: - file-posix: Mitigate file fragmentation with extent size hints - Tighten qemu-img rules on missing backing format - qemu-img map: Don't limit block status request size - Fix crash with virtio-scsi and iothreads -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl8NsgMRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9Z0tA//eqauxD7cTEpwrtLNrRtpiBtMG64BBpxz QfkURzB38bMVahHlwq3Gt7Zcov8V4V7vxK66h688Z/fhw3vmqIeVe8+P6+Y5s9FL jil8lewHuLTa6xELeugoV7SZXH8AAh1W2fQmiR7EPiOmpSE0wf7C5IShVlX8A04E r0n09+61qGjRIe1hNTwTtldqQEfx6UGnxQWcQb81JUPA1lZhX3cnPg/j94Bofr+m v/DbVTfsmUtTMjc0PdU7n4DKTWu8OS5B/X0unF21rTtO//cYBrhAeY3ax2jbFBWi CIZK8HLI5m9/HFyltql1LOsd+B5TtfnXMfSdvDh2jaVUlto7wTeTnWU1fv4wxUB5 hk7XgJo/y203ebFNHpTmW8tvLfGTP8uqCVfOEFxzjy+JHGrarlbWkwL2LMOFFAZ2 s2WcwlfqiYGFTG4+OFdhPf9qPWKSqMr+jTdZJTse64/c6+YXWHk+pP9lfYEUOgSi OYwdQUY9uiZ1K13q5Tif2TbFvs+c118xdTgVhAV7VtfPnWc3c647dX7iaq8Szknc IT93670Iqf/PzEj+L7XUbbLIIsAcmxD0sr7QAQEt7bfiYIDRIQLiVPyzXplETFg2 SEkvtqBovm84ct7pWQzqA6lFvr3oIFDNquR40XFGozHNnlBeNi5s7pXQnqUBLElr wDDuEi+z5QM= =DB0q -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - file-posix: Mitigate file fragmentation with extent size hints - Tighten qemu-img rules on missing backing format - qemu-img map: Don't limit block status request size - Fix crash with virtio-scsi and iothreads # gpg: Signature made Tue 14 Jul 2020 14:24:19 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: block: Avoid stale pointer dereference in blk_get_aio_context() qemu-img: Deprecate use of -b without -F block: Add support to warn on backing file change without format iotests: Specify explicit backing format where sensible qcow2: Deprecate use of 
qemu-img amend to change backing file block: Error if backing file fails during creation without -u qcow: Tolerate backing_fmt= vmdk: Add trivial backing_fmt support sheepdog: Add trivial backing_fmt support block: Finish deprecation of 'qemu-img convert -n -o' qemu-img: Flush stdout before potential stderr messages file-posix: Mitigate file fragmentation with extent size hints iotests/059: Filter out disk size with more standard filter qemu-img map: Don't limit block status request size iotests: Simplify _filter_img_create() a bit Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-14 19:39:52 +01:00
Greg Kurz	e6cada9231	block: Avoid stale pointer dereference in blk_get_aio_context() It is possible for blk_remove_bs() to race with blk_drain_all(), causing the latter to dereference a stale blk->root pointer: blk_remove_bs(blk) bdrv_root_unref_child(blk->root) child_bs = blk->root->bs bdrv_detach_child(blk->root) ... g_free(blk->root) <============== blk->root becomes stale bdrv_unref(child_bs) <============ yield at some point A blk_drain_all() can be triggered by some guest action in the meantime, eg. on POWER, SLOF might disable bus mastering on a virtio-scsi-pci device: virtio_write_config() virtio_pci_stop_ioeventfd() virtio_bus_stop_ioeventfd() virtio_scsi_dataplane_stop() blk_drain_all() blk_get_aio_context() bs = blk->root ? blk->root->bs : NULL ^^^^^^^^^ stale Then, depending on one's luck, QEMU either crashes with SEGV or hits the assertion in blk_get_aio_context(). blk->root is set by blk_insert_bs() which calls bdrv_root_attach_child() first. The blk_remove_bs() function should rollback the changes made by blk_insert_bs() in the opposite order (or it should be documented somewhere why this isn't the case). Clear blk->root before calling bdrv_root_unref_child() in blk_remove_bs(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159430264541.389456.11925072456012783045.stgit@bahia.lan> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:24:15 +02:00
Eric Blake	e54ee1b385	block: Add support to warn on backing file change without format For now, this is a mechanical addition; all callers pass false. But the next patch will use it to improve 'qemu-img rebase -u' when selecting a backing file with no format. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Message-Id: <20200706203954.341758-10-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	bc5ee6da71	qcow2: Deprecate use of qemu-img amend to change backing file The use of 'qemu-img amend' to change qcow2 backing files is not tested very well. In particular, our implementation has a bug where if a new backing file is provided without a format, then the prior format is blindly reused, even if this results in data corruption, but this is not caught by iotests. There are also situations where amending other options needs access to the original backing file (for example, on a downgrade to a v2 image, knowing whether a v3 zero cluster must be allocated or may be left unallocated depends on knowing whether the backing file already reads as zero), but the command line does not have a nice way to tell us both the backing file to use for opening the image as well as the backing file to install after the operation is complete. Even if we do allow changing the backing file, it is redundant with the existing ability to change backing files via 'qemu-img rebase -u'. It is time to deprecate this support (leaving the existing behavior intact, even if it is buggy), and at a point in the future, require the use of only 'qemu-img rebase' for adjusting backing chain relations, saving 'qemu-img amend' for changes unrelated to the backing chain. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-8-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	344acbd62f	qcow: Tolerate backing_fmt= qcow has no space in the metadata to store a backing format, and there are existing qcow images backed both by raw or by other formats (usually qcow) images, reliant on probing to tell the difference. On the bright side, because we probe every time, raw files are marked as probed and we thus forbid a commit action into the backing file where guest-controlled contents could change the result of the probe next time around (the iotest added here proves that). Still, allowing the user to specify the backing format during creation, even if we can't record it, is a good thing. This patch blindly allows any value that resolves to a known driver, even if the user's request is a mismatch from what probing finds; then the next patch will further enhance things to verify that the user's request matches what we actually probe. With this and the next patch in place, we will finally be ready to deprecate the creation of images where a backing format was not explicitly specified by the user. Note that this is only for QemuOpts usage; there is no change to the QAPI to allow a format through -blockdev. Add a new iotest 301 just for qcow, to demonstrate the latest behavior, and to make it easier to show the improvements made in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-6-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	d51a814cf4	vmdk: Add trivial backing_fmt support vmdk already requires that if backing_file is present, that it be another vmdk image (see vmdk_co_do_create). Meanwhile, we want to move towards always being explicit about the backing format for other drivers where it matters. So for convenience, make qemu-img create -F vmdk work, while rejecting all other explicit formats (note that this is only for QemuOpts usage; there is no change to the QAPI to allow a format through -blockdev). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-5-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	80fa43e7df	sheepdog: Add trivial backing_fmt support Sheepdog already requires that if backing_file is present, that it be another sheepdog image (see sd_co_create). Meanwhile, we want to move towards always being explicit about the backing format for other drivers where it matters. So for convenience, make qemu-img create -F sheepdog work, while rejecting all other explicit formats (note that this is only for QemuOpts usage; there is no change to the QAPI to allow a format through -blockdev). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-4-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Kevin Wolf	ffa244c84a	file-posix: Mitigate file fragmentation with extent size hints Especially when O_DIRECT is used with image files so that the page cache indirection can't cause a merge of allocating requests, the file will fragment on the file system layer, with a potentially very small fragment size (this depends on the requests the guest sent). On Linux, fragmentation can be reduced by setting an extent size hint when creating the file (at least on XFS, it can't be set any more after the first extent has been allocated), basically giving raw files a "cluster size" for allocation. This adds a create option to set the extent size hint, and changes the default from not setting a hint to setting it to 1 MB. The main reason why qcow2 defaults to smaller cluster sizes is that COW becomes more expensive, which is not an issue with raw files, so we can choose a larger size. The tradeoff here is only potentially wasted disk space. For qcow2 (or other image formats) over file-posix, the advantage should even be greater because they grow sequentially without leaving holes, so there won't be wasted space. Setting even larger extent size hints for such images may make sense. This can be done with the new option, but let's keep the default conservative for now. The effect is very visible with a test that intentionally creates a badly fragmented file with qemu-img bench (the time difference while creating the file is already remarkable) and then looks at the number of extents and the time a simple "qemu-img map" takes. Without an extent size hint: $ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=0 $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192) Run completed in 25.848 seconds. 
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192) Run completed in 19.616 seconds. $ filefrag ~/tmp/test.raw /home/kwolf/tmp/test.raw: 2000000 extents found $ time ./qemu-img map ~/tmp/test.raw Offset Length Mapped to File 0 0x1e8480000 0 /home/kwolf/tmp/test.raw real 0m1,279s user 0m0,043s sys 0m1,226s With the new default extent size hint of 1 MB: $ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576 $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192) Run completed in 11.833 seconds. $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192) Run completed in 10.155 seconds. $ filefrag ~/tmp/test.raw /home/kwolf/tmp/test.raw: 178 extents found $ time ./qemu-img map ~/tmp/test.raw Offset Length Mapped to File 0 0x1e8480000 0 /home/kwolf/tmp/test.raw real 0m0,061s user 0m0,040s sys 0m0,014s Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200707142329.48303-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
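The extent size hint described in the commit above can be requested on Linux through the generic `FS_IOC_FSGETXATTR` / `FS_IOC_FSSETXATTR` ioctls from `<linux/fs.h>`. The following is a minimal sketch of that mechanism, not the code from the commit: the helper name `set_extent_size_hint` and its error convention (0 or a negative errno value) are assumptions made for illustration, and the hint is passed in bytes.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#ifdef __linux__
#include <linux/fs.h>   /* struct fsxattr, FS_IOC_FS{GET,SET}XATTR, FS_XFLAG_EXTSIZE */
#endif

/*
 * Hypothetical helper (not QEMU's function): ask the file system to
 * allocate space for fd in units of hint_bytes.
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_extent_size_hint(int fd, uint32_t hint_bytes)
{
#if defined(FS_IOC_FSSETXATTR) && defined(FS_XFLAG_EXTSIZE)
    struct fsxattr attr;

    /* Read the current attributes so that unrelated flags are preserved. */
    if (ioctl(fd, FS_IOC_FSGETXATTR, &attr) < 0) {
        return -errno;
    }

    /* Enable the extent size hint and set its value in bytes. */
    attr.fsx_xflags |= FS_XFLAG_EXTSIZE;
    attr.fsx_extsize = hint_bytes;

    if (ioctl(fd, FS_IOC_FSSETXATTR, &attr) < 0) {
        return -errno;
    }
    return 0;
#else
    /* The ioctls are not available on this platform or with these headers. */
    (void)fd;
    (void)hint_bytes;
    return -ENOTSUP;
#endif
}
```

As the commit message notes, at least on XFS the hint can no longer be set once the first extent has been allocated, so a caller would presumably invoke such a helper right after creating the file and before writing any data. Many file systems do not support the hint at all, so treating a failure as non-fatal is a reasonable design choice in a sketch like this.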


5547 Commits