qemu-e2k

Author	SHA1	Message	Date
Alberto Faria	a8f0e83cef	block: Use bdrv_co_pwrite_sync() when caller is coroutine_fn Convert uses of bdrv_pwrite_sync() into bdrv_co_pwrite_sync() when the callers are already coroutine_fn. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220609152744.3891847-10-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:56 +02:00
Alberto Faria	e97190a405	block: Add bdrv_co_pwrite_sync() Also convert bdrv_pwrite_sync() to being implemented using generated_co_wrapper. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220609152744.3891847-9-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:56 +02:00
Alberto Faria	1d39c7098b	block: Implement bdrv_{pread,pwrite,pwrite_zeroes}() using generated_co_wrapper bdrv_{pread,pwrite}() now return -EIO instead of -EINVAL when 'bytes' is negative, making them consistent with bdrv_{preadv,pwritev}() and bdrv_co_{pread,pwrite,preadv,pwritev}(). bdrv_pwrite_zeroes() now also calls trace_bdrv_co_pwrite_zeroes() and clears the BDRV_REQ_MAY_UNMAP flag when appropriate, which it didn't previously. Signed-off-by: Alberto Faria <afaria@redhat.com> Message-Id: <20220609152744.3891847-8-afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:56 +02:00
Alberto Faria	c1458c66b2	block: Make 'bytes' param of bdrv_co_{pread,pwrite,preadv,pwritev}() an int64_t For consistency with other I/O functions, and in preparation to implement bdrv_{pread,pwrite}() using generated_co_wrapper. unsigned int fits in int64_t, so all callers remain correct. bdrv_check_request32() is called further down the stack and causes -EIO to be returned if 'bytes' is negative or greater than BDRV_REQUEST_MAX_BYTES, which in turns never exceeds SIZE_MAX. Signed-off-by: Alberto Faria <afaria@redhat.com> Message-Id: <20220609152744.3891847-7-afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:56 +02:00
Alberto Faria	757dda54b4	crypto: Make block callbacks return 0 on success They currently return the value of their headerlen/buflen parameter on success. Returning 0 instead makes it clear that short reads/writes are not possible. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220609152744.3891847-5-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:55 +02:00
Alberto Faria	353a5d84b2	block: Make bdrv_{pread,pwrite}() return 0 on success They currently return the value of their 'bytes' parameter on success. Make them return 0 instead, for consistency with other I/O functions and in preparation to implement them using generated_co_wrapper. This also makes it clear that short reads/writes are not possible. The few callers that rely on the previous behavior are adjusted accordingly by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220609152744.3891847-4-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:55 +02:00
Alberto Faria	32cc71def9	block: Change bdrv_{pread,pwrite,pwrite_sync}() param order Swap 'buf' and 'bytes' around for consistency with bdrv_co_{pread,pwrite}(), and in preparation to implement these functions using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pread(child, offset, buf, bytes, flags) + bdrv_pread(child, offset, bytes, buf, flags) @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pwrite(child, offset, buf, bytes, flags) + bdrv_pwrite(child, offset, bytes, buf, flags) @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pwrite_sync(child, offset, buf, bytes, flags) + bdrv_pwrite_sync(child, offset, bytes, buf, flags) Resulting overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220609152744.3891847-3-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:55 +02:00
Alberto Faria	53fb7844f0	block: Add a 'flags' param to bdrv_{pread,pwrite,pwrite_sync}() For consistency with other I/O functions, and in preparation to implement them using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression child, offset, buf, bytes; @@ - bdrv_pread(child, offset, buf, bytes) + bdrv_pread(child, offset, buf, bytes, 0) @@ expression child, offset, buf, bytes; @@ - bdrv_pwrite(child, offset, buf, bytes) + bdrv_pwrite(child, offset, buf, bytes, 0) @@ expression child, offset, buf, bytes; @@ - bdrv_pwrite_sync(child, offset, buf, bytes) + bdrv_pwrite_sync(child, offset, buf, bytes, 0) Resulting overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220609152744.3891847-2-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:55 +02:00
Stefan Hajnoczi	be6a166fde	block/io_uring: clarify that short reads can happen Jens Axboe has confirmed that short reads are rare but can happen: https://lore.kernel.org/io-uring/YsU%2FCGkl9ZXUI+Tj@stefanha-x1.localdomain/T/#m729963dc577d709b709c191922e98ec79d7eef54 The luring_resubmit_short_read() comment claimed they were only due to a specific io_uring bug that was fixed in Linux commit 9d93a3f5a0c ("io_uring: punt short reads to async context"), which is wrong. Dominique Martinet found that a btrfs bug also causes short reads. There may be more kernel code paths that result in short reads. Let's consider short reads fair game. Cc: Dominique Martinet <dominique.martinet@atmark-techno.com> Based-on: <20220630010137.2518851-1-dominique.martinet@atmark-techno.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20220706080341.1206476-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-07-07 09:04:15 +01:00
Dominique Martinet	c06fc7ce14	io_uring: fix short read slow path sqeq.off here is the offset to read within the disk image, so obviously not 'nread' (the amount we just read), but as the author meant to write its current value incremented by the amount we just read. Normally recent versions of linux will not issue short reads, but it can happen so we should fix this. This lead to weird image corruptions when short read happened Fixes: `6663a0a337` ("block/io_uring: implements interfaces for io_uring") Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com> Message-Id: <20220630010137.2518851-1-dominique.martinet@atmark-techno.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-07-07 09:04:12 +01:00
Denis V. Lunev	1b8f777673	block: use 'unsigned' for in_flight field on driver state This patch makes in_flight field 'unsigned' for BDRVNBDState and MirrorBlockJob. This matches the definition of this field on BDS and is generically correct - we should never get negative value here. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: John Snow <jsnow@redhat.com> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> CC: Kevin Wolf <kwolf@redhat.com> CC: Hanna Reitz <hreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2022-06-29 10:57:02 +03:00
Denis V. Lunev	8bb100c9e2	nbd: trace long NBD operations At the moment there are 2 sources of lengthy operations if configured: * open connection, which could retry inside and * reconnect of already opened connection These operations could be quite lengthy and cumbersome to catch thus it would be quite natural to add trace points for them. This patch is based on the original downstream work made by Vladimir. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Eric Blake <eblake@redhat.com> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> CC: Kevin Wolf <kwolf@redhat.com> CC: Hanna Reitz <hreitz@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2022-06-29 10:57:02 +03:00
Vladimir Sementsov-Ogievskiy	6db7fd1ca9	block/copy-before-write: implement cbw-timeout option In some scenarios, when copy-before-write operations lasts too long time, it's better to cancel it. Most useful would be to use the new option together with on-cbw-error=break-snapshot: this way if cbw operation takes too long time we'll just cancel backup process but do not disturb the guest too much. Note the tricky point of realization: we keep additional point in bs->in_flight during block_copy operation even if it's timed-out. Background "cancelled" block_copy operations will finish at some point and will want to access state. We should care to not free the state in .bdrv_close() earlier. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Hanna Reitz <hreitz@redhat.com> [vsementsov: use bdrv_inc_in_flight()/bdrv_dec_in_flight() instead of direct manipulation on bs->in_flight] Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2022-06-29 10:56:12 +03:00
Vladimir Sementsov-Ogievskiy	15df6e6987	block/block-copy: block_copy(): add timeout_ns parameter Add possibility to limit block_copy() call in time. To be used in the next commit. As timed-out block_copy() call will continue in background anyway (we can't immediately cancel IO operation), it's important also give user a possibility to pass a callback, to do some additional actions on block-copy call finish. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2022-06-29 10:56:12 +03:00
Vladimir Sementsov-Ogievskiy	f1bb39a8a5	block/copy-before-write: add on-cbw-error open parameter Currently, behavior on copy-before-write operation failure is simple: report error to the guest. Let's implement alternative behavior: break the whole copy-before-write process (and corresponding backup job or NBD client) but keep guest working. It's needed if we consider guest stability as more important. The realisation is simple: on copy-before-write failure we set s->snapshot_ret and continue guest operations. s->snapshot_ret being set will lead to all further snapshot API requests. Note that all in-flight snapshot-API requests may still success: we do wait for them on BREAK_SNAPSHOT-failure path in cbw_do_copy_before_write(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2022-06-28 10:20:31 +03:00
Vladimir Sementsov-Ogievskiy	79ef0cebb5	block/copy-before-write: refactor option parsing We are going to add one more option of enum type. Let's refactor option parsing so that we can simply work with BlockdevOptionsCbw object. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2022-06-28 10:20:31 +03:00
Xie Yongji	779d82e1d3	vduse-blk: Add name option Currently we use 'id' option as the name of VDUSE device. It's a bit confusing since we use one value for two different purposes: the ID to identfy the export within QEMU (must be distinct from any other exports in the same QEMU process, but can overlap with names used by other processes), and the VDUSE name to uniquely identify it on the host (must be distinct from other VDUSE devices on the same host, but can overlap with other export types like NBD in the same process). To make it clear, this patch adds a separate 'name' option to specify the VDUSE name for the vduse-blk export instead. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220614051532.92-7-xieyongji@bytedance.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	0862a087fd	vduse-blk: Add serial option Add a 'serial' option to allow user to specify this value explicitly. And the default value is changed to an empty string as what we did in "hw/block/virtio-blk.c". Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220614051532.92-6-xieyongji@bytedance.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Eric Blake	2866ddd121	nbd: Drop dead code spotted by Coverity CID 1488362 points out that the second 'rc >= 0' check is now dead code. Reported-by: Peter Maydell <peter.maydell@linaro.org> Fixes: 172f5f1a40(nbd: remove peppering of nbd_client_connected) Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20220516210519.76135-1-eblake@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Fabian Ebner	9b38fc56c0	block/gluster: correctly set max_pdiscard On 64-bit platforms, assigning SIZE_MAX to the int64_t max_pdiscard results in a negative value, and the following assertion would trigger down the line (it's not the same max_pdiscard, but computed from the other one): qemu-system-x86_64: ../block/io.c:3166: bdrv_co_pdiscard: Assertion `max_pdiscard >= bs->bl.request_alignment' failed. On 32-bit platforms, it's fine to keep using SIZE_MAX. The assertion in qemu_gluster_co_pdiscard() is checking that the value of 'bytes' can safely be passed to glfs_discard_async(), which takes a size_t for the argument in question, so it is kept as is. And since max_pdiscard is still <= SIZE_MAX, relying on max_pdiscard is still fine. Fixes: `0c8022876f` ("block: use int64_t instead of int in driver discard handlers") Cc: qemu-stable@nongnu.org Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> Message-Id: <20220520075922.43972-1-f.ebner@proxmox.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Stefano Garzarella	66dc5f9606	block/rbd: report a better error when namespace does not exist If the namespace does not exist, rbd_create() fails with -ENOENT and QEMU reports a generic "error rbd create: No such file or directory": $ qemu-img create rbd:rbd/namespace/image 1M Formatting 'rbd:rbd/namespace/image', fmt=raw size=1048576 qemu-img: rbd:rbd/namespace/image: error rbd create: No such file or directory Unfortunately rados_ioctx_set_namespace() does not fail if the namespace does not exist, so let's use rbd_namespace_exists() in qemu_rbd_connect() to check if the namespace exists, reporting a more understandable error: $ qemu-img create rbd:rbd/namespace/image 1M Formatting 'rbd:rbd/namespace/image', fmt=raw size=1048576 qemu-img: rbd:rbd/namespace/image: namespace 'namespace' does not exist Reported-by: Tingting Mao <timao@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220517071012.6120-1-sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	d043e2db87	libvduse: Add support for reconnecting To support reconnecting after restart or crash, VDUSE backend might need to resubmit inflight I/Os. This stores the metadata such as the index of inflight I/O's descriptors to a shm file so that VDUSE backend can restore them during reconnecting. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220523084611.91-9-xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	9e4dea6727	vduse-blk: Add vduse-blk resize support To support block resize, this uses vduse_dev_update_config() to update the capacity field in configuration space and inject config interrupt on the block resize callback. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220523084611.91-8-xieyongji@bytedance.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	2a2359b844	vduse-blk: Implement vduse-blk export This implements a VDUSE block backends based on the libvduse library. We can use it to export the BDSs for both VM and container (host) usage. The new command-line syntax is: $ qemu-storage-daemon \ --blockdev file,node-name=drive0,filename=test.img \ --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on After the qemu-storage-daemon started, we need to use the "vdpa" command to attach the device to vDPA bus: $ vdpa dev add name vduse-export0 mgmtdev vduse Also the device must be removed via the "vdpa" command before we stop the qemu-storage-daemon. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220523084611.91-7-xieyongji@bytedance.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	5c36802970	block/export: Abstract out the logic of virtio-blk I/O process Abstract the common logic of virtio-blk I/O process to a function named virtio_blk_process_req(). It's needed for the following commit. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220523084611.91-4-xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	8e7fd6f623	block/export: Fix incorrect length passed to vu_queue_push() Now the req->size is set to the correct value only when handling VIRTIO_BLK_T_GET_ID request. This patch fixes it. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220523084611.91-3-xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	ac1fc3a3a9	block: Support passing NULL ops to blk_set_dev_ops() This supports passing NULL ops to blk_set_dev_ops() so that we can remove stale ops in some cases. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220523084611.91-2-xieyongji@bytedance.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Vladimir Sementsov-Ogievskiy	618af89e55	block: simplify handling of try to merge different sized bitmaps We have too much logic to simply check that bitmaps are of the same size. Let's just define that hbitmap_merge() and bdrv_dirty_bitmap_merge_internal() require their argument bitmaps be of same size, this simplifies things. Let's look through the callers: For backup_init_bcs_bitmap() we already assert that merge can't fail. In bdrv_reclaim_dirty_bitmap_locked() we gracefully handle the error that can't happen: successor always has same size as its parent, drop this logic. In bdrv_merge_dirty_bitmap() we already has assertion and separate check. Make the check explicit and improve error message. Signed-off-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220517111206.23585-4-v.sementsov-og@mail.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Vladimir Sementsov-Ogievskiy	58cbfbdf73	block: improve block_dirty_bitmap_merge(): don't allocate extra bitmap We don't need extra bitmap. All we need is to backup the original bitmap when we do first merge. So, drop extra temporary bitmap and work directly with target and backup. Still to keep old semantics, that on failure target is unchanged and user don't need to restore, we need a local_backup variable and do restore ourselves on failure path. Signed-off-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Message-Id: <20220517111206.23585-3-v.sementsov-og@mail.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Vladimir Sementsov-Ogievskiy	775b30b305	block: block_dirty_bitmap_merge(): fix error path At the end we ignore failure of bdrv_merge_dirty_bitmap() and report success. And still set errp. That's wrong. Signed-off-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220517111206.23585-2-v.sementsov-og@mail.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Stefan Hajnoczi	1ab5096b3a	block: get rid of blk->guest_block_size Commit `1b7fd72955` ("block: rename buffer_alignment to guest_block_size") noted: At this point, the field is set by the device emulation, but completely ignored by the block layer. The last time the value of buffer_alignment/guest_block_size was actually used was before commit `339064d506` ("block: Don't use guest sector size for qemu_blockalign()"). This value has not been used since 2013. Get rid of it. Cc: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220518130945.2657905-1-stefanha@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Stefan Hajnoczi	3399848b7f	block: drop unused bdrv_co_drain() API bdrv_co_drain() has not been used since commit `9a0cec664e` ("mirror: use bdrv_drained_begin/bdrv_drained_end") in 2016. Remove it so there are fewer drain scenarios to worry about. Use bdrv_drained_begin()/bdrv_drained_end() instead. They are "mixed" functions that can be called from coroutine context. Unlike bdrv_co_drain(), these functions provide control of the length of the drained section, which is usually the right thing. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220521122714.3837731-1-stefanha@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Stefan Hajnoczi	99b969fbe1	linux-aio: explain why max batch is checked in laio_io_unplug() It may not be obvious why laio_io_unplug() checks max batch. I discussed this with Stefano and have added a comment summarizing the reason. Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20220609164712.1539045-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-06-15 16:43:42 +01:00
Stefan Hajnoczi	f387cac5af	linux-aio: fix unbalanced plugged counter in laio_io_unplug() Every laio_io_plug() call has a matching laio_io_unplug() call. There is a plugged counter that tracks the number of levels of plugging and allows for nesting. The plugged counter must reflect the balance between laio_io_plug() and laio_io_unplug() calls accurately. Otherwise I/O stalls occur since io_submit(2) calls are skipped while plugged. Reported-by: Nikolay Tenev <nt@storpool.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20220609164712.1539045-2-stefanha@redhat.com Cc: Stefano Garzarella <sgarzare@redhat.com> Fixes: `68d7946648` ("linux-aio: add `dev_max_batch` parameter to laio_io_unplug()") [Stefano Garzarella suggested adding a Fixes tag. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-06-15 16:43:42 +01:00
Sam Li	e2848bc574	Use io_uring_register_ring_fd() to skip fd operations Linux recently added a new io_uring(7) optimization API that QEMU doesn't take advantage of yet. The liburing library that QEMU uses has added a corresponding new API calling io_uring_register_ring_fd(). When this API is called after creating the ring, the io_uring_submit() library function passes a flag to the io_uring_enter(2) syscall allowing it to skip the ring file descriptor fdget()/fdput() operations. This saves some CPU cycles. Signed-off-by: Sam Li <faithilikerun@gmail.com> Message-id: 20220531105011.111082-1-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-06-15 14:50:41 +01:00
Paolo Bonzini	f0d43b1ece	coroutine-lock: qemu_co_queue_restart_all is a coroutine-only qemu_co_enter_all qemu_co_queue_restart_all is basically the same as qemu_co_enter_all but without a QemuLockable argument. That's perfectly fine, but only as long as the function is marked coroutine_fn. If used outside coroutine context, qemu_co_queue_wait will attempt to take the lock and that is just broken: if you are calling qemu_co_queue_restart_all outside coroutine context, the lock is going to be a QemuMutex which cannot be taken twice by the same thread. The patch adds the marker to qemu_co_queue_restart_all and to its sole non-coroutine_fn caller; it then reimplements the function in terms of qemu_co_enter_all_impl, to remove duplicated code and to clarify that the latter also works in coroutine context. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20220427130830.150180-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 12:29:44 +02:00
Markus Armbruster	9c0928045c	Clean up ill-advised or unusual header guards Leading underscores are ill-advised because such identifiers are reserved. Trailing underscores are merely ugly. Strip both. Our header guards commonly end in _H. Normalize the exceptions. Macros should be ALL_CAPS. Normalize the exception. Done with scripts/clean-header-guards.pl. include/hw/xen/interface/ and tools/virtiofsd/ left alone, because these were imported from Xen and libfuse respectively. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-3-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2022-05-11 16:50:01 +02:00
Markus Armbruster	52581c718c	Clean up header guards that don't match their file name Header guard symbols should match their file name to make guard collisions less likely. Cleaned up with scripts/clean-header-guards.pl, followed by some renaming of new guard symbols picked by the script to better ones. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-2-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> [Change to generated file ebpf/rss.bpf.skeleton.h backed out]	2022-05-11 16:49:06 +02:00
Hanna Reitz	6d17e28798	block/vmdk: Fix reopening bs->file VMDK disk data is stored in extents, which may or may not be separate from bs->file. VmdkExtent.file points to where they are stored. Each that is stored in bs->file will simply reuse the exact pointer value of bs->file. (That is why vmdk_free_extents() will unref VmdkExtent.file (e->file) only if e->file != bs->file.) Reopen operations can change bs->file (they will replace the whole BdrvChild object, not just the BDS stored in that BdrvChild), and then we will need to change all .file pointers of all such VmdkExtents to point to the new BdrvChild. In vmdk_reopen_prepare(), we have to check which VmdkExtents are affected, and in vmdk_reopen_commit(), we can modify them. We have to split this because: - The new BdrvChild is created only after prepare, so we can change VmdkExtent.file only in commit - In commit, there no longer is any (valid) reference to the old BdrvChild object, so there would be nothing to compare VmdkExtent.file against to see whether it was equal to bs->file before reopening (There is BDRVReopenState.old_file_bs, but the old bs->file BdrvChild's .bs pointer will be NULL-ed when the new BdrvChild is created, and so we cannot compare VmdkExtent.file->bs against BDRVReopenState.old_file_bs) Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220314162719.65384-2-hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-05-04 15:55:23 +02:00
Hanna Reitz	06e9cd19a4	qcow2: Do not reopen data_file in invalidate_cache qcow2_co_invalidate_cache() closes and opens the qcow2 file, by calling qcow2_close() and qcow2_do_open(). These two functions must thus be usable from both a global-state and an I/O context. As they are, they are not safe to call in an I/O context, because they use bdrv_unref_child() and bdrv_open_child() to close/open the data_file child, respectively, both of which are global-state functions. When used from qcow2_co_invalidate_cache(), we do not need to close/open the data_file child, though (we do not do this for bs->file or bs->backing either), and so we should skip it in the qcow2_co_invalidate_cache() path. To do so, add a parameter to qcow2_do_open() and qcow2_close() to make them skip handling s->data_file, and have qcow2_co_invalidate_cache() exempt it from the memset() on the BDRVQcow2State. (Note that the QED driver similarly closes/opens the QED image by invoking bdrv_qed_close()+bdrv_qed_do_open(), but both functions seem safe to use in an I/O context.) Fixes: https://gitlab.com/qemu-project/qemu/-/issues/945 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220427114057.36651-3-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-05-04 15:55:23 +02:00
Marc-André Lureau	ad24b679d2	block: move fcntl_setfl() It is only used by block/file-posix.c, move it there. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2022-05-03 15:17:53 +04:00
Paolo Bonzini	620c5cb5da	nbd: document what is protected by the CoMutexes Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-10-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-04-26 13:16:42 -05:00
Paolo Bonzini	a80a9a1c73	nbd: take receive_mutex when reading requests[].receiving requests[].receiving is set by nbd_receive_replies() under the receive_mutex; Read it under the same mutex as well. Waking up receivers on errors happens after each reply finishes processing, in nbd_co_receive_one_chunk(). If there is no currently-active reply, there are two cases: * either there is no active request at all, in which case no element of request[] can have .receiving = true * or nbd_receive_replies() must be running and owns receive_mutex; in that case it will get back to nbd_co_receive_one_chunk() because the socket has been shutdown, and all waiting coroutines will wake up in turn. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-9-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-04-26 13:16:41 -05:00
Paolo Bonzini	dba5156c0e	nbd: move s->state under requests_lock Remove the confusing, and most likely wrong, atomics. The only function that used to be somewhat in a hot path was nbd_client_connected(), but it is not anymore after the previous patches. The same logic is used both to check if a request had to be reissued and also in nbd_reconnecting_attempt(). The former cases are outside requests_lock, while nbd_reconnecting_attempt() does have the lock, therefore the two have been separated in the previous commit. nbd_client_will_reconnect() can simply take s->requests_lock, while nbd_reconnecting_attempt() can inline the access now that no complicated atomics are involved. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-8-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-04-26 13:16:39 -05:00
Paolo Bonzini	8d45185cb7	nbd: code motion and function renaming Prepare for the next patch, so that the diff is less confusing. nbd_client_connecting is moved closer to the definition point. nbd_client_connecting_wait() is kept only for the reconnection logic; when it is used to check if a request has to be reissued, use the renamed function nbd_client_will_reconnect(). In the next patch, the two cases will have different locking requirements. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-7-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-04-26 13:16:38 -05:00
Paolo Bonzini	ee19d953ec	nbd: use a QemuMutex to synchronize yanking, reconnection and coroutines The condition for waiting on the s->free_sema queue depends on both s->in_flight and s->state. The latter is currently using atomics, but this is quite dubious and probably wrong. Because s->state is written in the main thread too, for example by the yank callback, it cannot be protected by a CoMutex. Introduce a separate lock that can be used by nbd_co_send_request(); later on this lock will also be used for s->state. There will not be any contention on the lock unless there is a yank or reconnect, so this is not performance sensitive. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-6-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-04-26 13:16:37 -05:00
Paolo Bonzini	8610b4491f	nbd: keep send_mutex/free_sema handling outside nbd_co_do_establish_connection Elevate s->in_flight early so that other incoming requests will wait on the CoQueue in nbd_co_send_request; restart them after getting back from nbd_reconnect_attempt. This could be after the reconnect timer or nbd_cancel_in_flight have cancelled the attempt, so there is no need anymore to cancel the requests there. nbd_co_send_request now handles both stopping and restarting pending requests after a successful connection, and there is no need to hold send_mutex in nbd_co_do_establish_connection. The current setup is confusing because nbd_co_do_establish_connection is called both with send_mutex taken and without it. Before the patch it uses free_sema which (at least in theory...) is protected by send_mutex, after the patch it does not anymore. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-5-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: wrap long line] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-04-26 13:16:36 -05:00
Paolo Bonzini	172f5f1a40	nbd: remove peppering of nbd_client_connected It is unnecessary to check nbd_client_connected() because every time s->state is moved out of NBD_CLIENT_CONNECTED the socket is shut down and all coroutines are resumed. The only case where it was actually needed is when the NBD server disconnects and there is no reconnect-delay. In that case, nbd_receive_replies() does not set s->reply.handle and nbd_co_do_receive_one_chunk() cannot continue. For that one case, check the return value of nbd_receive_replies(). As to the others: * nbd_receive_replies() can put the current coroutine to sleep if another reply is ongoing; then it will be woken by nbd_channel_error(), called by the ongoing reply. Or it can try itself to read a reply header and fail, thus calling nbd_channel_error() itself. * nbd_co_send_request() will write the body of the request and fail * nbd_reply_chunk_iter_receive() will call nbd_co_receive_one_chunk() and then nbd_co_do_receive_one_chunk(), which will handle the failure as above; or it will just detect a previous call to nbd_iter_channel_error() via iter->ret < 0. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-4-pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-04-26 13:16:36 -05:00
Paolo Bonzini	0c43c6fc89	nbd: mark more coroutine_fns Several coroutine functions in block/nbd.c are not marked as such. This patch adds a few more markers; it is not exhaustive, but it focuses especially on: - places that wake other coroutines, because aio_co_wake() has very different semantics inside a coroutine (queuing after yield vs. entering immediately); - functions with _co_ in their names, to avoid confusion Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-3-pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-04-26 13:16:35 -05:00
Paolo Bonzini	8846b7d1c1	nbd: safeguard against waking up invalid coroutine The .reply_possible field of s->requests is never set to false. This is not a problem as it is only a safeguard to detect protocol errors, but it's sloppy. In fact, the field is actually not necessary at all, because .coroutine is set to NULL in NBD_FOREACH_REPLY_CHUNK after receiving the last chunk. Thus, replace .reply_possible with .coroutine and move the check before deciding the fate of this request. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-2-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-04-26 13:16:24 -05:00

1 2 3 4 5 ...

5582 Commits