Commit Graph

6108 Commits

Author SHA1 Message Date
Denis V. Lunev c54fb43507 parallels: naive implementation of parallels_co_pdiscard
* Discarding with backing stores is not supported by the format.
* There is no buffering/queueing of the discard operation.
* Only operations aligned to the cluster are supported.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:48:52 +02:00
Denis V. Lunev 66d201dd5a parallels: improve readability of allocate_clusters
Replace 'space' representing the amount of data to preallocate with
'bytes'.

Rationale:
* 'space' at each place is converted to bytes
* the unit is more close to the variable name

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:48:41 +02:00
Denis V. Lunev 3ac10d1909 parallels: naive implementation of allocate_clusters with used bitmap
The access to the bitmap is not optimized completely.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:48:32 +02:00
Denis V. Lunev eeb1e6dc49 parallels: update used bitmap in allocate_cluster
We should extend the bitmap if the file is extended and set the bit in
the image used bitmap once the cluster is allocated. Sanity check at
that moment also looks like a good idea.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:48:21 +02:00
Denis V. Lunev 73194d3f54 parallels: accept multiple clusters in mark_used()
This would be useful in the next patch in allocate_clusters(). This
change would not imply serious performance drawbacks as usually image
is full of data or are at the end of the bitmap.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:48:09 +02:00
Denis V. Lunev e185100fec parallels: collect bitmap of used clusters at open
If the operation is failed, we need to check image consistency if the
problem is not about memory allocation.

Bitmap adjustments in allocate_cluster are not performed yet.
They worth to be separate. This was proven useful during debug of this
series. Kept as is for future bissecting.

It should be specifically noted that used bitmap must be recalculated
if data_off has been fixed during image consistency check.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:47:34 +02:00
Denis V. Lunev 6f2206b0cb parallels: fix broken parallels_check_data_off()
Once we have repaired data_off field in the header we should update
s->data_start which is calculated on the base of it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:47:08 +02:00
Denis V. Lunev a398275e88 parallels: create mark_used() helper which sets bit in used bitmap
This functionality is used twice already and next patch will add more
code with it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:46:36 +02:00
Denis V. Lunev 8f5f532625 parallels: refactor path when we need to re-check image in parallels_open
More conditions follows thus the check should be more scalable.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:46:22 +02:00
Denis V. Lunev e17b9d08d9 parallels: return earlier from parallels_open() function on error
At the beginning of the function we can return immediately until we
really allocate s->header.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:46:11 +02:00
Denis V. Lunev 9c39878136 parallels: return earler in fail_format branch in parallels_open()
We do not need to perform any deallocation/cleanup if wrong format is
detected.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:45:55 +02:00
Denis V. Lunev 93361b7e33 parallels: invent parallels_opts_prealloc() helper to parse prealloc opts
This patch creates above mentioned helper and moves its usage to the
beginning of parallels_open(). This simplifies parallels_open() a bit.

The patch also ensures that we store prealloc_size on block driver state
always in sectors. This makes code cleaner and avoids wrong opinion at
the assignment that the value is in bytes.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:45:43 +02:00
Denis V. Lunev c74cd7bd32 parallels: fix memory leak in parallels_open()
We should free opts allocated through qemu_opts_create() at the end.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:45:32 +02:00
Denis V. Lunev 73f3e13692 parallels: mark driver as supporting CBT
Parallels driver indeed support Parallels Dirty Bitmap Feature in
read-only mode. The patch adds bdrv_supports_persistent_dirty_bitmap()
callback which always return 1 to indicate that.

This will allow to copy CBT from Parallels image with qemu-img.

Note: read-write support is signalled through
bdrv_co_can_store_new_dirty_bitmap() and is different.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:45:14 +02:00
Denis V. Lunev bb16991fe0 parallels: fix formatting in bdrv_parallels initialization
Old code is ugly and contains tabulations. There are no functional
changes in this patch.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2023-09-21 08:45:00 +02:00
Stefan Hajnoczi 0f86afde66 block-backend: process zoned requests in the current AioContext
Process zoned requests in the current thread's AioContext instead of in
the BlockBackend's AioContext.

There is no need to use the BlockBackend's AioContext thanks to CoMutex
bs->wps->colock, which protects zone metadata.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230912231037.826804-5-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Stefan Hajnoczi 46eb6e861b block-backend: process I/O in the current AioContext
Switch blk_aio_*() APIs over to multi-queue by using
qemu_get_current_aio_context() instead of blk_get_aio_context(). This
change will allow devices to process I/O in multiple IOThreads in the
future.

I audited existing blk_aio_*() callers:
- migration/block.c: blk_mig_lock() protects the data accessed by the
  completion callback.
- The remaining emulated devices and exports run with
  qemu_get_aio_context() == blk_get_aio_context().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230912231037.826804-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Stefan Hajnoczi 652b0dd808 block: remove AIOCBInfo->get_aio_context()
The synchronous bdrv_aio_cancel() function needs the acb's AioContext so
it can call aio_poll() to wait for cancellation.

It turns out that all users run under the BQL in the main AioContext, so
this callback is not needed.

Remove the callback, mark bdrv_aio_cancel() GLOBAL_STATE_CODE just like
its blk_aio_cancel() caller, and poll the main loop AioContext.

The purpose of this cleanup is to identify bdrv_aio_cancel() as an API
that does not work with the multi-queue block layer.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230912231037.826804-2-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Andrey Drobyshev via 2848289168 block: add BDRV_BLOCK_COMPRESSED flag for bdrv_block_status()
Functions qcow2_get_host_offset(), get_cluster_offset(),
vmdk_co_block_status() explicitly report compressed cluster types when data
is compressed.  However, this information is never passed further.  Let's
make use of it by adding new BDRV_BLOCK_COMPRESSED flag for
bdrv_block_status(), so that caller may know that the data range is
compressed.  In particular, we're going to use this flag to tweak
"qemu-img map" output.

This new flag is only being utilized by qcow, qcow2 and vmdk formats, as only
those support compression.

Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Message-ID: <20230907210226.953821-2-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Kevin Wolf 9def6082cf block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK
The functions read the parents list in the generic block layer, so we
need to hold the graph lock already there. The BlockDriver
implementations actually modify the graph, so it has to be a writer
lock.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-22-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Kevin Wolf 32a8aba37e block: Mark bdrv_unref_child() GRAPH_WRLOCK
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_unref_child(). These callers will typically
already hold the graph lock once the locking work is completed, which
means that they can't call functions that take it internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-21-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Kevin Wolf ede01e4635 block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_root_unref_child(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-20-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Kevin Wolf c629b6d223 block: Mark bdrv_child_perm() GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_child_perm() need to hold a reader lock for the graph because
some implementations access the children list of a node.

The callers of bdrv_child_perm() conveniently already hold the lock.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-16-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Kevin Wolf 3804e3cf54 block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
The function reads the parents list, so it needs to hold the graph lock.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-14-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Kevin Wolf afdaeb9ea0 block: Mark bdrv_attach_child() GRAPH_WRLOCK
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_attach_child_common(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-13-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Kevin Wolf 7d4ca9d25b block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_attach_child_common(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.

Note that the transaction callbacks still take the lock internally, so
tran_finalize() must be called without the lock held. This is because
bdrv_append() also calls bdrv_replace_node_noperm(), which currently
requires the transaction callbacks to be called unlocked. In the next
step, both of them can be switched to locked tran_finalize() calls
together.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-11-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Kevin Wolf ac2ae233a0 block: Introduce bdrv_schedule_unref()
bdrv_unref() is called by a lot of places that need to hold the graph
lock (it naturally happens in the context of operations that change the
graph). However, bdrv_unref() takes the graph writer lock internally, so
it can't actually be called while already holding a graph lock without
causing a deadlock.

bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
node before closing it, and draining requires that the graph is
unlocked.

The solution is to defer deleting the node until we don't hold the lock
any more and draining is possible again.

Note that keeping images open for longer than necessary can create
problems, too: You can't open an image again before it is really closed
(if image locking didn't prevent it, it would cause corruption).
Reopening an image immediately happens at least during bdrv_open() and
bdrv_co_create().

In order to solve this problem, make sure to run the deferred unref in
bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.

The output of iotest 051 is updated because the additional polling
changes the order of HMP output, resulting in a new "(qemu)" prompt in
the test output that was previously on a separate line and filtered out.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230911094620.45040-6-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Kevin Wolf edcce17b15 preallocate: Don't poll during permission updates
When the permission related BlockDriver callbacks are called, we are in
the middle of an operation traversing the block graph. Polling in such a
place is a very bad idea because the graph could change in unexpected
ways. In the future, callers will also hold the graph lock, which is
likely to turn polling into a deadlock.

So we need to get rid of calls to functions like bdrv_getlength() or
bdrv_truncate() there as these functions poll internally. They are
currently used so that when no parent has write/resize permissions on
the image any more, the preallocate filter drops the extra preallocated
area in the image file and gives up write/resize permissions itself.

In order to achieve this without polling in .bdrv_check_perm, don't
immediately truncate the image, but only schedule a BH to do so. The
filter keeps the write/resize permissions a bit longer now until the BH
has executed.

There is one case in which delaying doesn't work: Reopening the image
read-only. In this case, bs->file will likely be reopened read-only,
too, so keeping write permissions a bit longer on it doesn't work. But
we can already cover this case in preallocate_reopen_prepare() and not
rely on the permission updates for it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:00 +02:00
Kevin Wolf 01e28f6006 preallocate: Factor out preallocate_truncate_to_real_size()
It's essentially the same code in preallocate_check_perm() and
preallocate_close(), except that the latter ignores errors.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:00 +02:00
Stefan Hajnoczi 78f8b6d9c8 Block layer patches
- Optimise reqs_lock to make multiqueue actually scale
 - virtio: Drop out of coroutine context in virtio_load()
 - iotests: Fix reference output for some tests after recent changes
 - vpc: Avoid dynamic stack allocation
 - Code cleanup, improved documentation
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmT7VYgRHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9YfOg/7BoYF6lkB7DF/jH3XLY6f8zoI+OVM7dg1
 QFEjyVO+uZiJVh0CeBNI9WgnBe7f5vXMbiStyGbWKo3BLUsjnwoQcW/Sxpw61bR2
 jZYK6UHe0RhFqTQpbt8G1iCmlpRS+sX+Cy+lxcVcbqxcnLRXCOjT6ivyA4bGbYIC
 q9BHg/9hBmjuM05NTV6Axy8qjqBGVaIWE9ALTnw8H//waBr4/ydJPTl7EWHe3+tO
 Stm73evgPG7aLHM6W4qdFW4gwAQ8f+f42Q+0NH1YavB/pN3LTN1B6sLQY/51du+0
 d/JCsXex0IZQXmNPhqv1h01vhOyU9WBmlwpPG2iZv3a06SXk1ys3rQt/L7uIcsZg
 Z58CpcUJ517FERnkl0BWXzYhsdcW2K+RdlaiL5PX6H1A2B9LT05ouZfD47hh7kKv
 oX+Ulk05PFr3JRCKQF6QDEejRKXt169bGzInTlns/wXinD/V4sCkUnr9aWQuhoWk
 KhQm7WMscTTIyHP2FznO4x9kq0ALsoX/NKqBW2wgJUtqRzsd4XxPp5CXEsAir8Vt
 dpne/DaV5iDI1mGFJrvkctJN545tEoezBtUzC8/9rZGE0cxHAkhvQVZUDo7xVmrq
 PlGQ1ko9cNui/Gf9B6qDqaJJwSyw0S6vHurGVQJRwbyly57Fi5aisWkr4w7Rc4eA
 7u9B1RvwF/Q=
 =2wGD
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

Block layer patches

- Optimise reqs_lock to make multiqueue actually scale
- virtio: Drop out of coroutine context in virtio_load()
- iotests: Fix reference output for some tests after recent changes
- vpc: Avoid dynamic stack allocation
- Code cleanup, improved documentation

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmT7VYgRHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9YfOg/7BoYF6lkB7DF/jH3XLY6f8zoI+OVM7dg1
# QFEjyVO+uZiJVh0CeBNI9WgnBe7f5vXMbiStyGbWKo3BLUsjnwoQcW/Sxpw61bR2
# jZYK6UHe0RhFqTQpbt8G1iCmlpRS+sX+Cy+lxcVcbqxcnLRXCOjT6ivyA4bGbYIC
# q9BHg/9hBmjuM05NTV6Axy8qjqBGVaIWE9ALTnw8H//waBr4/ydJPTl7EWHe3+tO
# Stm73evgPG7aLHM6W4qdFW4gwAQ8f+f42Q+0NH1YavB/pN3LTN1B6sLQY/51du+0
# d/JCsXex0IZQXmNPhqv1h01vhOyU9WBmlwpPG2iZv3a06SXk1ys3rQt/L7uIcsZg
# Z58CpcUJ517FERnkl0BWXzYhsdcW2K+RdlaiL5PX6H1A2B9LT05ouZfD47hh7kKv
# oX+Ulk05PFr3JRCKQF6QDEejRKXt169bGzInTlns/wXinD/V4sCkUnr9aWQuhoWk
# KhQm7WMscTTIyHP2FznO4x9kq0ALsoX/NKqBW2wgJUtqRzsd4XxPp5CXEsAir8Vt
# dpne/DaV5iDI1mGFJrvkctJN545tEoezBtUzC8/9rZGE0cxHAkhvQVZUDo7xVmrq
# PlGQ1ko9cNui/Gf9B6qDqaJJwSyw0S6vHurGVQJRwbyly57Fi5aisWkr4w7Rc4eA
# 7u9B1RvwF/Q=
# =2wGD
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 08 Sep 2023 13:10:32 EDT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin:
  virtio: Drop out of coroutine context in virtio_load()
  vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
  block: Make more BlockDriver definitions static
  block/meson.build: Restore alphabetical order of files
  block: Remove unnecessary variable in bdrv_block_device_info
  block: Remove bdrv_query_block_node_info
  vmdk: Clean up bdrv_open_child() return value check
  qemu-img: Update documentation for compressed images
  block: Be more verbose in create fallback
  block/iscsi: Document why we use raw malloc()
  qemu-img: omit errno value in error message
  block: change reqs_lock to QemuMutex
  block: minimize bs->reqs_lock section in tracked_request_end()
  iotests: adapt test output for new qemu_cleanup() behavior
  block/vpc: Avoid dynamic stack allocation

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-11 09:11:22 -04:00
Kevin Wolf 9ea473fb7b block: Make more BlockDriver definitions static
Most block driver implementations don't have any reason for their
BlockDriver to be public. The only exceptions are bdrv_file, bdrv_raw
and bdrv_qcow2, which are actually used in other source files.

Make all other BlockDriver definitions static if they aren't yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230905130607.35134-3-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-08 17:03:09 +02:00
Kevin Wolf 9e03a5e195 block/meson.build: Restore alphabetical order of files
When commit 5e5733e599 created block/meson.build, the list of
unconditionally added files was in alphabetical order. Later commits
added new files in random places. Reorder the list to be alphabetical
again. (As for ordering foo.c against foo-*.c, there are both ways used
currently; standardise on having foo.c first, even though this is
different from the original commit 5e5733e5999.)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230905130607.35134-2-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-08 17:03:09 +02:00
Fabiano Rosas c3b29ae6b4 block: Remove unnecessary variable in bdrv_block_device_info
The commit 5d8813593f ("block/qapi: Let bdrv_query_image_info()
recurse") removed the loop where we set the 'bs0' variable, so now it
is just the same as 'bs'.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230901184605.32260-3-farosas@suse.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-08 17:03:09 +02:00
Fabiano Rosas be2e51c503 block: Remove bdrv_query_block_node_info
The last call site of this function has been removed by commit
c04d0ab026 ("qemu-img: Let info print block graph").

Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20230901184605.32260-2-farosas@suse.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-08 17:03:09 +02:00
Dmitry Frolov a8d99c0e6c vmdk: Clean up bdrv_open_child() return value check
bdrv_open_child() may return NULL.
Usually return value is checked for this function.
Check for return value is more reliable.

Fixes: 24bc15d1f6 ("vmdk: Use BdrvChild instead of BDS for references to extents")

Signed-off-by: Dmitry Frolov <frolov@swemel.ru>
Message-ID: <20230831125926.796205-1-frolov@swemel.ru>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-08 17:03:09 +02:00
Peter Maydell 7966a36c83 block/iscsi: Document why we use raw malloc()
In block/iscsi.c we use a raw malloc() call, which is unusual
given the project standard is to use the glib memory allocation
functions. Document why we do so, to avoid it being converted
to g_malloc() by mistake.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-ID: <20230727150705.2664464-1-peter.maydell@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-08 17:03:09 +02:00
Stefan Hajnoczi fa9185fcdf block: change reqs_lock to QemuMutex
CoMutex has poor performance when lock contention is high. The tracked
requests list is accessed frequently and performance suffers in QEMU
multi-queue block layer scenarios.

It is not necessary to use CoMutex for the requests lock. The lock is
always released across coroutine yield operations. It is held for
relatively short periods of time and it is not beneficial to yield when
the lock is held by another coroutine.

Change the lock type from CoMutex to QemuMutex to improve multi-queue
block layer performance. fio randread bs=4k iodepth=64 with 4 IOThreads
handling a virtio-blk device with 8 virtqueues improves from 254k to
517k IOPS (+203%). Full benchmark results and configuration details are
available here:
980c40845d

In the future we may wish to introduce thread-local tracked requests
lists to avoid lock contention completely. That would be much more
involved though.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230808155852.2745350-3-stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-08 17:03:09 +02:00
Stefan Hajnoczi 3480ce69a9 block: minimize bs->reqs_lock section in tracked_request_end()
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230808155852.2745350-2-stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-08 17:03:09 +02:00
Philippe Mathieu-Daudé 3c2c599c79 block/vpc: Avoid dynamic stack allocation
Use autofree heap allocation instead of variable-length array on the
stack. Here we don't expect the bitmap size to be enormous, and
since we're about to read/write it to disk the overhead of the
allocation should be fine.

The codebase has very few VLAs, and if we can get rid of them all we
can make the compiler error on new additions.  This is a defensive
measure against security bugs where an on-stack dynamic allocation
isn't correctly size-checked (e.g.  CVE-2021-3527).

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[PMM: expanded commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-ID: <20230811175229.808139-1-peter.maydell@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-08 17:03:09 +02:00
Stefan Hajnoczi c5ea91da44 trivial patches for 2023-09-08
-----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEe3O61ovnosKJMUsicBtPaxppPlkFAmT68tMPHG1qdEB0bHMu
 bXNrLnJ1AAoJEHAbT2saaT5ZbEwH/2XcX1f4KcEJbgUn0JVhGQ5GH2c2jepZlkTZ
 2dhvdEECbOPMg73hty0fyyWlyuLWdJ9cMpONfMtzmHTH8RKEOAbpn/zusyo3H+48
 6cunyUpBqbmb7MHPchrN+JmvtvaSPSazsj2Zdkh+Y4WlfEYj+yVysQ4zQlBlRyHv
 iOTi6OdjxXg1QcbtJxAUhp+tKaRJzagiCpLkoyW2m8DIuV9cLVHMJsE3OMgfKNgK
 /S+O1fLcaDhuSCrHAbZzArF3Tr4bfLqSwDtGCJfQpqKeIQDJuI+41GLIlm1nYY70
 IFJzEWMOrX/rcMG1CQnUFZOOyDSO+NfILwNnU+eyM49MUekmY54=
 =mmPS
 -----END PGP SIGNATURE-----

Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into staging

trivial patches for 2023-09-08

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEe3O61ovnosKJMUsicBtPaxppPlkFAmT68tMPHG1qdEB0bHMu
# bXNrLnJ1AAoJEHAbT2saaT5ZbEwH/2XcX1f4KcEJbgUn0JVhGQ5GH2c2jepZlkTZ
# 2dhvdEECbOPMg73hty0fyyWlyuLWdJ9cMpONfMtzmHTH8RKEOAbpn/zusyo3H+48
# 6cunyUpBqbmb7MHPchrN+JmvtvaSPSazsj2Zdkh+Y4WlfEYj+yVysQ4zQlBlRyHv
# iOTi6OdjxXg1QcbtJxAUhp+tKaRJzagiCpLkoyW2m8DIuV9cLVHMJsE3OMgfKNgK
# /S+O1fLcaDhuSCrHAbZzArF3Tr4bfLqSwDtGCJfQpqKeIQDJuI+41GLIlm1nYY70
# IFJzEWMOrX/rcMG1CQnUFZOOyDSO+NfILwNnU+eyM49MUekmY54=
# =mmPS
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 08 Sep 2023 06:09:23 EDT
# gpg:                using RSA key 7B73BAD68BE7A2C289314B22701B4F6B1A693E59
# gpg:                issuer "mjt@tls.msk.ru"
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" [full]
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>" [full]
# gpg:                 aka "Michael Tokarev <mjt@debian.org>" [full]
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
#      Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931  4B22 701B 4F6B 1A69 3E59

* tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu: (22 commits)
  qxl: don't assert() if device isn't yet initialized
  hw/net/vmxnet3: Fix guest-triggerable assert()
  tests/qtest/usb-hcd: Remove the empty "init" tests
  target/ppc: use g_free() in test_opcode_table()
  hw/ppc: use g_free() in spapr_tce_table_post_load()
  trivial: Simplify the spots that use TARGET_BIG_ENDIAN as a numeric value
  accel/tcg: Fix typo in translator_io_start() description
  tests/qtest/test-hmp: Fix migrate_set_parameter xbzrle-cache-size test
  docs tests: Fix use of migrate_set_parameter
  qemu-options.hx: Rephrase the descriptions of the -hd* and -cdrom options
  hw/display/xlnx_dp: update comments
  block: spelling fixes
  misc/other: spelling fixes
  qga/: spelling fixes
  tests/: spelling fixes
  scripts/: spelling fixes
  include/: spelling fixes
  audio: spelling fixes
  xen: spelling fix
  riscv: spelling fixes
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-08 10:06:25 -04:00
Michael Tokarev 3202d8e404 block: spelling fixes
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
2023-09-08 13:08:52 +03:00
Stefan Hajnoczi 06e0f098d6 io: follow coroutine AioContext in qio_channel_yield()
The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

In the current API, calling qio_channel_attach_aio_context() sets the
AioContext where qio_channel_yield() installs an fd handler prior to yielding:

  qio_channel_attach_aio_context(ioc, my_ctx);
  ...
  qio_channel_yield(ioc); // my_ctx is used here
  ...
  qio_channel_detach_aio_context(ioc);

This API design has limitations: reading and writing must be done in the same
AioContext and moving between AioContexts involves a cumbersome sequence of API
calls that is not suitable for doing on a per-request basis.

There is no fundamental reason why a QIOChannel needs to run within the
same AioContext every time qio_channel_yield() is called. QIOChannel
only uses the AioContext while inside qio_channel_yield(). The rest of
the time, QIOChannel is independent of any AioContext.

In the new API, qio_channel_yield() queries the AioContext from the current
coroutine using qemu_coroutine_get_aio_context(). There is no need to
explicitly attach/detach AioContexts anymore and
qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone.
One coroutine can read from the QIOChannel while another coroutine writes from
a different AioContext.

This API change allows the nbd block driver to use QIOChannel from any thread.
It's important to keep in mind that the block driver already synchronizes
QIOChannel access and ensures that two coroutines never read simultaneously or
write simultaneously.

This patch updates all users of qio_channel_attach_aio_context() to the
new API. Most conversions are simple, but vhost-user-server requires a
new qemu_coroutine_yield() call to quiesce the vu_client_trip()
coroutine when not attached to any AioContext.

While the API is has become simpler, there is one wart: QIOChannel has a
special case for the iohandler AioContext (used for handlers that must not run
in nested event loops). I didn't find an elegant way preserve that behavior, so
I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false)
for opting in to the new AioContext model. By default QIOChannel uses the
iohandler AioHandler. Code that formerly called
qio_channel_attach_aio_context() now calls
qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
created.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20230830224802.493686-5-stefanha@redhat.com>
[eblake: also fix migration/rdma.c]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-07 20:32:11 -05:00
Alexander Ivanov c89d4362dc parallels: Add data_off repairing to parallels_open()
Place data_start/data_end calculation after reading the image header
to s->header. Set s->data_start to the offset calculated in
parallels_test_data_off(). Call bdrv_check() if data_off is incorrect.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
2023-09-06 17:36:49 +02:00
Alexander Ivanov 8cd19203f4 parallels: Add data_off check
data_off field of the parallels image header can be corrupted. Check if
this field greater than the header + BAT size and less than file size.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
2023-09-06 17:36:49 +02:00
Alexander Ivanov 7c5f86d5ff parallels: Use bdrv_co_getlength() in parallels_check_outside_image()
bdrv_co_getlength() should be used in coroutine context. Replace
bdrv_getlength() by bdrv_co_getlength() in parallels_check_outside_image().

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
2023-09-06 17:36:49 +02:00
Alexander Ivanov cfce1091d5 parallels: Image repairing in parallels_open()
Repair an image at opening if the image is unclean or out-of-image
corruption was detected.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
2023-09-06 17:36:49 +02:00
Alexander Ivanov 6bb8bc6367 parallels: Add checking and repairing duplicate offsets in BAT
Cluster offsets must be unique among all the BAT entries. Find duplicate
offsets in the BAT and fix it by copying the content of the relevant
cluster to a newly allocated cluster and set the new cluster offset to the
duplicated entry.

Add host_cluster_index() helper to deduplicate the code.

When new clusters are allocated, the file size increases by 128 Mb. Call
parallels_check_leak() to fix this leak.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
2023-09-06 17:36:49 +02:00
Alexander Ivanov c0b154533e parallels: Add data_start field to BDRVParallelsState
In the next patch we will need the offset of the data area for host cluster
index calculation. Add this field and setting up code.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
2023-09-06 17:36:49 +02:00
Alexander Ivanov 728e10173b parallels: Add "explicit" argument to parallels_check_leak()
In the on of the next patches we need to repair leaks without changing
leaks and leaks_fixed info in res. Also we don't want to print any warning
about leaks. Add "explicit" argument to skip info changing if the argument
is false.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
2023-09-06 17:36:49 +02:00
Alexander Ivanov 09eb64f9e3 parallels: Check if data_end greater than the file size
Initially data_end is set to the data_off image header field and must not
be greater than the file size.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
2023-09-06 17:36:49 +02:00
Alexander Ivanov fcadb48662 parallels: Incorrect data end calculation in parallels_open()
The BDRVParallelsState structure contains data_end field that is measured
in sectors. In parallels_open() initially this field is set by data_off
field from parallels image header.

According to the parallels format documentation, data_off field contains
an offset, in sectors, from the start of the file to the start of the
data area. For "WithoutFreeSpace" images: if data_off is zero, the offset
is calculated as the end of the BAT table plus some padding to ensure
sector size alignment.

The parallels_open() function has code for handling zero value in
data_off, but in the result data_end contains the offset in bytes.

Replace the alignment to sector size by division by sector size and fix
the comparision with s->header_size.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
2023-09-06 17:36:49 +02:00
Alexander Ivanov a338dcbbab parallels: Fix comments formatting inside parallels driver
This patch is technically necessary as git patch rendering could result
in moving some code from one place to the another and that hits
checkpatch.pl warning. This problem specifically happens within next
series.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
2023-09-06 17:36:49 +02:00
Andrey Drobyshev fc6b211f92 block/io: align requests to subcluster_size
When target image is using subclusters, and we align the request during
copy-on-read, it makes sense to align to subcluster_size rather than
cluster_size.  Otherwise we end up with unnecessary allocations.

This commit renames bdrv_round_to_clusters() to bdrv_round_to_subclusters()
and utilizes subcluster_size field of BlockDriverInfo to make necessary
alignments.  It affects copy-on-read as well as mirror job (which is
using bdrv_round_to_clusters()).

This change also fixes the following bug with failing assert (covered by
the test in the subsequent commit):

qemu-img create -f qcow2 base.qcow2 64K
qemu-img create -f qcow2 -o extended_l2=on,backing_file=base.qcow2,backing_fmt=qcow2 img.qcow2 64K
qemu-io -c "write -P 0xaa 0 2K" img.qcow2
qemu-io -C -c "read -P 0x00 2K 62K" img.qcow2

qemu-io: ../block/io.c:1236: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230711172553.234055-3-andrey.drobyshev@virtuozzo.com>
2023-08-30 07:39:10 -04:00
Andrey Drobyshev c54483b6f4 block: add subcluster_size field to BlockDriverInfo
This is going to be used in the subsequent commit as requests alignment
(in particular, during copy-on-read).  This value only makes sense for
the formats which support subclusters (currently QCOW2 only).  If this
field isn't set by driver's own bdrv_get_info() implementation, we
simply set it equal to the cluster size thus treating each cluster as
having a single subcluster.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230711172553.234055-2-andrey.drobyshev@virtuozzo.com>
2023-08-30 07:39:10 -04:00
Hanna Czenczek d31b50a15d file-posix: Simplify raw_co_prw's 'out' zone code
We duplicate the same condition three times here, pull it out to the top
level.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230824155345.109765-5-hreitz@redhat.com>
Reviewed-by: Sam Li <faithilikerun@gmail.com>
2023-08-29 10:51:04 +02:00
Hanna Czenczek deab5c9a4e file-posix: Fix zone update in I/O error path
We must check that zone information is present before running
update_zones_wp().

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2234374
Fixes: Coverity CID 1512459
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230824155345.109765-4-hreitz@redhat.com>
Reviewed-by: Sam Li <faithilikerun@gmail.com>
2023-08-29 10:50:49 +02:00
Hanna Czenczek 4b5d80f3d0 file-posix: Check bs->bl.zoned for zone info
Instead of checking bs->wps or bs->bl.zone_size for whether zone
information is present, check bs->bl.zoned.  That is the flag that
raw_refresh_zoned_limits() reliably sets to indicate zone support.  If
it is set to something other than BLK_Z_NONE, other values and objects
like bs->wps and bs->bl.zone_size must be non-null/zero and valid; if it
is not, we cannot rely on their validity.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230824155345.109765-3-hreitz@redhat.com>
Reviewed-by: Sam Li <faithilikerun@gmail.com>
2023-08-29 10:50:38 +02:00
Hanna Czenczek 56d1a022a7 file-posix: Clear bs->bl.zoned on error
bs->bl.zoned is what indicates whether the zone information is present
and valid; it is the only thing that raw_refresh_zoned_limits() sets if
CONFIG_BLKZONED is not defined, and it is also the only thing that it
sets if CONFIG_BLKZONED is defined, but there are no zones.

Make sure that it is always set to BLK_Z_NONE if there is an error
anywhere in raw_refresh_zoned_limits() so that we do not accidentally
announce zones while our information is incomplete or invalid.

This also fixes a memory leak in the last error path in
raw_refresh_zoned_limits().

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230824155345.109765-2-hreitz@redhat.com>
Reviewed-by: Sam Li <faithilikerun@gmail.com>
2023-08-29 10:49:58 +02:00
zhenwei pi 3b2337eff0 block/throttle-groups: Use ThrottleDirection instread of bool is_write
'bool is_write' style is obsolete from throttle framework, adapt
block throttle groups to the new style:
- use ThrottleDirection instead of 'bool is_write'. Ex,
  schedule_next_request(ThrottleGroupMember *tgm, bool is_write)
  -> schedule_next_request(ThrottleGroupMember *tgm, ThrottleDirection direction)

- use THROTTLE_MAX instead of hard code. Ex, ThrottleGroupMember *tokens[2]
  -> ThrottleGroupMember *tokens[THROTTLE_MAX]

- use ThrottleDirection instead of hard code on iteration. Ex, (i = 0; i < 2; i++)
  -> for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++)

Use a simple python script to test the new style:
 #!/usr/bin/python3
import subprocess
import random
import time

commands = ['virsh blkdeviotune jammy vda --write-bytes-sec ', \
            'virsh blkdeviotune jammy vda --write-iops-sec ', \
            'virsh blkdeviotune jammy vda --read-bytes-sec ', \
            'virsh blkdeviotune jammy vda --read-iops-sec ']

for loop in range(1, 1000):
    time.sleep(random.randrange(3, 5))
    command = commands[random.randrange(0, 3)] + str(random.randrange(0, 1000000))
    subprocess.run(command, shell=True, check=True)

This works fine.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230728022006.1098509-10-pizhenwei@bytedance.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-08-29 10:49:24 +02:00
zhenwei pi e76f201f69 throttle: use enum ThrottleDirection instead of bool is_write
enum ThrottleDirection is already there, use ThrottleDirection instead
of 'bool is_write' for throttle API, also modify related codes from
block, fsdev, cryptodev and tests.

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230728022006.1098509-7-pizhenwei@bytedance.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-08-29 10:49:24 +02:00
Stefano Garzarella 9b06d0d076 block/blkio: add more comments on the fd passing handling
As Hanna pointed out, it is not clear in the code why qemu_open()
can fail, and why blkio_set_int("fd") is not enough to discover
the `fd` property support.

Let's fix them by adding more details in the code comments.

Suggested-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230803082825.25293-3-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-03 11:28:43 -04:00
Stefano Garzarella 0b054b4c82 block/blkio: close the fd when blkio_connect() fails
libblkio drivers take ownership of `fd` only after a successful
blkio_connect(), so if it fails, we are still the owners.

Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk")
Suggested-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-id: 20230803082825.25293-2-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-03 11:28:43 -04:00
Richard Henderson f33c745764 Pull request
Please include these bug fixes in QEMU 8.1. Thanks!
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmTCzPUACgkQnKSrs4Gr
 c8g1DAf/fPUQ4zRsCn079pHIyK9TFo4COm23p4kiusxj8otfjt8LH1Zsc9pGWC2+
 bl2RlnPID8JlyJFDRN7b/RCEhj45a83GtCmhDDmqVgy1eO5vwOKm2XyyWeD+pq/U
 Hf2QLPLZZ7tCD8Njpty+gB3Ux4zqthKGXSg8FpJ3w0tl4me2efLvjMa6jHMwtnHT
 aAbyQ3WMpT9w4XHLqRQDHzBqrTSY4od3nl9SrM/DQ2klLIcz8ECTEZVBY9B3pq6m
 QvAg24tfb0QvS14YnZv/PMCfOaVuE87M9G4f93pCynnMxMYze+XczL0sGhIAS9wp
 03NgGlhGumOix6r2kHjlG6p3xywV8A==
 =jMf8
 -----END PGP SIGNATURE-----

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging

Pull request

Please include these bug fixes in QEMU 8.1. Thanks!

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmTCzPUACgkQnKSrs4Gr
# c8g1DAf/fPUQ4zRsCn079pHIyK9TFo4COm23p4kiusxj8otfjt8LH1Zsc9pGWC2+
# bl2RlnPID8JlyJFDRN7b/RCEhj45a83GtCmhDDmqVgy1eO5vwOKm2XyyWeD+pq/U
# Hf2QLPLZZ7tCD8Njpty+gB3Ux4zqthKGXSg8FpJ3w0tl4me2efLvjMa6jHMwtnHT
# aAbyQ3WMpT9w4XHLqRQDHzBqrTSY4od3nl9SrM/DQ2klLIcz8ECTEZVBY9B3pq6m
# QvAg24tfb0QvS14YnZv/PMCfOaVuE87M9G4f93pCynnMxMYze+XczL0sGhIAS9wp
# 03NgGlhGumOix6r2kHjlG6p3xywV8A==
# =jMf8
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 27 Jul 2023 01:00:53 PM PDT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]

* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
  block/blkio: use blkio_set_int("fd") to check fd support
  block/blkio: fall back on using `path` when `fd` setting fails
  block/blkio: retry blkio_connect() if it fails using `fd`
  block/blkio: move blkio_connect() in the drivers functions
  block: Fix pad_request's request restriction
  block/file-posix: fix g_file_get_contents return path
  block/blkio: do not use open flags in qemu_open()
  block/blkio: enable the completion eventfd

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-27 17:41:55 -07:00
Stefano Garzarella 1c38fe69e2 block/blkio: use blkio_set_int("fd") to check fd support
Setting the `fd` property fails with virtio-blk-* libblkio drivers
that do not support fd passing since
https://gitlab.com/libblkio/libblkio/-/merge_requests/208.

Getting the `fd` property, on the other hand, always succeeds for
virtio-blk-* libblkio drivers even when they don't support fd passing.

This patch switches to setting the `fd` property because it is a
better mechanism for probing fd passing support than getting the `fd`
property.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230727161020.84213-5-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27 15:51:46 -04:00
Stefano Garzarella 723bea27b1 block/blkio: fall back on using `path` when `fd` setting fails
qemu_open() fails if called with an unix domain socket in this way:
    -blockdev node-name=drive0,driver=virtio-blk-vhost-user,path=vhost-user-blk.sock,cache.direct=on: Could not open 'vhost-user-blk.sock': No such device or address

Since virtio-blk-vhost-user does not support fd passing, let`s always fall back
on using `path` if we fail the fd passing.

Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk")
Reported-by: Qing Wang <qinwang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230727161020.84213-4-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27 15:51:46 -04:00
Stefano Garzarella 809c319f8a block/blkio: retry blkio_connect() if it fails using `fd`
libblkio 1.3.0 added support of "fd" property for virtio-blk-vhost-vdpa
driver. In QEMU, starting from commit cad2ccc395 ("block/blkio: use
qemu_open() to support fd passing for virtio-blk") we are using
`blkio_get_int(..., "fd")` to check if the "fd" property is supported
for all the virtio-blk-* driver.

Unfortunately that property is also available for those driver that do
not support it, such as virtio-blk-vhost-user.

So, `blkio_get_int()` is not enough to check whether the driver supports
the `fd` property or not. This is because the virito-blk common libblkio
driver only checks whether or not `fd` is set during `blkio_connect()`
and fails with -EINVAL for those transports that do not support it
(all except vhost-vdpa for now).

So let's handle the `blkio_connect()` failure, retrying it using `path`
directly.

Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk")
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230727161020.84213-3-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27 15:51:46 -04:00
Stefano Garzarella 69785d66ae block/blkio: move blkio_connect() in the drivers functions
This is in preparation for the next patch, where for virtio-blk
drivers we need to handle the failure of blkio_connect().

Let's also rename the *_open() functions to *_connect() to make
the code reflect the changes applied.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230727161020.84213-2-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27 15:51:46 -04:00
Hanna Czenczek ef256751e9 block: Fix pad_request's request restriction
bdrv_pad_request() relies on requests' lengths not to exceed SIZE_MAX,
which bdrv_check_qiov_request() does not guarantee.

bdrv_check_request32() however will guarantee this, and both of
bdrv_pad_request()'s callers (bdrv_co_preadv_part() and
bdrv_co_pwritev_part()) already run it before calling
bdrv_pad_request().  Therefore, bdrv_pad_request() can safely call
bdrv_check_request32() without expecting error, too.

In effect, this patch will not change guest-visible behavior.  It is a
clean-up to tighten a condition to match what is guaranteed by our
callers, and which exists purely to show clearly why the subsequent
assertion (`assert(*bytes <= SIZE_MAX)`) is always true.

Note there is a difference between the interfaces of
bdrv_check_qiov_request() and bdrv_check_request32(): The former takes
an errp, the latter does not, so we can no longer just pass
&error_abort.  Instead, we need to check the returned value.  While we
do expect success (because the callers have already run this function),
an assert(ret == 0) is not much simpler than just to return an error if
it occurs, so let us handle errors by returning them up the stack now.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-id: 20230714085938.202730-1-hreitz@redhat.com
Fixes: 18743311b8
       ("block: Collapse padded I/O vecs exceeding IOV_MAX")
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27 09:48:16 -04:00
Sam Li 29a242e165 block/file-posix: fix g_file_get_contents return path
The g_file_get_contents() function returns a g_boolean. If it fails, the
returned value will be 0 instead of -1. Solve the issue by skipping
assigning ret value.

This issue was found by Matthew Rosato using virtio-blk-{pci,ccw} backed
by an NVMe partition e.g. /dev/nvme0n1p1 on s390x.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20230727115844.8480-1-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27 09:46:09 -04:00
Stefano Garzarella a5942c177b block/blkio: do not use open flags in qemu_open()
qemu_open() in blkio_virtio_blk_common_open() is used to open the
character device (e.g. /dev/vhost-vdpa-0 or /dev/vfio/vfio) or in
the future eventually the unix socket.

In all these cases we cannot open the path in read-only mode,
when the `read-only` option of blockdev is on, because the exchange
of IOCTL commands for example will fail.

In order to open the device read-only, we have to use the `read-only`
property of the libblkio driver as we already do in blkio_file_open().

Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2225439
Reported-by: Qing Wang <qinwang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20230726074807.14041-1-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-26 11:29:39 -04:00
Stefano Garzarella 9359c45988 block/blkio: enable the completion eventfd
Until libblkio 1.3.0, virtio-blk drivers had completion eventfd
notifications enabled from the start, but from the next releases
this is no longer the case, so we have to explicitly enable them.

In fact, the libblkio documentation says they could be disabled,
so we should always enable them at the start if we want to be
sure to get completion eventfd notifications:

    By default, the driver might not generate completion events for
    requests so it is necessary to explicitly enable the completion
    file descriptor before use:

    void blkioq_set_completion_fd_enabled(struct blkioq *q, bool enable);

I discovered this while trying a development version of libblkio:
the guest kernel hangs during boot, while probing the device.

Fixes: fd66dbd424 ("blkio: add libblkio block driver")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230725103744.77343-1-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-25 15:34:17 -04:00
Eric Blake 8cb98a725e nbd/client: Simplify cookie vs. index computation
Our code relies on a sentinel cookie value of zero for deciding when a
packet has been handled, as well as relying on array indices between 0
and MAX_NBD_REQUESTS-1 for dereferencing purposes.  As long as we can
symmetrically convert between two forms, there is no reason to go with
the odd choice of using XOR with a random pointer, when we can instead
simplify the mappings with a mere offset of 1.

Using ((uint64_t)-1) as the sentinel instead of NULL such that the two
macros could be entirely eliminated might also be possible, but would
require a more careful audit to find places where we currently rely on
zero-initialization to be interpreted as the sentinel value, so I did
not pursue that course.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-7-eblake@redhat.com>
[eblake: enhance commit message]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-07-19 15:26:13 -05:00
Eric Blake 22efd81104 nbd: s/handle/cookie/ to match NBD spec
Externally, libnbd exposed the 64-bit opaque marker for each client
NBD packet as the "cookie", because it was less confusing when
contrasted with 'struct nbd_handle *' holding all libnbd state.  It
also avoids confusion between the noun 'handle' as a way to identify a
packet and the verb 'handle' for reacting to things like signals.
Upstream NBD changed their spec to favor the name "cookie" based on
libnbd's recommendations[1], so we can do likewise.

[1] https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-6-eblake@redhat.com>
[eblake: typo fix]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-07-19 15:25:30 -05:00
Stefan Hajnoczi 66547f416a block/nvme: invoke blk_io_plug_call() outside q->lock
blk_io_plug_call() is invoked outside a blk_io_plug()/blk_io_unplug()
section while opening the NVMe drive from:

  nvme_file_open() ->
  nvme_init() ->
  nvme_identify() ->
  nvme_admin_cmd_sync() ->
  nvme_submit_command() ->
  blk_io_plug_call()

blk_io_plug_call() immediately invokes the given callback when the
current thread is not plugged, as is the case during nvme_file_open().

Unfortunately, nvme_submit_command() calls blk_io_plug_call() with
q->lock still held:

    ...
    q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE;
    q->need_kick++;
    blk_io_plug_call(nvme_unplug_fn, q);
    qemu_mutex_unlock(&q->lock);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^

nvme_unplug_fn() deadlocks trying to acquire q->lock because the lock is
already acquired by the same thread. The symptom is that QEMU hangs
during startup while opening the NVMe drive.

Fix this by moving the blk_io_plug_call() outside q->lock. This is safe
because no other thread runs code related to this queue and
blk_io_plug_call()'s internal state is immune to thread safety issues
since it is thread-local.

Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Lukas Doktor <ldoktor@redhat.com>
Message-id: 20230712191628.252806-1-stefanha@redhat.com
Fixes: f2e590002b ("block/nvme: convert to blk_io_plug_call() API")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-17 09:17:41 -04:00
Stefan Hajnoczi c21eae1ccc block/blkio: fix module_block.py parsing
When QEMU is built with --enable-modules, the module_block.py script
parses block/*.c to find block drivers that are built as modules. The
script generates a table of block drivers called block_driver_modules[].
This table is used for block driver module loading.

The blkio.c driver uses macros to define its BlockDriver structs. This
was done to avoid code duplication but the module_block.py script is
unable to parse the macro. The result is that libblkio-based block
drivers can be built as modules but will not be found at runtime.

One fix is to make the module_block.py script or build system fancier so
it can parse C macros (e.g. by parsing the preprocessed source code). I
chose not to do this because it raises the complexity of the build,
making future issues harder to debug.

Keep things simple: use the macro to avoid duplicating BlockDriver
function pointers but define .format_name and .protocol_name manually
for each BlockDriver. This way the module_block.py is able to parse the
code.

Also get rid of the block driver name macros (e.g. DRIVER_IO_URING)
because module_block.py cannot parse them either.

Fixes: fd66dbd424 ("blkio: add libblkio block driver")
Reported-by: Qing Wang <qinwang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230704123436.187761-1-stefanha@redhat.com
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-04 17:28:25 +02:00
Richard Henderson 0eb8f90ede Block layer patches
- Re-enable the graph lock
 - More fixes to coroutine_fn marking
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmScQCQRHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9bNSA//WIzPT45rFhl2U9QgyOJu26ho6ahsgwgI
 Z3QM5kCDB1dAN9USRPxhGboLGo8CyY7eeSwSrR7RtwBGYrWrAoJfGp5gK/7d9s5Q
 o0AGgRPnJGhFkBhRRMytsDsewM6Kk4IRmk4HMK3cOH3rsSM8RHs6KmDSBKesllu0
 QVGf3qW4u8LHyZyGM5OlPVUbtuDuK6/52FGhpXBp+x4oyNegOhjwO4mGOvTG+xIk
 Q5zwWZaPfjxaEDkvW8iahB6/D7Tpt64BmMf1Ydhxcd5eKEp932CiBI36aAlNKoRD
 Al5wztRx1GEh12ekN39jIi7Ypp3JX26keJcieKU0q656pT551UFRYjU0Rk08/Cca
 qv2oiQDu6bHgQ9zCQ1nMfa9+K2MyBwx0b5qfYkvs2RzgCTl8ImgBQANHfw8tz6Bq
 HUo1zsFBXCaK0boUB5iFwdf3rlx3t9UTEuDej/RaHqZjZD5xeG/smCcOlSfHaKUa
 wXfYxvm8ZfefJn1D6io1A+7M956uvIQNtmh13cU44clgFX9Y/bBNMg/5lMRsJKo8
 xxjvqCAyxo/pPfUsVWx4pc8AXbfVa85gyoSiaLEYZnqP54sJ2lFccqykCsTy58Lo
 VDcoPnoSc+LNqBOvtzxXgQbEWFCXU6fe0+TZgVYUvExWFIAOImeDWg2GD1JVrwsX
 e9QrPhL3DXg=
 =ZQcP
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

Block layer patches

- Re-enable the graph lock
- More fixes to coroutine_fn marking

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmScQCQRHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9bNSA//WIzPT45rFhl2U9QgyOJu26ho6ahsgwgI
# Z3QM5kCDB1dAN9USRPxhGboLGo8CyY7eeSwSrR7RtwBGYrWrAoJfGp5gK/7d9s5Q
# o0AGgRPnJGhFkBhRRMytsDsewM6Kk4IRmk4HMK3cOH3rsSM8RHs6KmDSBKesllu0
# QVGf3qW4u8LHyZyGM5OlPVUbtuDuK6/52FGhpXBp+x4oyNegOhjwO4mGOvTG+xIk
# Q5zwWZaPfjxaEDkvW8iahB6/D7Tpt64BmMf1Ydhxcd5eKEp932CiBI36aAlNKoRD
# Al5wztRx1GEh12ekN39jIi7Ypp3JX26keJcieKU0q656pT551UFRYjU0Rk08/Cca
# qv2oiQDu6bHgQ9zCQ1nMfa9+K2MyBwx0b5qfYkvs2RzgCTl8ImgBQANHfw8tz6Bq
# HUo1zsFBXCaK0boUB5iFwdf3rlx3t9UTEuDej/RaHqZjZD5xeG/smCcOlSfHaKUa
# wXfYxvm8ZfefJn1D6io1A+7M956uvIQNtmh13cU44clgFX9Y/bBNMg/5lMRsJKo8
# xxjvqCAyxo/pPfUsVWx4pc8AXbfVa85gyoSiaLEYZnqP54sJ2lFccqykCsTy58Lo
# VDcoPnoSc+LNqBOvtzxXgQbEWFCXU6fe0+TZgVYUvExWFIAOImeDWg2GD1JVrwsX
# e9QrPhL3DXg=
# =ZQcP
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 28 Jun 2023 04:13:56 PM CEST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (23 commits)
  block: use bdrv_co_debug_event in coroutine context
  block: use bdrv_co_getlength in coroutine context
  qcow2: mark more functions as coroutine_fns and GRAPH_RDLOCK
  vhdx: mark more functions as coroutine_fns and GRAPH_RDLOCK
  vmdk: mark more functions as coroutine_fns and GRAPH_RDLOCK
  dmg: mark more functions as coroutine_fns and GRAPH_RDLOCK
  cloop: mark more functions as coroutine_fns and GRAPH_RDLOCK
  block: mark another function as coroutine_fns and GRAPH_UNLOCKED
  bochs: mark more functions as coroutine_fns and GRAPH_RDLOCK
  vpc: mark more functions as coroutine_fns and GRAPH_RDLOCK
  qed: mark more functions as coroutine_fns and GRAPH_RDLOCK
  file-posix: remove incorrect coroutine_fn calls
  Revert "graph-lock: Disable locking for now"
  graph-lock: Unlock the AioContext while polling
  blockjob: Fix AioContext locking in block_job_add_bdrv()
  block: Fix AioContext locking in bdrv_open_backing_file()
  block: Fix AioContext locking in bdrv_open_inherit()
  block: Fix AioContext locking in bdrv_reopen_parse_file_or_backing()
  block: Fix AioContext locking in bdrv_attach_child_common()
  block: Fix AioContext locking in bdrv_open_child()
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-28 17:29:53 +02:00
Paolo Bonzini 17362398ee block: use bdrv_co_debug_event in coroutine context
bdrv_co_debug_event was recently introduced, with bdrv_debug_event
becoming a wrapper for use in unknown context.  Because most of the
time bdrv_debug_event is used on a BdrvChild via the wrapper macro
BLKDBG_EVENT, introduce a similar macro BLKDBG_CO_EVENT that calls
bdrv_co_debug_event, and switch whenever possible.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-13-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:34 +02:00
Paolo Bonzini 0af02bd107 block: use bdrv_co_getlength in coroutine context
bdrv_co_getlength was recently introduced, with bdrv_getlength becoming
a wrapper for use in unknown context.  Switch to bdrv_co_getlength when
possible.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-12-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:33 +02:00
Paolo Bonzini 70bacc4453 qcow2: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-11-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:32 +02:00
Paolo Bonzini f6b0899493 vhdx: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-10-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:30 +02:00
Paolo Bonzini 28944f99c4 vmdk: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-9-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:29 +02:00
Paolo Bonzini 688dc49da5 dmg: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-8-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:28 +02:00
Paolo Bonzini cf8d4c582b cloop: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-7-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:27 +02:00
Paolo Bonzini e7918e9619 bochs: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-5-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:24 +02:00
Paolo Bonzini 517b5dfffd vpc: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-4-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:23 +02:00
Paolo Bonzini bba667da7f qed: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-3-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:20 +02:00
Paolo Bonzini 36c6c8773a file-posix: remove incorrect coroutine_fn calls
raw_co_getlength is called by handle_aiocb_write_zeroes, which is not a coroutine
function.  This is harmless because raw_co_getlength does not actually suspend,
but in the interest of clarity make it a non-coroutine_fn that is just wrapped
by the coroutine_fn raw_co_getlength.  Likewise, check_cache_dropped was only
a coroutine_fn because it called raw_co_getlength, so it can be made non-coroutine
as well.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-2-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:14 +02:00
Kevin Wolf 3cce22defb Revert "graph-lock: Disable locking for now"
Now that bdrv_graph_wrlock() temporarily drops the AioContext lock that
its caller holds, it can poll without causing deadlocks. We can now
re-enable graph locking.

This reverts commit ad128dff0bf4b6f971d05eb4335a627883a19c1d.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-12-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 08:46:26 +02:00
Kevin Wolf 31b2ddfea3 graph-lock: Unlock the AioContext while polling
If the caller keeps the AioContext lock for a block node in an iothread,
polling in bdrv_graph_wrlock() deadlocks if the condition isn't
fulfilled immediately.

Now that all callers make sure to actually have the AioContext locked
when they call bdrv_replace_child_noperm() like they should, we can
change bdrv_graph_wrlock() to take a BlockDriverState whose AioContext
lock the caller holds (NULL if it doesn't) and unlock it temporarily
while polling.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-11-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 08:46:23 +02:00
Manos Pitsidianakis f8ed3648b5 vhost-user: fully use new backend/frontend naming
Slave/master nomenclature was replaced with backend/frontend in commit
1fc19b6527 ("vhost-user: Adopt new backend naming")

This patch replaces all remaining uses of master and slave in the
codebase.

Signed-off-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20230613080849.2115347-1-manos.pitsidianakis@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2023-06-26 09:50:00 -04:00
Philippe Mathieu-Daudé de6cd7599b meson: Replace softmmu_ss -> system_ss
We use the user_ss[] array to hold the user emulation sources,
and the softmmu_ss[] array to hold the system emulation ones.
Hold the latter in the 'system_ss[]' array for parity with user
emulation.

Mechanical change doing:

  $ sed -i -e s/softmmu_ss/system_ss/g $(git grep -l softmmu_ss)

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230613133347.82210-10-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-20 10:01:30 +02:00
Richard Henderson 7ce5a15fa6 * Fix emulated LCCB, LOCFHR, MXDB and MXDBR s390x instructions
* Fix the malta machine on s390x (big endian) hosts
 * Emulate /proc/cpuinfo on s390x
 * Remove pointless QOM casts
 * Improve the inclusion logic for libkeyutils and ipmi-bt-test in meson.build
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmR+ycgRHHRodXRoQHJl
 ZGhhdC5jb20ACgkQLtnXdP5wLbWXXw//WPz3ng50KLS+M1t3/ULEjO6XkGfP2LQZ
 RsZq3hf9THFPZgcREk+6SQvttOSTuvHNfakfujS6U1Ou5thReWqLe4itFW6+hB5j
 kQ+Sm6YJ+fpezkBnSefcUoL5nA9VVKZ6KE6kxq5CUBZNoIk1sSsfrU8y8wjzW0yg
 2nraOcG10aLpO2BfvKHVEAhJtwl9pHJsFANmHC2/h2wC9BZIAzdxiytzdcJ909gN
 AAa0hIrLK/oFgJjkSSxu+QTaVGPARXqkx5WV546F/zmDMFUWd9nrXaegwqxjgPBN
 m9Ua0SXll5hX2Z57vjJWlbTYkD+JUB22L0N7p5/xzhYRpLVSq1pdveo9psrzIC3E
 Bt7chZB58acQepJHxxa3UHDOHcnfdfaN+Dd9wD29wHr7nK8lOcsen7/7V+5YXomc
 qenkCtkpjKTl07OBxe6MDGZtPZYA8fK1CjEyYwHCe8QvxEzsyg96Bm3j4N2VPxQU
 +f/sFPX7SgogZI4mB4wdoxOF1RmQ+DXQ2tnB970txZRkmFq2jJHpW86jkkbq2Jl1
 KIjgdIXjVgy+MPtuQzO5cT+jfhGQL7FQynGXHjv/UidBid5XD3TDVNa9AthN3Mng
 +rPT90VJ7j9soMqvmNT1COSIRD+M49dQKBIQuq/gWplaTOHaAcJrCwYScwqe0u0P
 zmjCNeuPVw8=
 =dfJr
 -----END PGP SIGNATURE-----

Merge tag 'pull-request-2023-06-06' of https://gitlab.com/thuth/qemu into staging

* Fix emulated LCCB, LOCFHR, MXDB and MXDBR s390x instructions
* Fix the malta machine on s390x (big endian) hosts
* Emulate /proc/cpuinfo on s390x
* Remove pointless QOM casts
* Improve the inclusion logic for libkeyutils and ipmi-bt-test in meson.build

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmR+ycgRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbWXXw//WPz3ng50KLS+M1t3/ULEjO6XkGfP2LQZ
# RsZq3hf9THFPZgcREk+6SQvttOSTuvHNfakfujS6U1Ou5thReWqLe4itFW6+hB5j
# kQ+Sm6YJ+fpezkBnSefcUoL5nA9VVKZ6KE6kxq5CUBZNoIk1sSsfrU8y8wjzW0yg
# 2nraOcG10aLpO2BfvKHVEAhJtwl9pHJsFANmHC2/h2wC9BZIAzdxiytzdcJ909gN
# AAa0hIrLK/oFgJjkSSxu+QTaVGPARXqkx5WV546F/zmDMFUWd9nrXaegwqxjgPBN
# m9Ua0SXll5hX2Z57vjJWlbTYkD+JUB22L0N7p5/xzhYRpLVSq1pdveo9psrzIC3E
# Bt7chZB58acQepJHxxa3UHDOHcnfdfaN+Dd9wD29wHr7nK8lOcsen7/7V+5YXomc
# qenkCtkpjKTl07OBxe6MDGZtPZYA8fK1CjEyYwHCe8QvxEzsyg96Bm3j4N2VPxQU
# +f/sFPX7SgogZI4mB4wdoxOF1RmQ+DXQ2tnB970txZRkmFq2jJHpW86jkkbq2Jl1
# KIjgdIXjVgy+MPtuQzO5cT+jfhGQL7FQynGXHjv/UidBid5XD3TDVNa9AthN3Mng
# +rPT90VJ7j9soMqvmNT1COSIRD+M49dQKBIQuq/gWplaTOHaAcJrCwYScwqe0u0P
# zmjCNeuPVw8=
# =dfJr
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 05 Jun 2023 10:53:12 PM PDT
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [unknown]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [unknown]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2023-06-06' of https://gitlab.com/thuth/qemu:
  linux-user: Emulate /proc/cpuinfo on s390x
  linux-user/elfload: Introduce elf_hwcap_str() on s390x
  linux-user/elfload: Expose get_elf_hwcap() on s390x
  s390x/tcg: Fix CPU address returned by STIDP
  bulk: Remove pointless QOM casts
  scripts: Add qom-cast-macro-clean-cocci-gen.py
  hw/mips/malta: Fix the malta machine on big endian hosts
  gitlab-ci: Remove unused Python package
  tests/qtest: Run ipmi-bt-test only if CONFIG_IPMI_EXTERN is set
  tests/tcg/s390x: Test MXDB and MXDBR
  target/s390x: Fix MXDB and MXDBR
  Add conditional dependency for libkeyutils
  tests/tcg/s390x: Test single-stepping SVC
  linux-user/s390x: Fix single-stepping SVC
  tests/tcg/s390x: Test LOCFHR
  target/s390x: Fix LOCFHR taking the wrong half of R2
  tests/tcg/s390x: Test LCBB
  target/s390x: Fix LCBB overwriting the top 32 bits

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-06 07:07:37 -07:00
Philippe Mathieu-Daudé 7d5b0d6864 bulk: Remove pointless QOM casts
Mechanical change running Coccinelle spatch with content
generated from the qom-cast-macro-clean-cocci-gen.py added
in the previous commit.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230601093452.38972-3-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-06-05 20:48:34 +02:00
Jean-Louis Dupond 42a2890a76 qcow2: add discard-no-unref option
When we for example have a sparse qcow2 image and discard: unmap is enabled,
there can be a lot of fragmentation in the image after some time. Especially on VM's
that do a lot of writes/deletes.
This causes the qcow2 image to grow even over 110% of its virtual size,
because the free gaps in the image get too small to allocate new
continuous clusters. So it allocates new space at the end of the image.

Disabling discard is not an option, as discard is needed to keep the
incremental backup size as low as possible. Without discard, the
incremental backups would become large, as qemu thinks it's just dirty
blocks but it doesn't know the blocks are unneeded.
So we need to avoid fragmentation but also 'empty' the unneeded blocks in
the image to have a small incremental backup.

In addition, we also want to send the discards further down the stack, so
the underlying blocks are still discarded.

Therefor we introduce a new qcow2 option "discard-no-unref".
When setting this option to true, discards will no longer have the qcow2
driver relinquish cluster allocations. Other than that, the request is
handled as normal: All clusters in range are marked as zero, and, if
pass-discard-request is true, it is passed further down the stack.
The only difference is that the now-zero clusters are preallocated
instead of being unallocated.
This will avoid fragmentation on the qcow2 image.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Message-Id: <20230605084523.34134-2-jean-louis@dupond.be>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:15:42 +02:00
Alexander Ivanov 029136f261 parallels: Incorrect condition in out-of-image check
All the offsets in the BAT must be lower than the file size.
Fix the check condition for correct check.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-13-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:59 +02:00
Alexander Ivanov c0fc051dd4 parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD
Replace the way we use mutex in parallels_co_check() for simplier
and less error prone code.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-12-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:58 +02:00
Alexander Ivanov 7e259e2540 parallels: Move statistic collection to a separate function
We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check performs in a separate loop
in a separate helper.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230424093147.197643-11-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:57 +02:00
Alexander Ivanov 09a21edfaf parallels: Move check of leaks to a separate function
We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check performs in a separate loop
in a separate helper.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Message-Id: <20230424093147.197643-10-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:56 +02:00
Alexander Ivanov 9616f7a6c2 parallels: Fix statistics calculation
Exclude out-of-image clusters from allocated and fragmented clusters
calculation.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Message-Id: <20230424093147.197643-9-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:55 +02:00
Alexander Ivanov 6d416e56a7 parallels: Move check of cluster outside image to a separate function
We will add more and more checks so we need a better code structure in
parallels_co_check. Let each check performs in a separate loop in a
separate helper.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-8-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:54 +02:00
Alexander Ivanov 96de69c7d9 parallels: Move check of unclean image to a separate function
We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check performs in a separate loop
in a separate helper.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230424093147.197643-7-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:53 +02:00
Alexander Ivanov 3569cb7b78 parallels: Use generic infrastructure for BAT writing in parallels_co_check()
BAT is written in the context of conventional operations over the image
inside bdrv_co_flush() when it calls parallels_co_flush_to_os() callback.
Thus we should not modify BAT array directly, but call
parallels_set_bat_entry() helper and bdrv_co_flush() further on. After
that there is no need to manually write BAT and track its modification.

This makes code more generic and allows to split parallels_set_bat_entry()
for independent pieces.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-6-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:52 +02:00
Alexander Ivanov b64b29b96b parallels: create parallels_set_bat_entry_helper() to assign BAT value
This helper will be reused in next patches during parallels_co_check
rework to simplify its code.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230424093147.197643-5-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:51 +02:00
Alexander Ivanov 679749ce41 parallels: Fix image_end_offset and data_end after out-of-image check
Set data_end to the end of the last cluster inside the image. In such a
way we can be sure that corrupted offsets in the BAT can't affect on the
image size. If there are no allocated clusters set image_end_offset by
data_end.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-4-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:50 +02:00
Alexander Ivanov ab2d739c41 parallels: Fix high_off calculation in parallels_co_check()
Don't let high_off be more than the file size even if we don't fix the
image.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230424093147.197643-3-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:49 +02:00
Alexander Ivanov f5e715dbbb parallels: Out of image offset in BAT leads to image inflation
data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will
create the cluster at this offset and/or the image will be truncated to
this offset on close. This is definitely not correct.

Raise an error in parallels_open() if data_end points outside the image
and it is not a check (let the check to repaire the image). Set data_end
to the end of the cluster with the last correct offset.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Message-Id: <20230424093147.197643-2-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:47 +02:00
Hanna Czenczek 18743311b8 block: Collapse padded I/O vecs exceeding IOV_MAX
When processing vectored guest requests that are not aligned to the
storage request alignment, we pad them by adding head and/or tail
buffers for a read-modify-write cycle.

The guest can submit I/O vectors up to IOV_MAX (1024) in length, but
with this padding, the vector can exceed that limit.  As of
4c002cef0e ("util/iov: make
qemu_iovec_init_extended() honest"), we refuse to pad vectors beyond the
limit, instead returning an error to the guest.

To the guest, this appears as a random I/O error.  We should not return
an I/O error to the guest when it issued a perfectly valid request.

Before 4c002cef0e, we just made the vector
longer than IOV_MAX, which generally seems to work (because the guest
assumes a smaller alignment than we really have, file-posix's
raw_co_prw() will generally see bdrv_qiov_is_aligned() return false, and
so emulate the request, so that the IOV_MAX does not matter).  However,
that does not seem exactly great.

I see two ways to fix this problem:
1. We split such long requests into two requests.
2. We join some elements of the vector into new buffers to make it
   shorter.

I am wary of (1), because it seems like it may have unintended side
effects.

(2) on the other hand seems relatively simple to implement, with
hopefully few side effects, so this patch does that.

To do this, the use of qemu_iovec_init_extended() in bdrv_pad_request()
is effectively replaced by the new function bdrv_create_padded_qiov(),
which not only wraps the request IOV with padding head/tail, but also
ensures that the resulting vector will not have more than IOV_MAX
elements.  Putting that functionality into qemu_iovec_init_extended() is
infeasible because it requires allocating a bounce buffer; doing so
would require many more parameters (buffer alignment, how to initialize
the buffer, and out parameters like the buffer, its length, and the
original elements), which is not reasonable.

Conversely, it is not difficult to move qemu_iovec_init_extended()'s
functionality into bdrv_create_padded_qiov() by using public
qemu_iovec_* functions, so that is what this patch does.

Because bdrv_pad_request() was the only "serious" user of
qemu_iovec_init_extended(), the next patch will remove the latter
function, so the functionality is not implemented twice.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2141964
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230411173418.19549-3-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-05 13:11:06 +02:00
Eric Blake bd1386cce1 cutils: Adjust signature of parse_uint[_full]
It's already confusing that we have two very similar functions for
wrapping the parse of a 64-bit unsigned value, differing mainly on
whether they permit leading '-'.  Adjust the signature of parse_uint()
and parse_uint_full() to be like all of qemu_strto*(): put the result
parameter last, use the same types (uint64_t and unsigned long long
have the same width, but are not always the same type), and mark
endptr const (this latter change only affects the rare caller of
parse_uint).  Adjust all callers in the tree.

While at it, note that since cutils.c already includes:

    QEMU_BUILD_BUG_ON(sizeof(int64_t) != sizeof(long long));

we are guaranteed that the result of parse_uint* cannot exceed
UINT64_MAX (or the build would have failed), so we can drop
pre-existing dead comparisons in opts-visitor.c that were never false.

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230522190441.64278-8-eblake@redhat.com>
[eblake: Drop dead code spotted by Markus]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-06-02 12:27:19 -05:00
Stefano Garzarella cad2ccc395 block/blkio: use qemu_open() to support fd passing for virtio-blk
Some virtio-blk drivers (e.g. virtio-blk-vhost-vdpa) supports the fd
passing. Let's expose this to the user, so the management layer
can pass the file descriptor of an already opened path.

If the libblkio virtio-blk driver supports fd passing, let's always
use qemu_open() to open the `path`, so we can handle fd passing
from the management layer through the "/dev/fdset/N" special path.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230530071941.8954-2-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 11:08:21 -04:00
Stefan Hajnoczi 2a0d7cb6b7 block: remove bdrv_co_io_plug() API
No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
function pointers.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-7-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 08:59:24 -04:00
Stefan Hajnoczi 076682885d block/linux-aio: convert to blk_io_plug_call() API
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Note that a dev_max_batch check is dropped in laio_io_unplug() because
the semantics of unplug_fn() are different from .bdrv_co_unplug():
1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
   not every time blk_io_unplug() is called.
2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
   way to get per-BlockDriverState fields like dev_max_batch.

Therefore this condition cannot be moved to laio_unplug_fn(). It is not
obvious that this condition affects performance in practice, so I am
removing it instead of trying to come up with a more complex mechanism
to preserve the condition.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230530180959.1108766-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 07:34:03 -04:00
Stefan Hajnoczi 6a6da231b7 block/io_uring: convert to blk_io_plug_call() API
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 07:34:03 -04:00
Stefan Hajnoczi 28ff7b4dfb block/blkio: convert to blk_io_plug_call() API
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 07:34:03 -04:00
Stefan Hajnoczi f2e590002b block/nvme: convert to blk_io_plug_call() API
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 07:34:03 -04:00
Stefan Hajnoczi 41abca8c39 block: add blk_io_plug_call() API
Introduce a new API for thread-local blk_io_plug() that does not
traverse the block graph. The goal is to make blk_io_plug() multi-queue
friendly.

Instead of having block drivers track whether or not we're in a plugged
section, provide an API that allows them to defer a function call until
we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
called multiple times with the same fn/opaque pair, then fn() is only
called once at the end of the function - resulting in batching.

This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
because the plug state is now thread-local.

Later patches convert block drivers to blk_io_plug_call() and then we
can finally remove .bdrv_co_io_plug() once all block drivers have been
converted.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 07:34:03 -04:00
Stefan Hajnoczi 60f782b6b7 aio: remove aio_disable_external() API
All callers now pass is_external=false to aio_set_fd_handler() and
aio_set_event_notifier(). The aio_disable_external() API that
temporarily disables fd handlers that were registered is_external=true
is therefore dead code.

Remove aio_disable_external(), aio_enable_external(), and the
is_external arguments to aio_set_fd_handler() and
aio_set_event_notifier().

The entire test-fdmon-epoll test is removed because its sole purpose was
testing aio_disable_external().

Parts of this patch were generated using the following coccinelle
(https://coccinelle.lip6.fr/) semantic patch:

  @@
  expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
  @@
  - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
  + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)

  @@
  expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
  @@
  - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
  + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230516190238.8401-21-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:37:26 +02:00
Stefan Hajnoczi 17b69c0fc1 block/fuse: do not set is_external=true on FUSE fd
This is part of ongoing work to remove the aio_disable_external() API.

Use BlockDevOps .drained_begin/end/poll() instead of
aio_set_fd_handler(is_external=true).

As a side-effect the FUSE export now follows AioContext changes like the
other export types.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-16-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi 3d499a43a2 block/export: don't require AioContext lock around blk_exp_ref/unref()
The FUSE export calls blk_exp_ref/unref() without the AioContext lock.
Instead of fixing the FUSE export, adjust blk_exp_ref/unref() so they
work without the AioContext lock. This way it's less error-prone.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-15-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi 195332c1d9 block/export: rewrite vduse-blk drain code
vduse_blk_detach_ctx() waits for in-flight requests using
AIO_WAIT_WHILE(). This is not allowed according to a comment in
bdrv_set_aio_context_commit():

  /*
   * Take the old AioContex when detaching it from bs.
   * At this point, new_context lock is already acquired, and we are now
   * also taking old_context. This is safe as long as bdrv_detach_aio_context
   * does not call AIO_POLL_WHILE().
   */

Use this opportunity to rewrite the drain code in vduse-blk:

- Use the BlockExport refcount so that vduse_blk_exp_delete() is only
  called when there are no more requests in flight.

- Implement .drained_poll() so in-flight request coroutines are stopped
  by the time .bdrv_detach_aio_context() is called.

- Remove AIO_WAIT_WHILE() from vduse_blk_detach_ctx() to solve the
  .bdrv_detach_aio_context() constraint violation. It's no longer
  needed due to the previous changes.

- Always handle the VDUSE file descriptor, even in drained sections. The
  VDUSE file descriptor doesn't submit I/O, so it's safe to handle it in
  drained sections. This ensures that the VDUSE kernel code gets a fast
  response.

- Suspend virtqueue fd handlers in .drained_begin() and resume them in
  .drained_end(). This eliminates the need for the
  aio_set_fd_handler(is_external=true) flag, which is being removed from
  QEMU.

This is a long list but splitting it into individual commits would
probably lead to git bisect failures - the changes are all related.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-14-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi ab61335025 block: drain from main loop thread in bdrv_co_yield_to_drain()
For simplicity, always run BlockDevOps .drained_begin/end/poll()
callbacks in the main loop thread. This makes it easier to implement the
callbacks and avoids extra locks.

Move the function pointer declarations from the I/O Code section to the
Global State section for BlockDevOps, BdrvChildClass, and BlockDriver.

Narrow IO_OR_GS_CODE() to GLOBAL_STATE_CODE() where appropriate.

The test-bdrv-drain test case calls bdrv_drain() from an IOThread. This
is now only allowed from coroutine context, so update the test case to
run in a coroutine.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-11-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi ff82b7835b block: add blk_in_drain() API
The BlockBackend quiesce_counter is greater than zero during drained
sections. Add an API to check whether the BlockBackend is in a drained
section.

The next patch will use this API.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-10-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi e1054cd4aa block/export: stop using is_external in vhost-user-blk server
vhost-user activity must be suspended during bdrv_drained_begin/end().
This prevents new requests from interfering with whatever is happening
in the drained section.

Previously this was done using aio_set_fd_handler()'s is_external
argument. In a multi-queue block layer world the aio_disable_external()
API cannot be used since multiple AioContext may be processing I/O, not
just one.

Switch to BlockDevOps->drained_begin/end() callbacks.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-8-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi 8f5e9a8ee1 block/export: wait for vhost-user-blk requests when draining
Each vhost-user-blk request runs in a coroutine. When the BlockBackend
enters a drained section we need to enter a quiescent state. Currently
any in-flight requests race with bdrv_drained_begin() because it is
unaware of vhost-user-blk requests.

When blk_co_preadv/pwritev()/etc returns it wakes the
bdrv_drained_begin() thread but vhost-user-blk request processing has
not yet finished. The request coroutine continues executing while the
main loop thread thinks it is in a drained section.

One example where this is unsafe is for blk_set_aio_context() where
bdrv_drained_begin() is called before .aio_context_detached() and
.aio_context_attach(). If request coroutines are still running after
bdrv_drained_begin(), then the AioContext could change underneath them
and they race with new requests processed in the new AioContext. This
could lead to virtqueue corruption, for example.

(This example is theoretical, I came across this while reading the
code and have not tried to reproduce it.)

It's easy to make bdrv_drained_begin() wait for in-flight requests: add
a .drained_poll() callback that checks the VuServer's in-flight counter.
VuServer just needs an API that returns true when there are requests in
flight. The in-flight counter needs to be atomic.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-7-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi 75d33e8525 util/vhost-user-server: rename refcount to in_flight counter
The VuServer object has a refcount field and ref/unref APIs. The name is
confusing because it's actually an in-flight request counter instead of
a refcount.

Normally a refcount destroys the object upon reaching zero. The VuServer
counter is used to wake up the vhost-user coroutine when there are no
more requests.

Avoid confusing by renaming refcount and ref/unref to in_flight and
inc/dec.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-6-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi 2d19629581 block-backend: split blk_do_set_aio_context()
blk_set_aio_context() is not fully transactional because
blk_do_set_aio_context() updates blk->ctx outside the transaction. Most
of the time this goes unnoticed but a BlockDevOps.drained_end() callback
that invokes blk_get_aio_context() fails assert(ctx == blk->ctx). This
happens because blk->ctx is only assigned after
BlockDevOps.drained_end() is called and we're in an intermediate state
where BlockDrvierState nodes already have the new context and the
BlockBackend still has the old context.

Making blk_set_aio_context() fully transactional solves this assertion
failure because the BlockBackend's context is updated as part of the
transaction (before BlockDevOps.drained_end() is called).

Split blk_do_set_aio_context() in order to solve this assertion failure.
This helper function actually serves two different purposes:
1. It drives blk_set_aio_context().
2. It responds to BdrvChildClass->change_aio_ctx().

Get rid of the helper function. Do #1 inside blk_set_aio_context() and
do #2 inside blk_root_set_aio_ctx_commit(). This simplifies the code.

The only drawback of the fully transactional approach is that
blk_set_aio_context() must contend with blk_root_set_aio_ctx_commit()
being invoked as part of the AioContext change propagation. This can be
solved by temporarily setting blk->allow_aio_context_change to true.

Future patches call blk_get_aio_context() from
BlockDevOps->drained_end(), so this patch will become necessary.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Kevin Wolf 9102f2ebdb copy-before-write: Fix open with child in iothread
The AioContext lock must not be held for bdrv_open_child(), but it is
necessary for the following operations, in particular those using nested
event loops in coroutine wrappers.

Temporarily dropping the main AioContext lock is not necessary because
we know we run in the main thread.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230525124713.401149-9-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:29:35 +02:00
Kevin Wolf 6e01215932 raw-format: Fix open with 'file' in iothread
When opening the 'file' child moves bs to an iothread, we need to hold
the AioContext lock of it before we can call raw_apply_options() (and
more specifically, bdrv_getlength() inside of it).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230525124713.401149-8-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:21:23 +02:00
Kevin Wolf aa269ff888 qcow2: Fix open with 'file' in iothread
qcow2_open() doesn't work correctly when opening the 'file' child moves
bs to an iothread, for several reasons:

- It uses BDRV_POLL_WHILE() to wait for the qcow2_open_entry()
  coroutine, which involves dropping the AioContext lock for bs when it
  is not in the main context - but we don't hold it, so this crashes.

- It runs the qcow2_open_entry() coroutine in the current thread instead
  of the new AioContext of bs.

- qcow2_open_entry() doesn't notify the main loop when it's done.

This patches fixes these issues around delegating work to a coroutine.
Temporarily dropping the main AioContext lock is not necessary because
we know we run in the main thread.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230525124713.401149-7-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:21:23 +02:00
Kevin Wolf 2626d27f50 mirror: Hold main AioContext lock for calling bdrv_open_backing_file()
bdrv_open_backing_file() calls bdrv_open_inherit(), so all callers must
hold the main AioContext lock.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230525124713.401149-6-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:21:23 +02:00
Kevin Wolf 4c20dd24b1 block-backend: Fix blk_new_open() for iothreads
This fixes blk_new_open() to not assume that bs is in the main context.

In particular, the BlockBackend must be created with the right
AioContext because it will refuse to move to a different context
afterwards. (blk->allow_aio_context_change is false.)

Use this opportunity to use blk_insert_bs() instead of duplicating the
bdrv_root_attach_child() call. This is consistent with what
blk_new_with_bs() does. Add comments to document the locking rules.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230525124713.401149-5-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:21:23 +02:00
Kevin Wolf c6e0a6de62 block: Take main AioContext lock when calling bdrv_open()
The function documentation already says that all callers must hold the
main AioContext lock, but not all of them do. This can cause assertion
failures when functions called by bdrv_open() try to drop the lock. Fix
a few more callers to take the lock before calling bdrv_open().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230525124713.401149-4-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:21:23 +02:00
Kevin Wolf dea97c1fbd block-coroutine-wrapper: Take AioContext lock in no_co_wrappers
All of the functions that currently take a BlockDriverState, BdrvChild
or BlockBackend as their first parameter expect the associated
AioContext to be locked when they are called. In the case of
no_co_wrappers, they are called from bottom halves directly in the main
loop, so no other caller can be expected to take the lock for them. This
can result in assertion failures because a lock that isn't taken is
released in nested event loops.

Looking at the first parameter is already done by co_wrappers to decide
where the coroutine should run, so doing the same in no_co_wrappers is
only consistent. Take the lock in the generated bottom halves to fix the
problem.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230525124713.401149-2-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:21:23 +02:00
Kevin Wolf 80fc5d2600 graph-lock: Disable locking for now
In QEMU 8.0, we've been seeing deadlocks in bdrv_graph_wrlock(). They
come from callers that hold an AioContext lock, which is not allowed
during polling. In theory, we could temporarily release the lock, but
callers are inconsistent about whether they hold a lock, and if they do,
some are also confused about which one they hold. While all of this is
fixable, it's not trivial, and the best course of action for 8.0.1 is
probably just disabling the graph locking code temporarily.

We don't currently rely on graph locking yet. It is supposed to replace
the AioContext lock eventually to enable multiqueue support, but as long
as we still have the AioContext lock, it is sufficient without the graph
lock. Once the AioContext lock goes away, the deadlock doesn't exist any
more either and this commit can be reverted. (Of course, it can also be
reverted while the AioContext lock still exists if the callers have been
fixed.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230517152834.277483-2-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19 19:16:53 +02:00
Kevin Wolf 71438d8dac graph-lock: Honour read locks even in the main thread
There are some conditions under which we don't actually need to do
anything for taking a reader lock: Writing the graph is only possible
from the main context while holding the BQL. So if a reader is running
in the main context under the BQL and knows that it won't be interrupted
until the next writer runs, we don't actually need to do anything.

This is the case if the reader code neither has a nested event loop
(this is forbidden anyway while you hold the lock) nor is a coroutine
(because a writer could run when the coroutine has yielded).

These conditions are exactly what bdrv_graph_rdlock_main_loop() asserts.
They are not fulfilled in bdrv_graph_co_rdlock(), which always runs in a
coroutine.

This deletes the shortcuts in bdrv_graph_co_rdlock() that skip taking
the reader lock in the main thread.

Reported-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230510203601.418015-9-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19 19:12:12 +02:00
Kevin Wolf 018e5987b5 blockjob: Adhere to rate limit even when reentered early
When jobs are sleeping, for example to enforce a given rate limit, they
can be reentered early, in particular in order to get paused, to update
the rate limit or to get cancelled.

Before this patch, they behave in this case as if they had fully
completed their rate limiting delay. This means that requests are sped
up beyond their limit, violating the constraints that the user gave us.

Change the block jobs to sleep in a loop until the necessary delay is
completed, while still allowing cancelling them immediately as well
pausing (handled by the pause point in job_sleep_ns()) and updating the
rate limit.

This change is also motivated by iotests cases being prone to fail
because drain operations pause and unpause them so often that block jobs
complete earlier than they are supposed to. In particular, the next
commit would fail iotests 030 without this change.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230510203601.418015-8-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19 19:12:12 +02:00
Kevin Wolf e3e31dc872 qcow2: Unlock the graph in qcow2_do_open() where necessary
qcow2_do_open() calls a few no_co_wrappers that wrap functions taking
the graph lock internally as a writer. Therefore, it can't hold the
reader lock across these calls, it causes deadlocks. Drop the lock
temporarily around the calls.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230510203601.418015-4-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19 19:12:12 +02:00
Kevin Wolf a184563778 block/export: Fix null pointer dereference in error path
There are some error paths in blk_exp_add() that jump to 'fail:' before
'exp' is even created. So we can't just unconditionally access exp->blk.

Add a NULL check, and switch from exp->blk to blk, which is available
earlier, just to be extra sure that we really cover all cases where
BlockDevOps could have been set for it (in practice, this only happens
in drv->create() today, so this part of the change isn't strictly
necessary).

Fixes: Coverity CID 1509238
Fixes: de79b52604
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230510203601.418015-3-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19 19:12:12 +02:00
Kevin Wolf 4db7ba3b87 block: Call .bdrv_co_create(_opts) unlocked
These are functions that modify the graph, so they must be able to take
a writer lock. This is impossible if they already hold the reader lock.
If they need a reader lock for some of their operations, they should
take it internally.

Many of them go through blk_*(), which will always take the lock itself.
Direct calls of bdrv_*() need to take the reader lock. Note that while
locking for bdrv_co_*() calls is checked by TSA, this is not the case
for the mixed_coroutine_fns bdrv_*(). Holding the lock is still required
when they are called from coroutine context like here!

This effectively reverts 4ec8df0183, but adds some internal locking
instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230510203601.418015-2-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19 19:12:12 +02:00
Sam Li 52eb76f4b1 block: add accounting for zone append operation
Taking account of the new zone append write operation for zoned devices,
BLOCK_ACCT_ZONE_APPEND enum is introduced as other I/O request type (read,
write, flush).

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230508051916.178322-3-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:18:10 -04:00
Sam Li 6c811e19bb block: add some trace events for zone append
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508051510.177850-5-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:18:10 -04:00
Sam Li 4751d09adc block: introduce zone append write for zoned devices
A zone append command is a write operation that specifies the first
logical block of a zone as the write position. When writing to a zoned
block device using zone append, the byte offset of the call may point at
any position within the zone to which the data is being appended. Upon
completion the device will respond with the position where the data has
been written in the zone.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508051510.177850-3-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:18:10 -04:00
Sam Li a3c41f06d5 file-posix: add tracking of the zone write pointers
Since Linux doesn't have a user API to issue zone append operations to
zoned devices from user space, the file-posix driver is modified to add
zone append emulation using regular writes. To do this, the file-posix
driver tracks the wp location of all zones of the device. It uses an
array of uint64_t. The most significant bit of each wp location indicates
if the zone type is conventional zones.

The zones wp can be changed due to the following operations issued:
- zone reset: change the wp to the start offset of that zone
- zone finish: change to the end location of that zone
- write to a zone
- zone append

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230508051510.177850-2-faithilikerun@gmail.com
[Fix errno propagation from handle_aiocb_zone_mgmt()
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:17:55 -04:00
Sam Li 142e307e79 block: add some trace events for new block layer APIs
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508045533.175575-8-faithilikerun@gmail.com
Message-id: 20230324090605.28361-8-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:17:03 -04:00
Sam Li 774c726ceb block: add zoned BlockDriver check to block layer
Putting zoned/non-zoned BlockDrivers on top of each other is not
allowed.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508045533.175575-6-faithilikerun@gmail.com
Message-id: 20230324090605.28361-6-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org> and clarify that the check is about zoned
BlockDrivers.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:17:03 -04:00
Sam Li 50c4bcd496 block/raw-format: add zone operations to pass through requests
raw-format driver usually sits on top of file-posix driver. It needs to
pass through requests of zone commands.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508045533.175575-5-faithilikerun@gmail.com
Message-id: 20230324090605.28361-5-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:17:03 -04:00
Sam Li 6d43eaa396 block/block-backend: add block layer APIs resembling Linux ZonedBlockDevice ioctls
Add zoned device option to host_device BlockDriver. It will be presented only
for zoned host block devices. By adding zone management operations to the
host_block_device BlockDriver, users can use the new block layer APIs
including Report Zone and four zone management operations
(open, close, finish, reset, reset_all).

Qemu-io uses the new APIs to perform zoned storage commands of the device:
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
zone_finish(zf).

For example, to test zone_report, use following command:
$ ./build/qemu-io --image-opts -n driver=host_device, filename=/dev/nullb0
-c "zrp offset nr_zones"

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508045533.175575-4-faithilikerun@gmail.com
Message-id: 20230324090605.28361-4-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org> and remove spurious ret = -errno in
raw_co_zone_mgmt().
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:17:03 -04:00
Sam Li a735b56e49 block/file-posix: introduce helper functions for sysfs attributes
Use get_sysfs_str_val() to get the string value of device
zoned model. Then get_sysfs_zoned_model() can convert it to
BlockZoneModel type of QEMU.

Use get_sysfs_long_val() to get the long value of zoned device
information.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508045533.175575-3-faithilikerun@gmail.com
Message-id: 20230324090605.28361-3-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:17:03 -04:00
Vladimir Sementsov-Ogievskiy 43c71fe3b8 block/meson.build: prefer positive condition for replication
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20230428194928.1426370-2-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10 18:48:11 +02:00
Stefan Hajnoczi 58a2e3f5c3 block: compile out assert_bdrv_graph_readable() by default
reader_count() is a performance bottleneck because the global
aio_context_list_lock mutex causes thread contention. Put this debugging
assertion behind a new ./configure --enable-debug-graph-lock option and
disable it by default.

The --enable-debug-graph-lock option is also enabled by the more general
--enable-debug option.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230501173443.153062-1-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10 14:16:54 +02:00
Kevin Wolf e19b157f3c block: Mark bdrv_refresh_limits() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_refresh_limits() need to hold a reader lock for the graph because
it accesses the children list of a node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-21-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10 14:16:54 +02:00