OK, that's a big rewrite of the logic.
Pre-patch we have an always running coroutine - connection_co. It does
reply receiving and reconnecting. And it leads to a lot of difficult
and unobvious code around drained sections and context switch. We also
abuse bs->in_flight counter which is increased for connection_co and
temporary decreased in points where we want to allow drained section to
begin. One of these place is in another file: in nbd_read_eof() in
nbd/client.c.
We also cancel reconnect and requests waiting for reconnect on drained
begin which is not correct. And this patch fixes that.
Let's finally drop this always running coroutine and go another way:
do both reconnect and receiving in request coroutines.
The detailed list of changes below (in the sequence of diff hunks).
1. receiving coroutines are woken directly from nbd_channel_error, when
we change s->state
2. nbd_co_establish_connection_cancel(): we don't have drain_begin now,
and in nbd_teardown_connection() all requests should already be
finished (and reconnect is done from request). So
nbd_co_establish_connection_cancel() is called from
nbd_cancel_in_flight() (to cancel the request that is doing
nbd_co_establish_connection()) and from reconnect_delay_timer_cb()
(previously we didn't need it, as reconnect delay only should cancel
active requests not the reconnection itself). But now reconnection
itself is done in the separate thread (we now call
nbd_client_connection_enable_retry() in nbd_open()), and we need to
cancel the requests that wait in nbd_co_establish_connection()
now).
2A. We do receive headers in request coroutine. But we also should
dispatch replies for other pending requests. So,
nbd_connection_entry() is turned into nbd_receive_replies(), which
does reply dispatching while it receives other request headers, and
returns when it receives the requested header.
3. All old staff around drained sections and context switch is dropped.
In details:
- we don't need to move connection_co to new aio context, as we
don't have connection_co anymore
- we don't have a fake "request" of connection_co (extra increasing
in_flight), so don't care with it in drain_begin/end
- we don't stop reconnection during drained section anymore. This
means that drain_begin may wait for a long time (up to
reconnect_delay). But that's an improvement and more correct
behavior see below[*]
4. In nbd_teardown_connection() we don't have to wait for
connection_co, as it is dropped. And cleanup for s->ioc and nbd_yank
is moved here from removed connection_co.
5. In nbd_co_do_establish_connection() we now should handle
NBD_CLIENT_CONNECTING_NOWAIT: if new request comes when we are in
NBD_CLIENT_CONNECTING_NOWAIT, it still should call
nbd_co_establish_connection() (who knows, maybe the connection was
already established by another thread in the background). But we
shouldn't wait: if nbd_co_establish_connection() can't return new
channel immediately the request should fail (we are in
NBD_CLIENT_CONNECTING_NOWAIT state).
6. nbd_reconnect_attempt() is simplified: it's now easier to wait for
other requests in the caller, so here we just assert that fact.
Also delay time is now initialized here: we can easily detect first
attempt and start a timer.
7. nbd_co_reconnect_loop() is dropped, we don't need it. Reconnect
retries are fully handle by thread (nbd/client-connection.c), delay
timer we initialize in nbd_reconnect_attempt(), we don't have to
bother with s->drained and friends. nbd_reconnect_attempt() now
called from nbd_co_send_request().
8. nbd_connection_entry is dropped: reconnect is now handled by
nbd_co_send_request(), receiving reply is now handled by
nbd_receive_replies(): all handled from request coroutines.
9. So, welcome new nbd_receive_replies() called from request coroutine,
that receives reply header instead of nbd_connection_entry().
Like with sending requests, only one coroutine may receive in a
moment. So we introduce receive_mutex, which is locked around
nbd_receive_reply(). It also protects some related fields. Still,
full audit of thread-safety in nbd driver is a separate task.
New function waits for a reply with specified handle being received
and works rather simple:
Under mutex:
- if current handle is 0, do receive by hand. If another handle
received - switch to other request coroutine, release mutex and
yield. Otherwise return success
- if current handle == requested handle, we are done
- otherwise, release mutex and yield
10: in nbd_co_send_request() we now do nbd_reconnect_attempt() if
needed. Also waiting in free_sema queue we now wait for one of two
conditions:
- connectED, in_flight < MAX_NBD_REQUESTS (so we can start new one)
- connectING, in_flight == 0, so we can call
nbd_reconnect_attempt()
And this logic is protected by s->send_mutex
Also, on failure we don't have to care of removed s->connection_co
11. nbd_co_do_receive_one_chunk(): now instead of yield() and wait for
s->connection_co we just call new nbd_receive_replies().
12. nbd_co_receive_one_chunk(): place where s->reply.handle becomes 0,
which means that handling of the whole reply is finished. Here we
need to wake one of coroutines sleeping in nbd_receive_replies().
If none are sleeping - do nothing. That's another behavior change: we
don't have endless recv() in the idle time. It may be considered as
a drawback. If so, it may be fixed later.
13. nbd_reply_chunk_iter_receive(): don't care about removed
connection_co, just ping in_flight waiters.
14. Don't create connection_co, enable retry in the connection thread
(we don't have own reconnect loop anymore)
15. We now need to add a nbd_co_establish_connection_cancel() call in
nbd_cancel_in_flight(), to cancel the request that is doing a
connection attempt.
[*], ok, now we don't cancel reconnect on drain begin. That's correct:
reconnect feature leads to possibility of long-running requests (up
to reconnect delay). Still, drain begin is not a reason to kill
long requests. We should wait for them.
This also means, that we can again reproduce a dead-lock, described
in 8c517de24a.
Why we are OK with it:
1. Now this is not absolutely-dead dead-lock: the vm is unfrozen
after reconnect delay. Actually 8c517de24a fixed a bug in
NBD logic, that was not described in 8c517de24a and led to
forever dead-lock. The problem was that nobody woke the free_sema
queue, but drain_begin can't finish until there is a request in
free_sema queue. Now we have a reconnect delay timer that works
well.
2. It's not a problem of the NBD driver, but of the ide code,
because it does drain_begin under the global mutex; the problem
doesn't reproduce when using scsi instead of ide.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902103805.25686-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar and comment tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
When we don't have a connection and blocking is false, we return NULL
but don't set errp. That's wrong.
We have two paths for calling nbd_co_establish_connection():
1. nbd_open() -> nbd_do_establish_connection() -> ...
but that will never set blocking=false
2. nbd_reconnect_attempt() -> nbd_co_do_establish_connection() -> ...
but that uses errp=NULL
So, we are safe with our wrong errp policy in
nbd_co_establish_connection(). Still let's fix it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210906190654.183421-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The NBD protocol just relaxed the requirements on
NBD_OPT_LIST_META_CONTEXT:
https://github.com/NetworkBlockDevice/nbd/commit/13a4e33a87
Since listing is not stateful (unlike SET_META_CONTEXT), we don't care
if a client asks for meta contexts without first requesting structured
replies. Well-behaved clients will still ask for structured reply
first (if for no other reason than for back-compat to older servers),
but that's no reason to avoid this change.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210907173505.1499709-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
From clang-13:
nbd/server.c:976:22: error: variable 'bitmaps' set but not used \
[-Werror,-Wunused-but-set-variable]
which is incorrect; see //bugs.llvm.org/show_bug.cgi?id=3888.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We'll need a possibility of non-blocking nbd_co_establish_connection(),
so that it returns immediately, and it returns success only if a
connections was previously established in background.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-30-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
block/nbd doesn't need underlying sioc channel anymore. So, we can
update nbd/client-connection interface to return only one top-most io
channel, which is more straight forward.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-27-vsementsov@virtuozzo.com>
[eblake: squash in Vladimir's fixes for uninit usage caught by clang]
Signed-off-by: Eric Blake <eblake@redhat.com>
Now, when a thread can do negotiation and retry, it may run relatively
long. We need a mechanism to stop it, when the user is not interested
in a result any more. So, on nbd_client_connection_release() let's
shutdown the socket, and do not retry connection if thread is detached.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-22-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Add an option for a thread to retry connecting until it succeeds. We'll
use nbd/client-connection both for reconnect and for initial connection
in nbd_open(), so we need a possibility to use same NBDClientConnection
instance to connect once in nbd_open() and then use retry semantics for
reconnect.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-21-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
Add arguments and logic to support nbd negotiation in the same thread
after successful connection.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-20-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We don't update connect_thread_func() to use QEMU_LOCK_GUARD, as it
will get more complex critical sections logic in further commit, where
QEMU_LOCK_GUARD doesn't help.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-19-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We now have bs-independent connection API, which consists of four
functions:
nbd_client_connection_new()
nbd_client_connection_release()
nbd_co_establish_connection()
nbd_co_establish_connection_cancel()
Move them to a separate file together with NBDClientConnection
structure which becomes private to the new API.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-18-vsementsov@virtuozzo.com>
[eblake: comment tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
- NBD server: Fix crashes related to switching between AioContexts
- file-posix: Workaround for discard/write_zeroes on buggy filesystems
- Follow-up fixes for the reopen vs. permission changes
- quorum: Fix error handling for flush
- block-copy: Refactor copy_range handling
- docs: Describe how to use 'null-co' block driver
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmC3iy8RHGt3b2xmQHJl
ZGhhdC5jb20ACgkQfwmycsiPL9ZP0hAAuh07CFWLzHCcRC7PBzekSPfzRYYBLDSW
EObJ1Ov4mvz8UZoP6BDJ5QVzLPhel6hXkxTd83B1D7t/Dq+yJYR0z8Kv3USpaVJ4
2U26SsoGQM8BmtVDL1Q8tQ5eDWQ4ykxNx6F2/lKBe1EH1lfaun04Xj1rNh7jpilo
nNmKMDDI1UOkH0lKDR3tqfEV0XQE7o+ZKfPlIbvYMjXk9ZPKUHfjNPGVdCLQVnqH
VJI01hF7eEx1ykSMdlC+TzNoVGG+mCBokGuW0JlUvOpX6FcGnAlxXQx8u53c1I8O
lggZV8b2IbrNBUVwXQHLLrXxjOo+u54Ct9y/gXUTAj8qai+9jVRp60Y1pnCyeIeu
DzFx10xwy04PGleb7AAZ4dT8du2+PTkuyQ9KmlvQ2U4IUcgW124CrDeO7XYr1aif
hCTJPeEDrC9YNU6AQ8rLXrYUtkumSm2zUzU5nZ//i5WH41475/vsmgP5A+Jr457A
Xu0yiI2Gqkr9CNsP9ZzMkNj03oIBhPFuGxiwibLQsy/6UVnaDYS0+rQ8FXYnF5+K
iEpgXe3vKTWxM097kzJMBTDVRMXRa75NtK7KWXMDgVpHTbcv1t1otsn+6dfv+B55
ULJM1ETsyYS0T6BqNglvdytsraSt7JgSF+ZLHbYk3KVDshwnq/0ksgSqHNNA14ca
kYTzHhMgo5w=
=gSq/
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
- NBD server: Fix crashes related to switching between AioContexts
- file-posix: Workaround for discard/write_zeroes on buggy filesystems
- Follow-up fixes for the reopen vs. permission changes
- quorum: Fix error handling for flush
- block-copy: Refactor copy_range handling
- docs: Describe how to use 'null-co' block driver
# gpg: Signature made Wed 02 Jun 2021 14:44:15 BST
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
docs/secure-coding-practices: Describe how to use 'null-co' block driver
block-copy: refactor copy_range handling
block-copy: fix block_copy_task_entry() progress update
nbd/server: Use drained block ops to quiesce the server
block-backend: add drained_poll
block: improve permission conflict error message
block: simplify bdrv_child_user_desc()
block/vvfat: inherit child_vvfat_qcow from child_of_bds
block: improve bdrv_child_get_parent_desc()
block-backend: improve blk_root_get_parent_desc()
block: document child argument of bdrv_attach_child_common()
block/file-posix: Try other fallbacks after invalid FALLOC_FL_ZERO_RANGE
block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS
block: drop BlockBackendRootState::read_only
block: drop BlockDriverState::read_only
block: consistently use bdrv_is_read_only()
block/vvfat: fix vvfat_child_perm crash
block/vvfat: child_vvfat_qcow: add .get_parent_aio_context, fix crash
qemu-io-cmds: assert that we don't have .perm requested in no-blk case
block/quorum: Provide .bdrv_co_flush instead of .bdrv_co_flush_to_disk
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Before switching between AioContexts we need to make sure that we're
fully quiesced ("nb_requests == 0" for every client) when entering the
drained section.
To do this, we set "quiescing = true" for every client on
".drained_begin" to prevent new coroutines from being created, and
check if "nb_requests == 0" on ".drained_poll". Finally, once we're
exiting the drained section, on ".drained_end" we set "quiescing =
false" and call "nbd_client_receive_next_request()" to resume the
processing of new requests.
With these changes, "blk_aio_attach()" and "blk_aio_detach()" can be
reverted to be as simple as they were before f148ae7d36.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1960137
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20210602060552.17433-3-slp@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit e50caf4a5c ("tracing: convert documentation to rST")
converted docs/devel/tracing.txt to docs/devel/tracing.rst.
We still have several references to the old file, so let's fix them
with the following command:
sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210517151702.109066-2-sgarzare@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When querying image extents for raw image, qemu-nbd reports holes as
zero:
$ qemu-nbd -t -r -f raw empty-6g.raw
$ qemu-img map --output json nbd://localhost
[{ "start": 0, "length": 6442450944, "depth": 0, "zero": true, "data": true, "offset": 0}]
$ qemu-img map --output json empty-6g.raw
[{ "start": 0, "length": 6442450944, "depth": 0, "zero": true, "data": false, "offset": 0}]
Turns out that qemu-img map reports a hole based on BDRV_BLOCK_DATA, but
nbd server reports a hole based on BDRV_BLOCK_ALLOCATED.
The NBD protocol says:
NBD_STATE_HOLE (bit 0): if set, the block represents a hole (and
future writes to that area may cause fragmentation or encounter an
NBD_ENOSPC error); if clear, the block is allocated or the server
could not otherwise determine its status.
qemu-img manual says:
whether the sectors contain actual data or not (boolean field data;
if false, the sectors are either unallocated or stored as
optimized all-zero clusters);
To me, data=false looks compatible with NBD_STATE_HOLE. From user point
of view, getting same results from qemu-nbd and qemu-img is more
important than being more correct about allocation status.
Changing nbd server to report holes using BDRV_BLOCK_DATA makes qemu-nbd
results compatible with qemu-img map:
$ qemu-img map --output json nbd://localhost
[{ "start": 0, "length": 6442450944, "depth": 0, "zero": true, "data": false, "offset": 0}]
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Message-Id: <20210219160752.1826830-1-nsoffer@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
When switching between AIO contexts we need to me make sure that both
recv_coroutine and send_coroutine are not scheduled to run. Otherwise,
QEMU may crash while attaching the new context with an error like
this one:
aio_co_schedule: Co-routine was already scheduled in 'aio_co_schedule'
To achieve this we need a local implementation of
'qio_channel_readv_all_eof' named 'nbd_read_eof' (a trick already done
by 'nbd/client.c') that allows us to interrupt the operation and to
know when recv_coroutine is yielding.
With this in place, we delegate detaching the AIO context to the
owning context with a BH ('nbd_aio_detach_bh') scheduled using
'aio_wait_bh_oneshot'. This BH signals that we need to quiesce the
channel by setting 'client->quiescing' to 'true', and either waits for
the coroutine to finish using AIO_WAIT_WHILE or, if it's yielding in
'nbd_read_eof', actively enters the coroutine to interrupt it.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900326
Signed-off-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201214170519.223781-4-slp@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
* Update NetBSD VM to version 9.1
* Misc fixes (e.g. categorize some devices)
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAl+zld8RHHRodXRoQHJl
ZGhhdC5jb20ACgkQLtnXdP5wLbWb8Q/+IRvGUGjGcPfbTvwoOqJVy4Hm7huW5i1s
wHe/6nitNtpvaAqcxbQHBIvWX9xTzppWcFiEkIs8rPwLOUFKh5xJ+NbEdf4acQaJ
m4G2mEY5bYt/o5e6p7ZK1RgS2EjD1eQ6BwMWQKeUHET7MTv0UabKtvWmBWpMqFxA
vl/3SbVWsSwGB9gOA5oksYhKY5ZRcVaDxsGk89f7iwgaStcxWNxVFEXddbBmqhfW
Q4ZPt0K7yod7NDBOaGEoc2hOjIfr0TvovHojDuAxt+2tKdYi1vwtnwKbFqTWp7Ca
7ttzoQUSsteiOjAhHRpa2PEbfrNs+loIm9fem5fQ9i7POlbS/Ozv2RnPCZm1X8pW
n7Jvsh25V066AFnHat7PnjcBVBRFfmR3xtA61PqvAMGEKW8tortbZbpqXO18Pv5p
6P/GG9G3QE0v2rEsU5BNFWp/aD7fiWy/VPu3dGFUkI9/S3biatocldHn/9eyXz94
k75Xzhe5x6n5Jf8QYFQ/6BO0qSoidNbAVg1W8+QyRXIJJhWRnvW9eYa7tSx5ezJg
5+oCo4oh6Qd9nvrl5pIwvX6QMDf2kPxzp7PsHeemqt7+QNmXErAVsIi1HUVsLWRP
Qb/BbKyKNeWJwvWWLAm/2kXVmNQfjLVNCwg04xa8tkQemhIDekVrCpMoX3cNHjWf
EWa1vEtbq9k=
=A3/B
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2020-11-17' into staging
* Fixes for compiling on Haiku, and add Haiku VM for compile-testing
* Update NetBSD VM to version 9.1
* Misc fixes (e.g. categorize some devices)
# gpg: Signature made Tue 17 Nov 2020 09:20:31 GMT
# gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg: issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg: aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* remotes/huth-gitlab/tags/pull-request-2020-11-17:
max111x: put it into the 'misc' category
nand: put it into the 'storage' category
ads7846: put it into the 'input' category
ssd0323: put it into the 'display' category
gitlab-ci: Use $CI_REGISTRY instead of hard-coding registry.gitlab.com
target/microblaze: Fix possible array out of bounds in mmu_write()
tests/vm: update NetBSD to 9.1
tests/vm: Add Haiku test based on their vagrant images
configure: Add a proper check for sys/ioccom.h and use it in tpm_ioctl.h
configure: Do not build pc-bios/optionrom on Haiku
configure: Fix the _BSD_SOURCE define for the Haiku build
qemu/bswap: Remove unused qemu_bswap_len()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
On Solaris and Haiku, the _IO() macros are defined in <sys/ioccom.h>.
Add a proper check for this header to our build system, and make sure
to include the header in tpm_ioctl.h to fix a build failure on Solaris
and Haiku.
Message-Id: <20201115152317.42752-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Coverity noticed (CID 1436125) that we check the return value of
nbd_extent_array_add in most places, but not at the end of
bitmap_to_extents(). The return value exists to break loops before a
future iteration, so there is nothing to check if we are already done
iterating. Adding a cast to void, plus a comment why, pacifies
Coverity.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201111163510.713855-1-eblake@redhat.com>
[eblake: Prefer cast to void over odd && usage]
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Allow the server to expose an additional metacontext to be requested
by savvy clients. qemu-nbd adds a new option -A to expose the
qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this
can also be set via QMP when using block-export-add.
qemu as client is hacked into viewing the key aspects of this new
context by abusing the already-experimental x-dirty-bitmap option to
collapse all depths greater than 2, which results in a tri-state value
visible in the output of 'qemu-img map --output=json' (yes, that means
x-dirty-bitmap is now a bit of a misnomer, but I didn't feel like
renaming it as it would introduce a needless break of back-compat,
even though we make no compat guarantees with x- members):
unallocated (depth 0) => "zero":false, "data":true
local (depth 1) => "zero":false, "data":false
backing (depth 2+) => "zero":true, "data":true
libnbd as client is probably a nicer way to get at the information
without having to decipher such hacks in qemu as client. ;)
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-11-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
'qemu-img map' provides a way to determine which extents of an image
come from the top layer vs. inherited from a backing chain. This is
useful information worth exposing over NBD. There is a proposal to
add a QMP command block-dirty-bitmap-populate which can create a dirty
bitmap that reflects allocation information, at which point the
qemu:dirty-bitmap:NAME metadata context can expose that information
via the creation of a temporary bitmap, but we can shorten the effort
by adding a new qemu:allocation-depth metadata context that does the
same thing without an intermediate bitmap (this patch does not
eliminate the need for that proposal, as it will have other uses as
well).
While documenting things, remember that although the NBD protocol has
NBD_OPT_SET_META_CONTEXT, the rest of its documentation refers to
'metadata context', which is a more apt description of what is
actually being used by NBD_CMD_BLOCK_STATUS: the user is requesting
metadata by passing one or more context names. So I also touched up
some existing wording to prefer the term 'metadata context' where it
makes sense.
Note that this patch does not actually enable any way to request a
server to enable this context; that will come in the next patch.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-10-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
With this, 'qemu-nbd -B b0 -B b1 -f qcow2 img.qcow2' can let you sniff
out multiple bitmaps from one server. qemu-img as client can still
only read one bitmap per client connection, but other NBD clients
(hello libnbd) can now read multiple bitmaps in a single pass.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201027050556.269064-8-eblake@redhat.com>
Rather than open-code the count of negotiated contexts at several
sites, embed it directly into the struct. This will make it easier
for upcoming commits to support even more simultaneous contexts.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-7-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Each dirty bitmap already knows its name; by reducing the scope of the
places where we construct "qemu:dirty-bitmap:NAME" strings, tracking
the name is more localized, and there are fewer per-export fields to
worry about. This in turn will make it easier for an upcoming patch
to export more than one bitmap at once.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201027050556.269064-6-eblake@redhat.com>
Since 'block-export-add' is new to 5.2, we can still tweak the
interface; there, allowing 'bitmaps':['str'] is nicer than
'bitmap':'str'. This wires up the qapi and qemu-nbd changes to permit
passing multiple bitmaps as distinct metadata contexts that the NBD
client may request, but the actual support for more than one will
require a further patch to the server.
Note that there are no changes made to the existing deprecated
'nbd-server-add' command; this required splitting the QAPI type
BlockExportOptionsNbd, which fortunately does not affect QMP
introspection.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-5-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Make it possible to specify the iothread where the export will run. By
default the block node can be moved to other AioContexts later and the
export will follow. The fixed-iothread option forces strict behavior
that prevents changing AioContext while the export is active. See the
QAPI docs for details.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200929125516.186715-5-stefanha@redhat.com
[Fix stray '#' character in block-export.json and add missing "(since:
5.2)" as suggested by Eric Blake.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd.
They are not used by other programs and are not otherwise needed in
libblock.
Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss.
Since bdrv_close_all() (libblock) calls blk_exp_close_all()
(libblockdev) a stub function is required..
Make qemu-nbd.c use signal handling utility functions instead of
duplicating the code. This helps because os-posix.c is in libblockdev
and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks.
Once we use the signal handling utility functions we also end up
providing the necessary symbol.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200929125516.186715-4-stefanha@redhat.com
[Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We had a premature optimization of trying to read as little from the
wire as possible while handling NBD_OPT_SET_META_CONTEXT in phases.
But in reality, we HAVE to read the entire string from the client
before we can get to the next command, and it is easier to just read
it all at once than it is to read it in pieces. And once we do that,
several functions end up no longer performing I/O, so they can drop
length and errp parameters, and just return a bool instead of
modifying through a pointer.
Our iotests still pass; I also checked that libnbd's testsuite (which
covers more corner cases of odd meta context requests) still passes.
There are cases where the sequence of trace messages produced differs
(for example, when no bitmap is exported, a query for "qemu:" now
produces two trace lines instead of one), but trace points are for
debug and have no effect on what the client sees.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200930121105.667049-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: enhance commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
The NBD spec is clear that any string sent from the client must not
contain embedded NUL characters. If the client passes "a\0", we
should reject that option request rather than act on "a".
Testing this is not possible with a compliant client, but I was able
to use gdb to coerce libnbd into temporarily behaving as such a
client.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200930121105.667049-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
gcc 10 from Fedora 32 gives me:
Compiling C object libblock.fa.p/nbd_server.c.o
../nbd/server.c: In function ‘nbd_co_client_start’:
../nbd/server.c:625:14: error: ‘namelen’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
625 | rc = nbd_negotiate_send_info(client, NBD_INFO_NAME, namelen, name,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
626 | errp);
| ~~~~~
../nbd/server.c:564:14: note: ‘namelen’ was declared here
564 | uint32_t namelen;
| ^~~~~~~
cc1: all warnings being treated as errors
As I cannot see how this can happen, let uns silence the warning.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20200930155859.303148-3-borntraeger@de.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
There is no real reason any more why nbd_export_new() and
nbd_export_create() should be separate functions. The latter only
performs a few checks before it calls the former.
What makes the current state stand out is that it's the only function in
BlockExportDriver that is not a static function inside nbd/server.c, but
a small wrapper in blockdev-nbd.c that then calls back into nbd/server.c
for the real functionality.
Move all the checks to nbd/server.c and make the resulting function
static to improve readability.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-27-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The 'writable' option is a basic option that will probably be applicable
to most if not all export types that we will implement. Move it from NBD
to the generic BlockExport layer.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-26-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Every export type will need a BlockBackend, so creating it centrally in
blk_exp_add() instead of the .create driver callback avoids duplication.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-24-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Every block export has a BlockBackend representing the disk that is
exported. It should live in BlockExport therefore.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-23-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Implement a new QMP command block-export-del and make nbd-server-remove
a wrapper around it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-21-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The reference owned by the user/monitor that is created when adding the
export and dropped when removing it was tied to the 'exports' list in
nbd/server.c. Every block export will have a user reference, so move it
to the block export level and tie it to the 'block_exports' list in
block/export/export.c instead. This is necessary for introducing a QMP
command for removing exports.
Note that exports are present in block_exports even after the user has
requested shutdown. This is different from NBD's exports where exports
are immediately removed on a shutdown request, even if they are still in
the process of shutting down. In order to avoid that the user still
interacts with an export that is shutting down (and possibly removes it
a second time), we need to remember if the user actually still owns it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-20-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds a function to shut down all block exports, and another one to
shut down the block exports of a single type. The latter is used for now
when stopping the NBD server. As soon as we implement support for
multiple NBD servers, we'll need a per-server list of exports and it
will be replaced by a function using that.
As a side effect, the BlockExport layer has a list tracking all existing
exports now. closed_exports loses its only user and can go away.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-18-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of letting the driver allocate and return the BlockExport
object, allocate it already in blk_exp_add() and pass it. This allows us
to initialise the generic part before calling into the driver so that
the driver can just use these values instead of having to parse the
options a second time.
For symmetry, move freeing the BlockExport to blk_exp_unref().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-17-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-15-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Having a refcount makes sense for all types of block exports. It is also
a prerequisite for keeping a list of all exports at the BlockExport
level.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-14-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Closing export is somewhat convoluted because nbd_export_close() and
nbd_export_put() call each other and the ways they actually end up being
nested is not necessarily obvious.
However, it is not really necessary to call nbd_export_close() from
nbd_export_put() when putting the last reference because it only does
three things:
1. Close all clients. We're going to refcount 0 and all clients hold a
reference, so we know there is no active client any more.
2. Close the user reference (represented by exp->name being non-NULL).
The same argument applies: If the export were still named, we would
still have a reference.
3. Freeing exp->description. This is really cleanup work to be done when
the export is finally freed. There is no reason to already clear it
while clients are still in the process of shutting down.
So after moving the cleanup of exp->description, the code can be
simplified so that only nbd_export_close() calls nbd_export_put(), but
never the other way around.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-13-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The export close callback is unused by the built-in NBD server. qemu-nbd
uses it only during shutdown to wait for the unrefed export to actually
go away. It can just use nbd_export_close_all() instead and do without
the callback.
This removes the close callback from nbd_export_new() and makes both
callers of it more similar.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-11-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
nbd-server-add tries to be convenient and adds two questionable
features that we don't want to share in block-export-add, even for NBD
exports:
1. When requesting a writable export of a read-only device, the export
is silently downgraded to read-only. This should be an error in the
context of block-export-add.
2. When using a BlockBackend name, unplugging the device from the guest
will automatically stop the NBD server, too. This may sometimes be
what you want, but it could also be very surprising. Let's keep
things explicit with block-export-add. If the user wants to stop the
export, they should tell us so.
Move these things into the nbd-server-add QMP command handler so that
they apply only there.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-8-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of implementing qemu-nbd --offset in the NBD code, just put a
raw block node with the requested offset on top of the user image and
rely on that doing the job.
This does not only simplify the nbd_export_new() interface and bring it
closer to the set of options that the nbd-server-add QMP command offers,
but in fact it also eliminates a potential source for bugs in the NBD
code which previously had to add the offset manually in all relevant
places.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-7-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We want to have a common set of commands for all types of block exports.
Currently, this is only NBD, but we're going to add more types.
This patch adds the basic BlockExport and BlockExportDriver structs and
a QMP command block-export-add that creates a new export based on the
given BlockExportOptions.
qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-5-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-2-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When looking for a dirty bitmap to share, we should handle filters by
just including them in the search (so they do not break backing chains).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Meson doesn't enjoy the same flexibility we have with Make in choosing
the include path. In particular the tracing headers are using
$(build_root)/$(<D).
In order to keep the include directives unchanged,
the simplest solution is to generate headers with patterns like
"trace/trace-audio.h" and place forwarding headers in the source tree
such that for example "audio/trace.h" includes "trace/trace-audio.h".
This patch is too ugly to be applied to the Makefiles now. It's only
a way to separate the changes to the tracing header files from the
Meson rewrite of the tracing logic.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Although qemu as NBD client limits requests to <2G, the NBD protocol
allows clients to send requests almost all the way up to 4G. But
because our block layer is not yet 64-bit clean, we accidentally wrap
such requests into a negative size, and fail with EIO instead of
performing the intended operation.
The bug is visible in modern systems with something as simple as:
$ qemu-img create -f qcow2 /tmp/image.img 5G
$ sudo qemu-nbd --connect=/dev/nbd0 /tmp/image.img
$ sudo blkdiscard /dev/nbd0
or with user-space only:
$ truncate --size=3G file
$ qemu-nbd -f raw file
$ nbdsh -u nbd://localhost:10809 -c 'h.trim(3*1024*1024*1024,0)'
Although both blk_co_pdiscard and blk_pwrite_zeroes currently return 0
on success, this is also a good time to fix our code to a more robust
paradigm that treats all non-negative values as success.
Alas, our iotests do not currently make it easy to add external
dependencies on blkdiscard or nbdsh, so we have to rely on manual
testing for now.
This patch can be reverted when we later improve the overall block
layer to be 64-bit clean, but for now, a minimal fix was deemed less
risky prior to release.
CC: qemu-stable@nongnu.org
Fixes: 1f4d6d18ed
Fixes: 1c6c4bb7f0
Fixes: https://github.com/systemd/systemd/issues/16242
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200722212231.535072-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: rework success tests to use >=0]