qemu-e2k

Author	SHA1	Message	Date
Kevin Wolf	9c707525cb	nbd/server: Fix race in draining the export When draining an NBD export, nbd_drained_begin() first sets client->quiescing so that nbd_client_receive_next_request() won't start any new request coroutines. Then nbd_drained_poll() tries to makes sure that we wait for any existing request coroutines by checking that client->nb_requests has become 0. However, there is a small window between creating a new request coroutine and increasing client->nb_requests. If a coroutine is in this state, it won't be waited for and drain returns too early. In the context of switching to a different AioContext, this means that blk_aio_attached() will see client->recv_coroutine != NULL and fail its assertion. Fix this by increasing client->nb_requests immediately when starting the coroutine. Doing this after the checks if we should create a new coroutine is okay because client->lock is held. Cc: qemu-stable@nongnu.org Fixes: `fd6afc501a` ("nbd/server: Use drained block ops to quiesce the server") Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20240314165825.40261-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2024-03-18 12:38:02 +01:00
Stefan Hajnoczi	7075d23511	nbd/server: introduce NBDClient->lock to protect fields NBDClient has a number of fields that are accessed by both the export AioContext and the main loop thread. When the AioContext lock is removed these fields will need another form of protection. Add NBDClient->lock and protect fields that are accessed by both threads. Also add assertions where possible and otherwise add doc comments stating assumptions about which thread and lock holding. Note this patch moves the client->recv_coroutine assertion from nbd_co_receive_request() to nbd_trip() where client->lock is held. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20231221192452.1785567-7-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-12-21 21:59:10 +01:00
Stefan Hajnoczi	f816310d0c	nbd/server: only traverse NBDExport->clients from main loop thread The NBD clients list is currently accessed from both the export AioContext and the main loop thread. When the AioContext lock is removed there will be nothing protecting the clients list. Adding a lock around the clients list is tricky because NBDClient structs are refcounted and may be freed from the export AioContext or the main loop thread. nbd_export_request_shutdown() -> client_close() -> nbd_client_put() is also tricky because the list lock would be held while indirectly dropping references to NDBClients. A simpler approach is to only allow nbd_client_put() and client_close() calls from the main loop thread. Then the NBD clients list is only accessed from the main loop thread and no fancy locking is needed. nbd_trip() just needs to reschedule itself in the main loop AioContext before calling nbd_client_put() and client_close(). This costs more CPU cycles per NBD request so add nbd_client_put_nonzero() to optimize the common case where more references to NBDClient remain. Note that nbd_client_get() can still be called from either thread, so make NBDClient->refcount atomic. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20231221192452.1785567-6-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-12-21 21:59:07 +01:00
Stefan Hajnoczi	efade66d58	nbd/server: avoid per-NBDRequest nbd_client_get/put() nbd_trip() processes a single NBD request from start to finish and holds an NBDClient reference throughout. NBDRequest does not outlive the scope of nbd_trip(). Therefore it is unnecessary to ref/unref NBDClient for each NBDRequest. Removing these nbd_client_get()/nbd_client_put() calls will make thread-safety easier in the commits that follow. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20231221192452.1785567-5-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-12-21 21:59:03 +01:00
Kevin Wolf	372b69f503	block: Mark bdrv_filter_or_cow_bs() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_filter_or_cow_bs() need to hold a reader lock for the graph because it calls bdrv_filter_or_cow_child(), which accesses bs->file/backing. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231027155333.420094-7-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-11-07 19:14:19 +01:00
Eric Blake	2dcbb11b39	nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Allow a client to request a subset of negotiated meta contexts. For example, a client may ask to use a single connection to learn about both block status and dirty bitmaps, but where the dirty bitmap queries only need to be performed on a subset of the disk; forcing the server to compute that information on block status queries in the rest of the disk is wasted effort (both at the server, and on the amount of traffic sent over the wire to be parsed and ignored by the client). Qemu as an NBD client never requests to use more than one meta context, so it has no need to use block status payloads. Testing this instead requires support from libnbd, which CAN access multiple meta contexts in parallel from a single NBD connection; an interop test submitted to the libnbd project at the same time as this patch demonstrates the feature working, as well as testing some corner cases (for example, when the payload length is longer than the export length), although other corner cases (like passing the same id duplicated) requires a protocol fuzzer because libnbd is not wired up to break the protocol that badly. This also includes tweaks to 'qemu-nbd --list' to show when a server is advertising the capability, and to the testsuite to reflect the addition to that output. Of note: qemu will always advertise the new feature bit during NBD_OPT_INFO if extended headers have alreay been negotiated (regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has occurred); but for NBD_OPT_GO, qemu only advertises the feature if block status is also enabled (that is, if the client does not negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so the feature is not advertised). Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230925192229.3186470-26-eblake@redhat.com> [eblake: fix logic to reject unnegotiated contexts] Signed-off-by: Eric Blake <eblake@redhat.com>	2023-10-05 11:02:08 -05:00
Eric Blake	1dec4643d1	nbd/server: Prepare for per-request filtering of BLOCK_STATUS The next commit will add support for the optional extension NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can request that the server only return a subset of negotiated contexts, rather than all contexts. To make that task easier, this patch populates the list of contexts to return on a per-command basis (for now, identical to the full set of negotiated contexts). Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230925192229.3186470-25-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-10-05 11:02:08 -05:00
Eric Blake	fd358d8390	nbd/server: Refactor list of negotiated meta contexts Peform several minor refactorings of how the list of negotiated meta contexts is managed, to make upcoming patches easier: Promote the internal type NBDExportMetaContexts to the public opaque type NBDMetaContexts, and mark exp const. Use a shorter member name in NBDClient. Hoist calls to nbd_check_meta_context() earlier in their callers, as the number of negotiated contexts may impact the flags exposed in regards to an export, which in turn requires a new parameter. Drop a redundant parameter to nbd_negotiate_meta_queries. No semantic change intended on the success path; on the failure path, dropping context in nbd_check_meta_export even when reporting an error is safer. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230925192229.3186470-24-eblake@redhat.com>	2023-10-05 11:02:08 -05:00
Eric Blake	9c1d261437	nbd/server: Enable initial support for extended headers Time to start supporting clients that request extended headers. Now we can finally reach the code added across several previous patches. Even though the NBD spec has been altered to allow us to accept NBD_CMD_READ larger than the max payload size (provided our response is a hole or broken up over more than one data chunk), we are not planning to take advantage of that, and continue to cap NBD_CMD_READ to 32M regardless of header size. For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already supports 64-bit operations without any effort on our part. For NBD_CMD_BLOCK_STATUS, the client's length is a hint, and the previous patch took care of implementing the required NBD_REPLY_TYPE_BLOCK_STATUS_EXT. We do not yet support clients that want to do request payload filtering of NBD_CMD_BLOCK_STATUS; that will be added in later patches, but is not essential for qemu as a client since qemu only requests the single context base:allocation. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230925192229.3186470-19-eblake@redhat.com>	2023-10-05 11:02:08 -05:00
Eric Blake	bcc16cc19e	nbd/server: Support 64-bit block status The NBD spec states that if the client negotiates extended headers, the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if the reply does not need more than 32 bits. As of this patch, client->mode is still never NBD_MODE_EXTENDED, so the code added here does not take effect until the next patch enables negotiation. For now, all metacontexts that we know how to export never populate more than 32 bits of information, so we don't have to worry about NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we always send all zeroes for the upper 32 bits of status during NBD_CMD_BLOCK_STATUS. Note that we previously had some interesting size-juggling on call chains, such as: nbd_co_send_block_status(uint32_t length) -> blockstatus_to_extents(uint32_t bytes) -> bdrv_block_status_above(bytes, &uint64_t num) -> nbd_extent_array_add(uint64_t num) -> store num in 32-bit length But we were lucky that it never overflowed: bdrv_block_status_above never sets num larger than bytes, and we had previously been capping 'bytes' at 32 bits (since the protocol does not allow sending a larger request without extended headers). This patch adds some assertions that ensure we continue to avoid overflowing 32 bits for a narrow client, while fully utilizing 64-bits all the way through when the client understands that. Even in 64-bit math, overflow is not an issue, because all lengths are coming from the block layer, and we know that the block layer does not support images larger than off_t (if lengths were coming from the network, the story would be different). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230925192229.3186470-18-eblake@redhat.com>	2023-10-05 11:02:08 -05:00
Eric Blake	11d3355fc8	nbd/server: Prepare to send extended header replies Although extended mode is not yet enabled, once we do turn it on, we need to reply with extended headers to all messages. Update the low level entry points necessary so that all other callers automatically get the right header based on the current mode. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230925192229.3186470-17-eblake@redhat.com>	2023-10-05 11:02:08 -05:00
Eric Blake	c8720ca03e	nbd/server: Prepare to receive extended header requests Although extended mode is not yet enabled, once we do turn it on, we need to accept extended requests for all messages. Previous patches have already taken care of supporting 64-bit lengths, now we just need to read it off the wire. Note that this implementation will block indefinitely on a buggy client that sends a non-extended payload (that is, we try to read a full packet before we ever check the magic number, but a client that mistakenly sends a simple request after negotiating extended headers doesn't send us enough bytes), but it's no different from any other client that stops talking to us partway through a packet and thus not worth coding around. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230925192229.3186470-16-eblake@redhat.com>	2023-10-05 11:02:08 -05:00
Eric Blake	009cd86650	nbd/server: Support a request payload Upcoming additions to support NBD 64-bit effect lengths allow for the possibility to distinguish between payload length (capped at 32M) and effect length (64 bits, although we generally assume 63 bits because of off_t limitations). Without that extension, only the NBD_CMD_WRITE request has a payload; but with the extension, it makes sense to allow at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length in a future patch (where the payload is a limited-size struct that in turn gives the real effect length as well as a subset of known ids for which status is requested). Other future NBD commands may also have a request payload, so the 64-bit extension introduces a new NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header length is a payload length or an effect length, rather than hard-coding the decision based on the command. According to the spec, a client should never send a command with a payload without the negotiation phase proving such extension is available. So in the unlikely event the bit is set or cleared incorrectly, the client is already at fault; if the client then provides the payload, we can gracefully consume it off the wire and fail the command with NBD_EINVAL (subsequent checks for magic numbers ensure we are still in sync), while if the client fails to send payload we block waiting for it (basically deadlocking our connection to the bad client, but not negatively impacting our ability to service other clients, so not a security risk). Note that we do not support the payload version of BLOCK_STATUS yet. This patch also fixes a latent bug introduced in `b2578459`: once request->len can be 64 bits, assigning it to a 32-bit payload_len can cause wraparound to 0 which then sets req->complete prematurely; thankfully, the bug was not possible back then (it takes this and later patches to even allow request->len larger than 32 bits; and since previously the only 'payload_len = request->len' assignment was in NBD_CMD_WRITE which also sets check_length, which in turn rejects lengths larger than 32M before relying on any possibly-truncated value stored in payload_len). Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230925192229.3186470-15-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> [eblake: enhance comment on handling client error, fix type bug] Signed-off-by: Eric Blake <eblake@redhat.com>	2023-10-05 11:02:08 -05:00
Eric Blake	8db7e2d657	nbd/server: Refactor handling of command sanity checks Upcoming additions to support NBD 64-bit effect lengths will add a new command flag NBD_CMD_FLAG_PAYLOAD_LEN that needs to be considered in our sanity checks of the client's messages (that is, more than just CMD_WRITE have the potential to carry a client payload when extended headers are in effect). But before we can start to support that, it is easier to first refactor the existing set of various if statements over open-coded combinations of request->type to instead be a single switch statement over all command types that sets witnesses, then straight-line processing based on the witnesses. No semantic change is intended. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230829175826.377251-24-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-09-25 08:43:22 -05:00
Eric Blake	b257845932	nbd: Prepare for 64-bit request effect lengths Widen the length field of NBDRequest to 64-bits, although we can assert that all current uses are still under 32 bits: either because of NBD_MAX_BUFFER_SIZE which is even smaller (and where size_t can still be appropriate, even on 32-bit platforms), or because nothing ever puts us into NBD_MODE_EXTENDED yet (and while future patches will allow larger transactions, the lengths in play here are still capped at 32-bit). There are no semantic changes, other than a typo fix in a couple of error messages. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230829175826.377251-23-eblake@redhat.com> [eblake: fix assertion bug in nbd_co_send_simple_reply] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-09-25 08:35:06 -05:00
Eric Blake	d95ffb6fe6	nbd: Add types for extended headers Add the constants and structs necessary for later patches to start implementing the NBD_OPT_EXTENDED_HEADERS extension in both the client and server, matching recent upstream nbd.git (through commit e6f3b94a934). This patch does not change any existing behavior, but merely sets the stage for upcoming patches. This patch does not change the status quo that neither the client nor server use a packed-struct representation for the request header. While most of the patch adds new types, there is also some churn for renaming the existing NBDExtent to NBDExtent32 to contrast it with NBDExtent64, which I thought was a nicer name than NBDExtentExt. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230829175826.377251-22-eblake@redhat.com>	2023-09-22 17:21:08 -05:00
Eric Blake	ac132d0520	nbd: Replace bool structured_reply with mode enum The upcoming patches for 64-bit extensions requires various points in the protocol to make decisions based on what was negotiated. While we could easily add a 'bool extended_headers' alongside the existing 'bool structured_reply', this does not scale well if more modes are added in the future. Better is to expose the mode enum added in the recent commit `bfe04d0a7d` out to a wider use in the code base. Where the code previously checked for structured_reply being set or clear, it now prefers checking for an inequality; this works because the nodes are in a continuum of increasing abilities, and allows us to touch fewer places if we ever insert other modes in the middle of the enum. There should be no semantic change in this patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230829175826.377251-20-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-09-22 17:19:27 -05:00
Stefan Hajnoczi	06e0f098d6	io: follow coroutine AioContext in qio_channel_yield() The ongoing QEMU multi-queue block layer effort makes it possible for multiple threads to process I/O in parallel. The nbd block driver is not compatible with the multi-queue block layer yet because QIOChannel cannot be used easily from coroutines running in multiple threads. This series changes the QIOChannel API to make that possible. In the current API, calling qio_channel_attach_aio_context() sets the AioContext where qio_channel_yield() installs an fd handler prior to yielding: qio_channel_attach_aio_context(ioc, my_ctx); ... qio_channel_yield(ioc); // my_ctx is used here ... qio_channel_detach_aio_context(ioc); This API design has limitations: reading and writing must be done in the same AioContext and moving between AioContexts involves a cumbersome sequence of API calls that is not suitable for doing on a per-request basis. There is no fundamental reason why a QIOChannel needs to run within the same AioContext every time qio_channel_yield() is called. QIOChannel only uses the AioContext while inside qio_channel_yield(). The rest of the time, QIOChannel is independent of any AioContext. In the new API, qio_channel_yield() queries the AioContext from the current coroutine using qemu_coroutine_get_aio_context(). There is no need to explicitly attach/detach AioContexts anymore and qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone. One coroutine can read from the QIOChannel while another coroutine writes from a different AioContext. This API change allows the nbd block driver to use QIOChannel from any thread. It's important to keep in mind that the block driver already synchronizes QIOChannel access and ensures that two coroutines never read simultaneously or write simultaneously. This patch updates all users of qio_channel_attach_aio_context() to the new API. Most conversions are simple, but vhost-user-server requires a new qemu_coroutine_yield() call to quiesce the vu_client_trip() coroutine when not attached to any AioContext. While the API is has become simpler, there is one wart: QIOChannel has a special case for the iohandler AioContext (used for handlers that must not run in nested event loops). I didn't find an elegant way preserve that behavior, so I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true\|false) for opting in to the new AioContext model. By default QIOChannel uses the iohandler AioHandler. Code that formerly called qio_channel_attach_aio_context() now calls qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is created. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20230830224802.493686-5-stefanha@redhat.com> [eblake: also fix migration/rdma.c] Signed-off-by: Eric Blake <eblake@redhat.com>	2023-09-07 20:32:11 -05:00
Eric Blake	22efd81104	nbd: s/handle/cookie/ to match NBD spec Externally, libnbd exposed the 64-bit opaque marker for each client NBD packet as the "cookie", because it was less confusing when contrasted with 'struct nbd_handle *' holding all libnbd state. It also avoids confusion between the noun 'handle' as a way to identify a packet and the verb 'handle' for reacting to things like signals. Upstream NBD changed their spec to favor the name "cookie" based on libnbd's recommendations[1], so we can do likewise. [1] https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230608135653.2918540-6-eblake@redhat.com> [eblake: typo fix] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-07-19 15:25:30 -05:00
Eric Blake	66d4f4fe2f	nbd/server: Refactor to pass full request around Part of NBD's 64-bit headers extension involves passing the client's requested offset back as part of the reply header (one reason it stated for this change: converting absolute offsets stored in NBD_REPLY_TYPE_OFFSET_DATA to relative offsets within the buffer is easier if the absolute offset of the buffer is also available). This is a refactoring patch to pass the full request around the reply stack, rather than just the handle, so that later patches can then access request->from when extended headers are active. Meanwhile, this patch enables us to now assert that simple replies are only attempted when appropriate, and otherwise has no semantic change. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230608135653.2918540-5-eblake@redhat.com>	2023-07-19 15:25:27 -05:00
Eric Blake	a7c8ed36bf	nbd/server: Prepare for alternate-size headers Upstream NBD now documents[1] an extension that supports 64-bit effect lengths in requests. As part of that extension, the size of the reply headers will change in order to permit a 64-bit length in the reply for symmetry[2]. Additionally, where the reply header is currently 16 bytes for simple reply, and 20 bytes for structured reply; with the extension enabled, there will only be one extended reply header, of 32 bytes, with both structured and extended modes sending identical payloads for chunked replies. Since we are already wired up to use iovecs, it is easiest to allow for this change in header size by splitting each structured reply across multiple iovecs, one for the header (which will become wider in a future patch according to client negotiation), and the other(s) for the chunk payload, and removing the header from the payload struct definitions. Rename the affected functions with s/structured/chunk/ to make it obvious that the code will be reused in extended mode. Interestingly, the client side code never utilized the packed types, so only the server code needs to be updated. [1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md as of NBD commit e6f3b94a934 [2] Note that on the surface, this is because some future server might permit a 4G+ NBD_CMD_READ and need to reply with that much data in one transaction. But even though the extended reply length is widened to 64 bits, for now the NBD spec is clear that servers will not reply with more than a maximum payload bounded by the 32-bit NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually agree to transactions larger than 4G would require yet another extension. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230608135653.2918540-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-07-19 15:25:27 -05:00
Philippe Mathieu-Daudé	7d5b0d6864	bulk: Remove pointless QOM casts Mechanical change running Coccinelle spatch with content generated from the qom-cast-macro-clean-cocci-gen.py added in the previous commit. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230601093452.38972-3-philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>	2023-06-05 20:48:34 +02:00
Kevin Wolf	7c1f51bf38	nbd/server: Fix drained_poll to wake coroutine in right AioContext nbd_drained_poll() generally runs in the main thread, not whatever iothread the NBD server coroutine is meant to run in, so it can't directly reenter the coroutines to wake them up. The code seems to have the right intention, it specifies the correct AioContext when it calls qemu_aio_coroutine_enter(). However, this functions doesn't schedule the coroutine to run in that AioContext, but it assumes it is already called in the home thread of the AioContext. To fix this, add a new thread-safe qio_channel_wake_read() that can be called in the main thread to wake up the coroutine in its AioContext, and use this in nbd_drained_poll(). Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230517152834.277483-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:16:53 +02:00
Paolo Bonzini	d2223cddce	nbd: mark more coroutine_fns, do not use co_wrappers Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-04-25 13:17:28 +02:00
Paolo Bonzini	dd5b6780f7	nbd: a BlockExport always has a BlockBackend exp->common.blk cannot be NULL, nbd_export_delete() is only called (through a bottom half) from blk_exp_unref() and in turn that can only happen after blk_exp_add() has asserted exp->blk != NULL. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-04-20 11:17:36 +02:00
Eric Blake	f1426881a8	nbd/server: Request TCP_NODELAY Nagle's algorithm adds latency in order to reduce network packet overhead on small packets. But when we are already using corking to merge smaller packets into transactional requests, the extra delay from TCP defaults just gets in the way (see recent commit `bd2cd4a4`). For reference, qemu as an NBD client already requests TCP_NODELAY (see nbd_connect() in nbd/client-connection.c); as does libnbd as a client [1], and nbdkit as a server [2]. Furthermore, the NBD spec recommends the use of TCP_NODELAY [3]. [1] https://gitlab.com/nbdkit/libnbd/-/blob/a48a1142/generator/states-connect.c#L39 [2] https://gitlab.com/nbdkit/nbdkit/-/blob/45b72f5b/server/sockets.c#L430 [3] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#protocol-phases CC: Florian Westphal <fw@strlen.de> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20230404004047.142086-1-eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-04-04 08:13:15 -05:00
Florian Westphal	bd2cd4a441	nbd/server: push pending frames after sending reply qemu-nbd doesn't set TCP_NODELAY on the tcp socket. Kernel waits for more data and avoids transmission of small packets. Without TLS this is barely noticeable, but with TLS this really shows. Booting a VM via qemu-nbd on localhost (with tls) takes more than 2 minutes on my system. tcpdump shows frequent wait periods, where no packets get sent for a 40ms period. Add explicit (un)corking when processing (and responding to) requests. "TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data. VM Boot time: main: no tls: 23s, with tls: 2m45s patched: no tls: 14s, with tls: 15s VM Boot time, qemu-nbd via network (same lan): main: no tls: 18s, with tls: 1m50s patched: no tls: 17s, with tls: 18s Future optimization: if we could detect if there is another pending request we could defer the uncork operation because more data would be appended. Signed-off-by: Florian Westphal <fw@strlen.de> Message-Id: <20230324104720.2498-1-fw@strlen.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-03-27 13:44:29 +02:00
Markus Armbruster	e2c1c34f13	include/block: Untangle inclusion loops We have two inclusion loops: block/block.h -> block/block-global-state.h -> block/block-common.h -> block/blockjob.h -> block/block.h block/block.h -> block/block-io.h -> block/block-common.h -> block/blockjob.h -> block/block.h I believe these go back to Emanuele's reorganization of the block API, merged a few months ago in commit `d7e2fe4aac`. Fortunately, breaking them is merely a matter of deleting unnecessary includes from headers, and adding them back in places where they are now missing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221221133551.3967339-2-armbru@redhat.com>	2023-01-20 07:24:28 +01:00
Emanuele Giuseppe Esposito	ff7e261bb9	block-backend: replace bdrv__above with blk__above Avoid mixing bdrv_* functions with blk_, so create blk_ counterparts for bdrv_block_status_above and bdrv_is_allocated_above. Note that since blk_co_block_status_above only calls the g_c_w function bdrv_common_block_status_above and is marked as coroutine_fn, call directly bdrv_co_common_block_status_above() to avoid using a g_c_w. Same applies to blk_co_is_allocated_above. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20221128142337.657646-5-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Emanuele Giuseppe Esposito	6f58ac5539	nbd/server.c: add coroutine_fn annotations These functions end up calling bdrv_*() implemented as generated_co_wrapper functions. In addition, they also happen to be always called in coroutine context, meaning all callers are coroutine_fn. This means that the g_c_w function will enter the qemu_in_coroutine() case and eventually suspend (or in other words call qemu_coroutine_yield()). Therefore we can mark such functions coroutine_fn too. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20221128142337.657646-4-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Markus Armbruster	8461b4d601	nbd/server: Clean up abuse of BlockExportOptionsNbd member @arg block-export-add argument @name defaults to the value of argument @node-name. nbd_export_create() implements this by copying @node_name to @name. It leaves @has_node_name false, violating the "has_node_name == !!node_name" invariant. Unclean. Falls apart when we elide @has_node_name (next commit): then QAPI frees the same value twice, once for @node_name and once @name. iotest 307 duly explodes. Goes back to commit `c62d24e906` "blockdev-nbd: Boxed argument type for nbd-server-add" (v5.0.0). Got moved from qmp_nbd_server_add() to nbd_export_create() (commit `56ee86261e`), then copied back (commit `b6076afcab`). Commit `8675cbd68b` "nbd: Utilize QAPI_CLONE for type conversion" (v5.2.0) cleaned up the copy in qmp_nbd_server_add() noting Second, our assignment to arg->name is fishy: the generated QAPI code for qapi_free_NbdServerAddOptions does not visit arg->name if arg->has_name is false, but if it DID visit it, we would have introduced a double-free situation when arg is finally freed. Exactly. However, the copy in nbd_export_create() remained dirty. Clean it up. Since the value stored in member @name is not actually used outside this function, use a local variable instead of modifying the QAPI object. Signed-off-by: Markus Armbruster <armbru@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Cc: qemu-block@nongnu.org Message-Id: <20221104160712.3005652-10-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2022-12-13 18:31:37 +01:00
Alberto Faria	a9262f551e	block: Change blk_{pread,pwrite}() param order Swap 'buf' and 'bytes' around for consistency with blk_co_{pread,pwrite}(), and in preparation to implement these functions using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression blk, offset, buf, bytes, flags; @@ - blk_pread(blk, offset, buf, bytes, flags) + blk_pread(blk, offset, bytes, buf, flags) @@ expression blk, offset, buf, bytes, flags; @@ - blk_pwrite(blk, offset, buf, bytes, flags) + blk_pwrite(blk, offset, bytes, buf, flags) It had no effect on hw/block/nand.c, presumably due to the #if, so that file was updated manually. Overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220705161527.1054072-4-afaria@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:56 +02:00
Alberto Faria	3b35d4542c	block: Add a 'flags' param to blk_pread() For consistency with other I/O functions, and in preparation to implement it using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression blk, offset, buf, bytes; @@ - blk_pread(blk, offset, buf, bytes) + blk_pread(blk, offset, buf, bytes, 0) It had no effect on hw/block/nand.c, presumably due to the #if, so that file was updated manually. Overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220705161527.1054072-3-afaria@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:56 +02:00
Eric Blake	58a6fdcc9e	nbd/server: Allow MULTI_CONN for shared writable exports According to the NBD spec, a server that advertises NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will not see any cache inconsistencies: when properly separated by a single flush, actions performed by one client will be visible to another client, regardless of which client did the flush. We always satisfy these conditions in qemu - even when we support multiple clients, ALL clients go through a single point of reference into the block layer, with no local caching. The effect of one client is instantly visible to the next client. Even if our backend were a network device, we argue that any multi-path caching effects that would cause inconsistencies in back-to-back actions not seeing the effect of previous actions would be a bug in that backend, and not the fault of caching in qemu. As such, it is safe to unconditionally advertise CAN_MULTI_CONN for any qemu NBD server situation that supports parallel clients. Note, however, that we don't want to advertise CAN_MULTI_CONN when we know that a second client cannot connect (for historical reasons, qemu-nbd defaults to a single connection while nbd-server-add and QMP commands default to unlimited connections; but we already have existing means to let either style of NBD server creation alter those defaults). This is visible by no longer advertising MULTI_CONN for 'qemu-nbd -r' without -e, as in the iotest nbd-qemu-allocation. The harder part of this patch is setting up an iotest to demonstrate behavior of multiple NBD clients to a single server. It might be possible with parallel qemu-io processes, but I found it easier to do in python with the help of libnbd, and help from Nir and Vladimir in writing the test. Signed-off-by: Eric Blake <eblake@redhat.com> Suggested-by: Nir Soffer <nsoffer@redhat.com> Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Message-Id: <20220512004924.417153-3-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-05-12 13:10:52 +02:00
Vladimir Sementsov-Ogievskiy	e5fb29d5d0	qapi: nbd-export: allow select bitmaps by node/name pair Hi all! Current logic of relying on search through backing chain is not safe neither convenient. Sometimes it leads to necessity of extra bitmap copying. Also, we are going to add "snapshot-access" driver, to access some snapshot state through NBD. And this driver is not formally a filter, and of course it's not a COW format driver. So, searching through backing chain will not work. Instead of widening the workaround of bitmap searching, let's extend the interface so that user can select bitmap precisely. Note, that checking for bitmap active status is not copied to the new API, I don't see a reason for it, user should understand the risks. And anyway, bitmap from other node is unrelated to this export being read-only or read-write. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Message-Id: <20220314213226.362217-3-v.sementsov-og@mail.ru> [eblake: Adjust S-o-b to Vladimir's new email, with permission] Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-04-26 13:15:19 -05:00
Marc-André Lureau	e0e7fe07e1	Remove trailing ; after G_DEFINE_AUTO macro The macro doesn't need it. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>	2022-03-22 14:46:18 +04:00
Marc-André Lureau	9edc6313da	Replace GCC_FMT_ATTR with G_GNUC_PRINTF One less qemu-specific macro. It also helps to make some headers/units only depend on glib, and thus moved in standalone projects eventually. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com>	2022-03-22 14:40:51 +04:00
Peter Maydell	fdee2c9692	nbd patches for 2022-03-07 - Dan Berrange: Allow qemu-nbd to support TLS over Unix sockets - Eric Blake: Minor cleanups related to 64-bit block operations -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmImtE8ACgkQp6FrSiUn Q2ovmgf/aksDqf2eNcahs++fez+8Qi9ll5OY/qGyjnzBgsatYKjrK+xF7OnjoJox eRX026lh81Q4EQK7oZBUnr2UCY4bncDBTI7MTLh603EV/tId5ZLwx007ERhzvtC1 mIsQHXNuO9X25LQG2eWnfunY9YztQpiT5r/g3khD2yPBqJWIvBfblzPLx6FkF7px /WM8xEKCihmGr1Wr3b+zGYL083YkaBWCvHoR8mJt3tEFUj+Qie8XcdV0OVyI0XUj 5goIFRcpVwBE8P2nLtfUKNzEXz22cmdonOJUX7E5IvGO21k5F/HrWlHdo8JnuSUZ t0w5L9yCxBrRpY1burz30b77J0WMCw== =C8Dd -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2022-03-07' into staging nbd patches for 2022-03-07 - Dan Berrange: Allow qemu-nbd to support TLS over Unix sockets - Eric Blake: Minor cleanups related to 64-bit block operations # gpg: Signature made Tue 08 Mar 2022 01:41:35 GMT # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2022-03-07: qemu-io: Allow larger write zeroes under no fallback qemu-io: Utilize 64-bit status during map nbd/server: Minor cleanups tests/qemu-iotests: validate NBD TLS with UNIX sockets and PSK tests/qemu-iotests: validate NBD TLS with UNIX sockets tests/qemu-iotests: validate NBD TLS with hostname mismatch tests/qemu-iotests: convert NBD TLS test to use standard filters tests/qemu-iotests: introduce filter for qemu-nbd export list tests/qemu-iotests: expand _filter_nbd rules tests/qemu-iotests: add QEMU_IOTESTS_REGEN=1 to update reference file block/nbd: don't restrict TLS usage to IP sockets qemu-nbd: add --tls-hostname option for TLS certificate validation block/nbd: support override of hostname for TLS certificate validation block: pass desired TLS hostname through from block driver client crypto: mandate a hostname when checking x509 creds on a client Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-03-09 11:38:29 +00:00
Eric Blake	314b902621	nbd/server: Minor cleanups Spelling fixes, grammar improvements and consistent spacing, noticed while preparing other patches in this file. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20211203231539.3900865-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2022-03-07 19:23:25 -06:00
Peter Maydell	5df022cf2e	osdep: Move memalign-related functions to their own header Move the various memalign-related functions out of osdep.h and into their own header, which we include only where they are used. While we're doing this, add some brief documentation comments. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org	2022-03-07 13:16:49 +00:00
Nir Soffer	523f5a9971	nbd/server.c: Remove unused field NBDRequestData struct has unused QSIMPLEQ_ENTRY field. It seems that this field exists since the first git commit and was never used. Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-Id: <20220111194313.581486-1-nsoffer@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Fixes: `d9a73806` ("qemu-nbd: introduce NBDRequest", v1.1) Signed-off-by: Eric Blake <eblake@redhat.com>	2022-01-28 16:48:28 -06:00
Eric Blake	e35574226a	nbd/server: Simplify zero and trim Now that the block layer supports 64-bit operations (see commit `2800637a` and friends, new to v6.2), we no longer have to self-fragment requests larger than 2G, reverting the workaround added in `890cbccb08` ("nbd: Fix large trim/zero requests", v5.1.0). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20211117170230.1128262-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-11-22 07:37:15 -06:00
Eric Blake	1644cccea5	nbd/server: Don't complain on certain client disconnects When a client disconnects abruptly, but did not have any pending requests (for example, when using nbdsh without calling h.shutdown), we used to output the following message: $ qemu-nbd -f raw file $ nbdsh -u 'nbd://localhost:10809' -c 'h.trim(1,0)' qemu-nbd: Disconnect client, due to: Failed to read request: Unexpected end-of-file before all bytes were read Then in commit `f148ae7`, we refactored nbd_receive_request() to use nbd_read_eof(); when this returns 0, we regressed into tracing uninitialized memory (if tracing is enabled) and reporting a less-specific: qemu-nbd: Disconnect client, due to: Request handling failed in intermediate state Note that with Unix sockets, we have yet another error message, unchanged by the 6.0 regression: $ qemu-nbd -k /tmp/sock -f raw file $ nbdsh -u 'nbd+unix:///?socket=/tmp/sock' -c 'h.trim(1,0)' qemu-nbd: Disconnect client, due to: Failed to send reply: Unable to write to socket: Broken pipe But in all cases, the error message goes away if the client performs a soft shutdown by using NBD_CMD_DISC, rather than a hard shutdown by abrupt disconnect: $ nbdsh -u 'nbd://localhost:10809' -c 'h.trim(1,0)' -c 'h.shutdown()' This patch fixes things to avoid uninitialized memory, and in general avoids warning about a client that does a hard shutdown when not in the middle of a packet. A client that aborts mid-request, or which does not read the full server's reply, can still result in warnings, but those are indeed much more unusual situations. CC: qemu-stable@nongnu.org Fixes: `f148ae7d36` ("nbd/server: Quiesce coroutines on context switch", v6.0.0) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: defer unrelated typo fixes to later patch] Message-Id: <20211117170230.1128262-2-eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-11-22 07:37:14 -06:00
Eric Blake	76df2b8d69	nbd/server: Silence clang sanitizer warning clang's sanitizer is picky: memset(NULL, x, 0) is technically undefined behavior, even though no sane implementation of memset() deferences the NULL. Caught by the nbd-qemu-allocation iotest. The alternative to checking before each memset is to instead force an allocation of 1 element instead of g_new0(type, 0)'s behavior of returning NULL for a 0-length array. Reported-by: Peter Maydell <peter.maydell@linaro.org> Fixes: `3b1f244c59` (nbd: Allow export of multiple bitmaps for one device) Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20211115223943.626416-1-eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2021-11-16 08:45:33 -06:00
Eric Blake	da24597dd3	nbd/server: Allow LIST_META_CONTEXT without STRUCTURED_REPLY The NBD protocol just relaxed the requirements on NBD_OPT_LIST_META_CONTEXT: https://github.com/NetworkBlockDevice/nbd/commit/13a4e33a87 Since listing is not stateful (unlike SET_META_CONTEXT), we don't care if a client asks for meta contexts without first requesting structured replies. Well-behaved clients will still ask for structured reply first (if for no other reason than for back-compat to older servers), but that's no reason to avoid this change. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210907173505.1499709-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-09-29 13:46:32 -05:00
Richard Henderson	cd1675f8d7	nbd/server: Mark variable unused in nbd_negotiate_meta_queries From clang-13: nbd/server.c:976:22: error: variable 'bitmaps' set but not used \ [-Werror,-Wunused-but-set-variable] which is incorrect; see //bugs.llvm.org/show_bug.cgi?id=3888. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-07-26 07:06:25 -10:00
Sergio Lopez	fd6afc501a	nbd/server: Use drained block ops to quiesce the server Before switching between AioContexts we need to make sure that we're fully quiesced ("nb_requests == 0" for every client) when entering the drained section. To do this, we set "quiescing = true" for every client on ".drained_begin" to prevent new coroutines from being created, and check if "nb_requests == 0" on ".drained_poll". Finally, once we're exiting the drained section, on ".drained_end" we set "quiescing = false" and call "nbd_client_receive_next_request()" to resume the processing of new requests. With these changes, "blk_aio_attach()" and "blk_aio_detach()" can be reverted to be as simple as they were before `f148ae7d36`. RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1960137 Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Sergio Lopez <slp@redhat.com> Message-Id: <20210602060552.17433-3-slp@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Nir Soffer	0da9856851	nbd: server: Report holes for raw images When querying image extents for raw image, qemu-nbd reports holes as zero: $ qemu-nbd -t -r -f raw empty-6g.raw $ qemu-img map --output json nbd://localhost [{ "start": 0, "length": 6442450944, "depth": 0, "zero": true, "data": true, "offset": 0}] $ qemu-img map --output json empty-6g.raw [{ "start": 0, "length": 6442450944, "depth": 0, "zero": true, "data": false, "offset": 0}] Turns out that qemu-img map reports a hole based on BDRV_BLOCK_DATA, but nbd server reports a hole based on BDRV_BLOCK_ALLOCATED. The NBD protocol says: NBD_STATE_HOLE (bit 0): if set, the block represents a hole (and future writes to that area may cause fragmentation or encounter an NBD_ENOSPC error); if clear, the block is allocated or the server could not otherwise determine its status. qemu-img manual says: whether the sectors contain actual data or not (boolean field data; if false, the sectors are either unallocated or stored as optimized all-zero clusters); To me, data=false looks compatible with NBD_STATE_HOLE. From user point of view, getting same results from qemu-nbd and qemu-img is more important than being more correct about allocation status. Changing nbd server to report holes using BDRV_BLOCK_DATA makes qemu-nbd results compatible with qemu-img map: $ qemu-img map --output json nbd://localhost [{ "start": 0, "length": 6442450944, "depth": 0, "zero": true, "data": false, "offset": 0}] Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-Id: <20210219160752.1826830-1-nsoffer@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 13:08:45 -06:00
Sergio Lopez	f148ae7d36	nbd/server: Quiesce coroutines on context switch When switching between AIO contexts we need to me make sure that both recv_coroutine and send_coroutine are not scheduled to run. Otherwise, QEMU may crash while attaching the new context with an error like this one: aio_co_schedule: Co-routine was already scheduled in 'aio_co_schedule' To achieve this we need a local implementation of 'qio_channel_readv_all_eof' named 'nbd_read_eof' (a trick already done by 'nbd/client.c') that allows us to interrupt the operation and to know when recv_coroutine is yielding. With this in place, we delegate detaching the AIO context to the owning context with a BH ('nbd_aio_detach_bh') scheduled using 'aio_wait_bh_oneshot'. This BH signals that we need to quiesce the channel by setting 'client->quiescing' to 'true', and either waits for the coroutine to finish using AIO_WAIT_WHILE or, if it's yielding in 'nbd_read_eof', actively enters the coroutine to interrupt it. RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900326 Signed-off-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20201214170519.223781-4-slp@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-01-20 14:48:54 -06:00
Eric Blake	c0b21f2e22	nbd: Silence Coverity false positive Coverity noticed (CID 1436125) that we check the return value of nbd_extent_array_add in most places, but not at the end of bitmap_to_extents(). The return value exists to break loops before a future iteration, so there is nothing to check if we are already done iterating. Adding a cast to void, plus a comment why, pacifies Coverity. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201111163510.713855-1-eblake@redhat.com> [eblake: Prefer cast to void over odd && usage] Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2020-11-16 14:51:12 -06:00

1 2 3 4 5 ...

260 Commits