qemu-e2k

Author	SHA1	Message	Date
Eric Blake	890cbccb08	nbd: Fix large trim/zero requests Although qemu as NBD client limits requests to <2G, the NBD protocol allows clients to send requests almost all the way up to 4G. But because our block layer is not yet 64-bit clean, we accidentally wrap such requests into a negative size, and fail with EIO instead of performing the intended operation. The bug is visible in modern systems with something as simple as: $ qemu-img create -f qcow2 /tmp/image.img 5G $ sudo qemu-nbd --connect=/dev/nbd0 /tmp/image.img $ sudo blkdiscard /dev/nbd0 or with user-space only: $ truncate --size=3G file $ qemu-nbd -f raw file $ nbdsh -u nbd://localhost:10809 -c 'h.trim(310241024*1024,0)' Although both blk_co_pdiscard and blk_pwrite_zeroes currently return 0 on success, this is also a good time to fix our code to a more robust paradigm that treats all non-negative values as success. Alas, our iotests do not currently make it easy to add external dependencies on blkdiscard or nbdsh, so we have to rely on manual testing for now. This patch can be reverted when we later improve the overall block layer to be 64-bit clean, but for now, a minimal fix was deemed less risky prior to release. CC: qemu-stable@nongnu.org Fixes: `1f4d6d18ed` Fixes: `1c6c4bb7f0` Fixes: https://github.com/systemd/systemd/issues/16242 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200722212231.535072-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: rework success tests to use >=0]	2020-07-28 08:49:29 -05:00
Vladimir Sementsov-Ogievskiy	453cc6be0a	nbd: make nbd_export_close_all() synchronous Consider nbd_export_close_all(). The call-stack looks like this: nbd_export_close_all() -> nbd_export_close -> call client_close() for each client. client_close() doesn't guarantee that client is closed: nbd_trip() keeps reference to it. So, nbd_export_close_all() just reduce reference counter on export and removes it from the list, but doesn't guarantee that nbd_trip() finished neither export actually removed. Let's wait for all exports actually removed. Without this fix, the following crash is possible: - export bitmap through internal Qemu NBD server - connect a client - shutdown Qemu On shutdown nbd_export_close_all is called, but it actually don't wait for nbd_trip() to finish and to release its references. So, export is not release, and exported bitmap remains busy, and on try to remove the bitmap (which is part of bdrv_close()) the assertion fails: bdrv_release_dirty_bitmap_locked: Assertion `!bdrv_dirty_bitmap_busy(bitmap)' failed Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200714162234.13113-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Vladimir Sementsov-Ogievskiy	795d946d07	nbd: Use ERRP_GUARD() If we want to check error after errp-function call, we need to introduce local_err and then propagate it to errp. Instead, use the ERRP_GUARD() macro, benefits are: 1. No need of explicit error_propagate call 2. No need of explicit local_err variable: use errp directly 3. ERRP_GUARD() leaves errp as is if it's not NULL or &error_fatal, this means that we don't break error_abort (we'll abort on error_set, not on error_propagate) If we want to add some info to errp (by error_prepend() or error_append_hint()), we must use the ERRP_GUARD() macro. Otherwise, this info will not be added when errp == &error_fatal (the program will exit prior to the error_append_hint() or error_prepend() call). Fix several such cases, e.g. in nbd_read(). This commit is generated by command sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \ MAINTAINERS \| \ xargs git ls-files \| grep '\.[hc]$' \| \ xargs spatch \ --sp-file scripts/coccinelle/errp-guard.cocci \ --macro-file scripts/cocci-macro-file.h \ --in-place --no-show-diff --max-width 80 Reported-by: Kevin Wolf <kwolf@redhat.com> Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200707165037.1026246-8-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and auto-propagated-errp.cocci to errp-guard.cocci. Commit message tweaked again.]	2020-07-10 15:18:09 +02:00
Eric Blake	5c4fe018c0	nbd/server: Avoid long error message assertions CVE-2020-10761 Ever since commit `36683283` (v2.8), the server code asserts that error strings sent to the client are well-formed per the protocol by not exceeding the maximum string length of 4096. At the time the server first started sending error messages, the assertion could not be triggered, because messages were completely under our control. However, over the years, we have added latent scenarios where a client could trigger the server to attempt an error message that would include the client's information if it passed other checks first: - requesting NBD_OPT_INFO/GO on an export name that is not present (commit `0cfae925` in v2.12 echoes the name) - requesting NBD_OPT_LIST/SET_META_CONTEXT on an export name that is not present (commit `e7b1948d` in v2.12 echoes the name) At the time, those were still safe because we flagged names larger than 256 bytes with a different message; but that changed in commit `93676c88` (v4.2) when we raised the name limit to 4096 to match the NBD string limit. (That commit also failed to change the magic number 4096 in nbd_negotiate_send_rep_err to the just-introduced named constant.) So with that commit, long client names appended to server text can now trigger the assertion, and thus be used as a denial of service attack against a server. As a mitigating factor, if the server requires TLS, the client cannot trigger the problematic paths unless it first supplies TLS credentials, and such trusted clients are less likely to try to intentionally crash the server. We may later want to further sanitize the user-supplied strings we place into our error messages, such as scrubbing out control characters, but that is less important to the CVE fix, so it can be a later patch to the new nbd_sanitize_name. Consideration was given to changing the assertion in nbd_negotiate_send_rep_verr to instead merely log a server error and truncate the message, to avoid leaving a latent path that could trigger a future CVE DoS on any new error message. However, this merely complicates the code for something that is already (correctly) flagging coding errors, and now that we are aware of the long message pitfall, we are less likely to introduce such errors in the future, which would make such error handling dead code. Reported-by: Xueqiang Wei <xuwei@redhat.com> CC: qemu-stable@nongnu.org Fixes: https://bugzilla.redhat.com/1843684 CVE-2020-10761 Fixes: `93676c88d7` Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200610163741.3745251-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-06-10 12:58:59 -05:00
Vladimir Sementsov-Ogievskiy	dacbb6eb8a	nbd/server: use bdrv_dirty_bitmap_next_dirty_area Use bdrv_dirty_bitmap_next_dirty_area for bitmap_to_extents. Since bdrv_dirty_bitmap_next_dirty_area is very accurate in its interface, we'll never exceed requested region with last chunk. So, we don't need dont_fragment, and bitmap_to_extents() interface becomes clean enough to not require any comment. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200205112041.6003-10-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	89cbc7e308	nbd/server: introduce NBDExtentArray Introduce NBDExtentArray class, to handle extents list creation in more controlled way and with fewer OUT parameters in functions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200205112041.6003-9-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	642700fda0	block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t We are going to introduce bdrv_dirty_bitmap_next_dirty so that same variable may be used to store its return value and to be its parameter, so it would int64_t. Similarly, we are going to refactor hbitmap_next_dirty_area to use hbitmap_next_dirty together with hbitmap_next_zero, therefore we want hbitmap_next_zero parameter type to be int64_t too. So, for convenience update all parameters of _next_zero and _next_dirty_area to be int64_t. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-6-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Eric Blake	73e064ccf0	nbd: Fix regression with multiple meta contexts Detected by a hang in the libnbd testsuite. If a client requests multiple meta contexts (both base:allocation and qemu:dirty-bitmap:x) at the same time, our attempt to silence a false-positive warning about a potential uninitialized variable introduced botched logic: we were short-circuiting the second context, and never sending the NBD_REPLY_FLAG_DONE. Combining two 'if' into one 'if/else' in `bdf200a55` was wrong (I'm a bit embarrassed that such a change was my initial suggestion after the v1 patch, then I did not review the v2 patch that actually got committed). Revert that, and instead silence the false positive warning by replacing 'return ret' with 'return 0' (the value it always has at that point in the code, even though it eluded the deduction abilities of the robot that reported the false positive). Fixes: `bdf200a553` Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200206173832.130004-1-eblake@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu>	2020-02-26 14:45:02 -06:00
Pan Nengyuan	bdf200a553	nbd: fix uninitialized variable warning Fixes: /mnt/sdb/qemu/nbd/server.c: In function 'nbd_handle_request': /mnt/sdb/qemu/nbd/server.c:2313:9: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized] int ret; Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20200108025132.46956-1-pannengyuan@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-01-08 16:08:17 +01:00
Eric Blake	93676c88d7	nbd: Don't send oversize strings Qemu as server currently won't accept export names larger than 256 bytes, nor create dirty bitmap names longer than 1023 bytes, so most uses of qemu as client or server have no reason to get anywhere near the NBD spec maximum of a 4k limit per string. However, we weren't actually enforcing things, ignoring when the remote side violates the protocol on input, and also having several code paths where we send oversize strings on output (for example, qemu-nbd --description could easily send more than 4k). Tighten things up as follows: client: - Perform bounds check on export name and dirty bitmap request prior to handing it to server - Validate that copied server replies are not too long (ignoring NBD_INFO_* replies that are not copied is not too bad) server: - Perform bounds check on export name and description prior to advertising it to client - Reject client name or metadata query that is too long - Adjust things to allow full 4k name limit rather than previous 256 byte limit Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-4-eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-11-18 16:01:34 -06:00
Eric Blake	9d7ab222da	nbd/server: Prefer heap over stack for parsing client names As long as we limit NBD names to 256 bytes (the bare minimum permitted by the standard), stack-allocation works for parsing a name received from the client. But as mentioned in a comment, we eventually want to permit up to the 4k maximum of the NBD standard, which is too large for stack allocation; so switch everything in the server to use heap allocation. For now, there is no change in actually supported name length. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-2-eblake@redhat.com> [eblake: fix uninit variable compile failure] Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-11-18 16:01:34 -06:00
Eric Blake	61bc846d8c	nbd: Grab aio context lock in more places When iothreads are in use, the failure to grab the aio context results in an assertion failure when trying to unlock things during blk_unref, when trying to unlock a mutex that was not locked. In short, all calls to nbd_export_put need to done while within the correct aio context. But since nbd_export_put can recursively reach itself via nbd_export_close, and recursively grabbing the context would deadlock, we can't do the context grab directly in those functions, but must do so in their callers. Hoist the use of the correct aio_context from nbd_export_new() to its caller qmp_nbd_server_add(). Then tweak qmp_nbd_server_remove(), nbd_eject_notifier(), and nbd_esport_close_all() to grab the right context, so that all callers during qemu now own the context before nbd_export_put() can call blk_unref(). Remaining uses in qemu-nbd don't matter (since that use case does not support iothreads). Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190917023917.32226-1-eblake@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com>	2019-09-24 07:30:19 -05:00
Sergio Lopez	b4961249af	nbd/server: attach client channel to the export's AioContext On creation, the export's AioContext is set to the same one as the BlockBackend, while the AioContext in the client QIOChannel is left untouched. As a result, when using data-plane, nbd_client_receive_next_request() schedules coroutines in the IOThread AioContext, while the client's QIOChannel is serviced from the main_loop, potentially triggering the assertion at qio_channel_restart_[read\|write]. To fix this, as soon we have the export corresponding to the client, we call qio_channel_attach_aio_context() to attach the QIOChannel context to the export's AioContext. This matches with the logic at blk_aio_attached(). RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1748253 Signed-off-by: Sergio Lopez <slp@redhat.com> Message-Id: <20190912110032.26395-1-slp@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-09-24 07:30:19 -05:00
Eric Blake	b491dbb7f8	nbd: Implement server use of NBD FAST_ZERO The server side is fairly straightforward: we can always advertise support for detection of fast zero, and implement it by mapping the request to the block layer BDRV_REQ_NO_FALLBACK. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190823143726.27062-5-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: update iotests 223, 233]	2019-09-05 16:04:53 -05:00
Eric Blake	0a4795455c	nbd: Prepare for NBD_CMD_FLAG_FAST_ZERO Commit `fe0480d6` and friends added BDRV_REQ_NO_FALLBACK as a way to avoid wasting time on a preliminary write-zero request that will later be rewritten by actual data, if it is known that the write-zero request will use a slow fallback; but in doing so, could not optimize for NBD. The NBD specification is now considering an extension that will allow passing on those semantics; this patch updates the new protocol bits and 'qemu-nbd --list' output to recognize the bit, as well as the new errno value possible when using the new flag; while upcoming patches will improve the client to use the feature when present, and the server to advertise support for it. The NBD spec recommends (but not requires) that ENOTSUP be avoided for all but failures of a fast zero (the only time it is mandatory to avoid an ENOTSUP failure is when fast zero is supported but not requested during write zeroes; the questionable use is for ENOTSUP to other actions like a normal write request). However, clients that get an unexpected ENOTSUP will either already be treating it the same as EINVAL, or may appreciate the extra bit of information. We were equally loose for returning EOVERFLOW in more situations than recommended by the spec, so if it turns out to be a problem in practice, a later patch can tighten handling for both error codes. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190823143726.27062-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: tweak commit message, also handle EOPNOTSUPP]	2019-09-05 16:03:13 -05:00
Eric Blake	dbb38caac5	nbd: Improve per-export flag handling in server When creating a read-only image, we are still advertising support for TRIM and WRITE_ZEROES to the client, even though the client should not be issuing those commands. But seeing this requires looking across multiple functions: All callers to nbd_export_new() passed a single flag based solely on whether the export allows writes. Later, we then pass a constant set of flags to nbd_negotiate_options() (namely, the set of flags which we always support, at least for writable images), which is then further dynamically modified with NBD_FLAG_SEND_DF based on client requests for structured options. Finally, when processing NBD_OPT_EXPORT_NAME or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the runtime set of flags we've built up over several functions. Let's refactor things to instead compute a baseline of flags as soon as possible which gets shared between multiple clients, in nbd_export_new(), and changing the signature for the callers to pass in a simpler bool rather than having to figure out flags. We can then get rid of the 'myflags' parameter to various functions, and instead refer to client for everything we need (we still have to perform a bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and NBD_OPT_EXPORT_GO, but it's easier to see what is being computed). This lets us quit advertising senseless flags for read-only images, as well as making the next patch for exposing FAST_ZERO support easier to write. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190823143726.27062-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: improve commit message, update iotest 223]	2019-09-05 16:02:54 -05:00
Eric Blake	df18c04edf	nbd: Use g_autofree in a few places Thanks to our recent move to use glib's g_autofree, I can join the bandwagon. Getting rid of gotos is fun ;) There are probably more places where we could register cleanup functions and get rid of more gotos; this patch just focuses on the labels that existed merely to call g_free. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190824172813.29720-2-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-09-05 15:52:45 -05:00
Eric Blake	61cc872456	nbd: Advertise multi-conn for shared read-only connections The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be advertised when the server promises cache consistency between simultaneous clients (basically, rules that determine what FUA and flush from one client are able to guarantee for reads from another client). When we don't permit simultaneous clients (such as qemu-nbd without -e), the bit makes no sense; and for writable images, we probably have a lot more work before we can declare that actions from one client are cache-consistent with actions from another. But for read-only images, where flush isn't changing any data, we might as well advertise multi-conn support. What's more, advertisement of the bit makes it easier for clients to determine if 'qemu-nbd -e' was in use, where a second connection will succeed rather than hang until the first client goes away. This patch affects qemu as server in advertising the bit. We may want to consider patches to qemu as client to attempt parallel connections for higher throughput by spreading the load over those connections when a server advertises multi-conn, but for now sticking to one connection per nbd:// BDS is okay. See also: https://bugzilla.redhat.com/1708300 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190815185024.7010-1-eblake@redhat.com> [eblake: tweak blockdev-nbd.c to not request shared when writable, fix iotest 233] Reviewed-by: John Snow <jsnow@redhat.com>	2019-09-05 15:51:55 -05:00
John Snow	28636b8211	block/dirty-bitmap: add bdrv_dirty_bitmap_get Add a public interface for get. While we're at it, rename "bdrv_get_dirty_bitmap_locked" to "bdrv_dirty_bitmap_get_locked". (There are more functions to rename to the bdrv_dirty_bitmap_VERB form, but they will wait until the conclusion of this series.) Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-11-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
Peter Maydell	c6a2225a5a	nbd patches for 2019-08-15 - Addition of InetSocketAddress keep-alive - Addition of BDRV_REQ_PREFETCH for more efficient copy-on-read - Initial refactoring in preparation of NBD reconnect -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJdVaRZAAoJEKeha0olJ0NqrGoIAJSvVLMDeWZIkHr3CQ5AbMHy 6IHUntBwv4PEHw0FyyDU7lLgEWubTwe/7RfvyJ69kQYSJLjvHa3KEic0aa7SOETK hGUlSoIFHEugi+XDcYyy9EG+ItUR7jnunkwomxvFRm4XzjEHFO9ck8fOS+uq/23e LGDHwdoZI6vawUPftbBuRAlB3egCEcBtTWXYMk8lm3MXHOHL7O18DRkfWvwcHfl6 mNIKgTVMtl1gYoJznCUmC5VLHL4jQy+kSNXnyHBQOEEvTcORu0EztJS81H+BODni sxa9seem7JL9NLUTmkJsbGfSM6RKdfypX34oik9yakqUnXRrlxkxI+IX26XfdQ4= =2MAO -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-08-15' into staging nbd patches for 2019-08-15 - Addition of InetSocketAddress keep-alive - Addition of BDRV_REQ_PREFETCH for more efficient copy-on-read - Initial refactoring in preparation of NBD reconnect # gpg: Signature made Thu 15 Aug 2019 19:28:41 BST # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2019-08-15: block/nbd: refactor nbd connection parameters block/nbd: add cmdline and qapi parameter reconnect-delay block/nbd: move from quit to state block/nbd: use non-blocking io channel for nbd negotiation block/nbd: split connection_co start out of nbd_client_connect nbd: improve CMD_CACHE: use BDRV_REQ_PREFETCH block/stream: use BDRV_REQ_PREFETCH block: implement BDRV_REQ_PREFETCH qapi: Add InetSocketAddress member keep-alive Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-08-16 15:53:37 +01:00
Markus Armbruster	dc5e9ac716	Include qemu/queue.h slightly less Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-20-armbru@redhat.com>	2019-08-16 13:31:52 +02:00
Vladimir Sementsov-Ogievskiy	7fa5c5657f	nbd: improve CMD_CACHE: use BDRV_REQ_PREFETCH This helps to avoid extra io, allocations and memory copying. We assume here that CMD_CACHE is always used with copy-on-read, as otherwise it's a noop. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190725100550.33801-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-08-15 13:22:13 -05:00
Eric Blake	416e34bd04	nbd/server: Nicer spelling of max BLOCK_STATUS reply length Commit `3d068aff` (3.0) introduced NBD_MAX_BITMAP_EXTENTS as a limit on how large we would allow a reply to NBD_CMD_BLOCK_STATUS to grow when it is visiting a qemu:dirty-bitmap: context. Later, commit `fb7afc79` (3.1) reused the constant to limit base:allocation context replies, although the name is now less appropriate in that situation. Rename things, and improve the macro to use units.h for better legibility. Then reformat the comment to comply with checkpatch rules added in the meantime. No semantic change. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190510151735.29687-1-eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-06-13 08:56:10 -05:00
Kevin Wolf	d861ab3acf	block: Add BlockBackend.ctx This adds a new parameter to blk_new() which requires its callers to declare from which AioContext this BlockBackend is going to be used (or the locks of which AioContext need to be taken anyway). The given context is only stored and kept up to date when changing AioContexts. Actually applying the stored AioContext to the root node is saved for another commit. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kevin Wolf	45e92a9011	nbd-server: Call blk_set_allow_aio_context_change() The NBD server uses an AioContext notifier, so it can tolerate that its BlockBackend is switched to a different AioContext. Before we start actually calling bdrv_try_set_aio_context(), which checks for consistency, outside of test cases, we need to make sure that the NBD server actually allows this. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-06-04 15:22:22 +02:00
Eric Blake	099fbcd65c	nbd/server: Don't fail NBD_OPT_INFO for byte-aligned sources In commit `0c1d50bd`, I added a couple of TODO comments about whether we consult bl.request_alignment when responding to NBD_OPT_INFO. At the time, qemu as server was hard-coding an advertised alignment of 512 to clients that promised to obey constraints, and there was no function for getting at a device's preferred alignment. But in hindsight, advertising 512 when the block device prefers 1 caused other compliance problems, and commit `b0245d64` changed one of the two TODO comments to advertise a more accurate alignment. Time to fix the other TODO. Doesn't really impact qemu as client (our normal client doesn't use NBD_OPT_INFO, and qemu-nbd --list promises to obey block sizes), but it might prove useful to other clients. Fixes: `b0245d64` Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190403030526.12258-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-04-08 13:49:25 -05:00
Eric Blake	6e280648d2	nbd/server: Trace client noncompliance on unaligned requests We've recently added traces for clients to flag server non-compliance; let's do the same for servers to flag client non-compliance. According to the spec, if the client requests NBD_INFO_BLOCK_SIZE, it is promising to send all requests aligned to those boundaries. Of course, if the client does not request NBD_INFO_BLOCK_SIZE, then it made no promises so we shouldn't flag anything; and because we are willing to handle clients that made no promises (the spec allows us to use NBD_REP_ERR_BLOCK_SIZE_REQD if we had been unwilling), we already have to handle unaligned requests (which the block layer already does on our behalf). So even though the spec allows us to return EINVAL for clients that promised to behave, it's easier to always answer unaligned requests. Still, flagging non-compliance can be useful in debugging a client that is trying to be maximally portable. Qemu as client used to have one spot where it sent non-compliant requests: if the server sends an unaligned reply to NBD_CMD_BLOCK_STATUS, and the client was iterating over the entire disk, the next request would start at that unaligned point; this was fixed in commit `a39286dd` when the client was taught to work around server non-compliance; but is equally fixed if the server is patched to not send unaligned replies in the first place (yes, qemu 4.0 as server still has few such bugs, although they will be patched in 4.1). Fortunately, I did not find any more spots where qemu as client was non-compliant. I was able to test the patch by using the following hack to convince qemu-io to run various unaligned commands, coupled with serving 512-byte alignment by intentionally omitting '-f raw' on the server while viewing server traces. \| diff --git i/nbd/client.c w/nbd/client.c \| index 427980bdd22..1858b2aac35 100644 \| --- i/nbd/client.c \| +++ w/nbd/client.c \| @@ -449,6 +449,7 @@ static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt, \| nbd_send_opt_abort(ioc); \| return -1; \| } \| + info->min_block = 1;//hack \| if (!is_power_of_2(info->min_block)) { \| error_setg(errp, "server minimum block size %" PRIu32 \| " is not a power of two", info->min_block); Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190403030526.12258-3-eblake@redhat.com> [eblake: address minor review nits] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-04-08 13:42:24 -05:00
Eric Blake	2178a569be	nbd/server: Fix blockstatus trace Don't increment remaining_bytes until we know that we will actually be including the current block status extent in the reply; otherwise, the value traced will include a bytes value that is oversized by the length of the next block status extent which did not get sent because it instead ended the loop. Fixes: `fb7afc79` Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190403030526.12258-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-04-08 13:36:04 -05:00
Eric Blake	b0245d6478	nbd/server: Advertise actual minimum block size Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split their reply according to bdrv_block_status() boundaries. If the block device has a request_alignment smaller than 512, but we advertise a block alignment of 512 to the client, then this can result in the server reply violating client expectations by reporting a smaller region of the export than what the client is permitted to address (although this is less of an issue for qemu 4.0 clients, given recent client patches to overlook our non-compliance at EOF). Since it's always better to be strict in what we send, it is worth advertising the actual minimum block limit rather than blindly rounding it up to 512. Note that this patch is not foolproof - it is still possible to provoke non-compliant server behavior using: $ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file That is arguably a bug in the blkdebug driver (it should never pass back block status smaller than its alignment, even if it has to make multiple bdrv_get_status calls and determine the least-common-denominator status among the group to return). It may also be possible to observe issues with a backing layer with smaller alignment than the active layer, although so far I have been unable to write a reliable iotest for that scenario (but again, an issue like that could be argued to be a bug in the block layer, or something where we need a flag to bdrv_block_status() to state whether the result must be aligned to the current layer's limits or can be subdivided for accuracy when chasing backing files). Anyways, as blkdebug is not normally used, and as this patch makes our server more interoperable with qemu 3.1 clients, it is worth applying now, even while we still work on a larger patch series for the 4.1 timeframe to have byte-accurate file lengths. Note that the iotests output changes - for 223 and 233, we can see the server's better granularity advertisement; and for 241, the three test cases have the following effects: - natural alignment: the server's smaller alignment is now advertised, and the hole reported at EOF is now the right result; we've gotten rid of the server's non-compliance - forced server alignment: the server still advertises 512 bytes, but still sends a mid-sector hole. This is still a server compliance bug, which needs to be fixed in the block layer in a later patch; output does not change because the client is already being tolerant of the non-compliance - forced client alignment: the server's smaller alignment means that the client now sees the server's status change mid-sector without any protocol violations, but the fact that the map shows an unaligned mid-sector hole is evidence of the block layer problems with aligned block status, to be fixed in a later patch Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-7-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: rebase to enhanced iotest 241 coverage]	2019-04-01 08:52:28 -05:00
John Snow	3ae96d6684	block/dirty-bitmaps: add block_dirty_bitmap_check function Instead of checking against busy, inconsistent, or read only directly, use a check function with permissions bits that let us streamline the checks without reproducing them in many places. Included in this patch are permissions changes that simply add the inconsistent check to existing permissions call spots, without addressing existing bugs. In general, this means that busy+readonly checks become BDRV_BITMAP_DEFAULT, which checks against all three conditions. busy-only checks become BDRV_BITMAP_ALLOW_RO. Notably, remove allows inconsistent bitmaps, so it doesn't follow the pattern. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-4-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	27a1b301a4	block/dirty-bitmaps: unify qmp_locked and user_locked calls These mean the same thing now. Unify them and rename the merged call bdrv_dirty_bitmap_busy to indicate semantically what we are describing, as well as help disambiguate from the various _locked and _unlocked versions of bitmap helpers that refer to mutex locks. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-8-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	3b78a92776	nbd: change error checking order for bitmaps Check that the bitmap is not in use prior to it checking if it is not enabled/recording guest writes. The bitmap being busy was likely at the behest of the user, so this error has a greater chance of being understood by the user. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-6-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
Daniel P. Berrange	b25e12daff	qemu-nbd: add support for authorization of TLS clients Currently any client which can complete the TLS handshake is able to use the NBD server. The server admin can turn on the 'verify-peer' option for the x509 creds to require the client to provide a x509 certificate. This means the client will have to acquire a certificate from the CA before they are permitted to use the NBD server. This is still a fairly low bar to cross. This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which takes the ID of a previously added 'QAuthZ' object instance. This will be used to validate the client's x509 distinguished name. Clients failing the authorization check will not be permitted to use the NBD server. For example to setup authorization that only allows connection from a client whose x509 certificate distinguished name is CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB escape the commas in the name and use: qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\ endpoint=server,verify-peer=yes \ --object 'authz-simple,id=auth0,identity=CN=laptop.example.com,,\ O=Example Org,,L=London,,ST=London,,C=GB' \ --tls-creds tls0 \ --tls-authz authz0 \ ....other qemu-nbd args... NB: a real shell command line would not have leading whitespace after the line continuation, it is just included here for clarity. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <20190227162035.18543-2-berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: split long line in --help text, tweak 233 to show that whitespace after ,, in identity= portion is actually okay] Signed-off-by: Eric Blake <eblake@redhat.com>	2019-03-06 11:05:27 -06:00
Eric Blake	269ee27e99	nbd/server: Kill pointless shadowed variable lgtm.com pointed out that commit `678ba275` introduced a shadowed declaration of local variable 'bs'; thankfully, the inner 'bs' obtained by 'blk_bs(blk)' matches the outer one given that we had 'blk_insert_bs(blk, bs, errp)' a few lines earlier, and there are no later uses of 'bs' beyond the scope of the 'if (bitmap)' to care if we change the value stored in 'bs' while traveling the backing chain to find a bitmap. So simply get rid of the extra declaration. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190207191357.6665-1-eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-02-11 14:35:43 -06:00
Vladimir Sementsov-Ogievskiy	e6798f06a6	nbd: generalize usage of nbd_read We generally do very similar things around nbd_read: error_prepend specifying what we have tried to read, and be_to_cpu conversion of integers. So, it seems reasonable to move common things to helper functions, which: 1. simplify code a bit 2. generalize nbd_read error descriptions, all starting with "Failed to read" 3. make it more difficult to forget to convert things from BE Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190128165830.165170-1-vsementsov@virtuozzo.com> [eblake: rename macro to DEF_NBD_READ_N and formatting tweaks; checkpatch has false positive complaint] Signed-off-by: Eric Blake <eblake@redhat.com>	2019-02-04 15:11:27 -06:00
Eric Blake	9d26dfcbab	nbd/server: Favor [u]int64_t over off_t Although our compile-time environment is set up so that we always support long files with 64-bit off_t, we have no guarantee whether off_t is the same type as int64_t. This requires casts when printing values, and prevents us from directly using qemu_strtoi64() (which will be done in the next patch). Let's just flip to uint64_t where possible, and stick to int64_t for detecting failure of blk_getlength(); we also keep the assertions added in the previous patch that the resulting values fit in 63 bits. The overflow check in nbd_co_receive_request() was already sane (request->from is validated to fit in 63 bits, and request->len is 32 bits, so the addition can't overflow 64 bits), but rewrite it in a form easier to recognize as a typical overflow check. Rename the variable 'description' to keep line lengths reasonable. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190117193658.16413-7-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-01-21 15:49:51 -06:00
Eric Blake	7596bbb390	nbd/server: Hoist length check to qmp_nbd_server_add We only had two callers to nbd_export_new; qemu-nbd.c always passed a valid offset/length pair (because it already checked the file length, to ensure that offset was in bounds), while blockdev-nbd.c always passed 0/-1. Then nbd_export_new reduces the size to a multiple of BDRV_SECTOR_SIZE (can only happen when offset is not sector-aligned, since bdrv_getlength() currently rounds up) (someday, it would be nice to have byte-accurate lengths - but not today). However, I'm finding it easier to work with the code if we are consistent on having both callers pass in a valid length, and just assert that things are sane in nbd_export_new, meaning that no negative values were passed, and that offset+size does not exceed 63 bits (as that really is a fundamental limit to later operations, whether we use off_t or uint64_t). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190117193658.16413-6-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-01-21 15:49:51 -06:00
Vladimir Sementsov-Ogievskiy	76d570dc49	dirty-bitmap: improve bdrv_dirty_bitmap_next_zero Add bytes parameter to the function, to limit searched range. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-01-15 18:26:49 -05:00
Eric Blake	678ba275c7	nbd: Merge nbd_export_bitmap into nbd_export_new We only have one caller that wants to export a bitmap name, which it does right after creation of the export. But there is still a brief window of time where an NBD client could see the export but not the dirty bitmap, which a robust client would have to interpret as meaning the entire image should be treated as dirty. Better is to eliminate the window entirely, by inlining nbd_export_bitmap() into nbd_export_new(), and refusing to create the bitmap in the first place if the requested bitmap can't be located. We also no longer need logic for setting a different bitmap name compared to the bitmap being exported. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190111194720.15671-8-eblake@redhat.com>	2019-01-14 10:09:46 -06:00
Eric Blake	3fa4c76590	nbd: Merge nbd_export_set_name into nbd_export_new The existing NBD code had a weird split where nbd_export_new() created an export but did not add it to the list of exported names until a later nbd_export_set_name() came along and grabbed a second reference on the object; later, the first call to nbd_export_close() drops the second reference while removing the export from the list. This is in part because the QAPI NbdServerRemoveNode enum documents the possibility of adding a mode where we could do a soft disconnect: preventing new clients, but waiting for existing clients to gracefully quit, based on the mode used when calling nbd_export_close(). But in spite of all that, note that we never change the name of an NBD export while it is exposed, which means it is easier to just inline the process of setting the name as part of creating the export. Inline the contents of nbd_export_set_name() and nbd_export_set_description() into the two points in an export lifecycle where they matter, then adjust both callers to pass the name up front. Note that for creation, all callers pass a non-NULL name, (passing NULL at creation was for old style servers, but we removed support for that in commit `7f7dfe2a`), so we can add an assert and do things unconditionally; but for cleanup, because of the dual nature of nbd_export_close(), we still have to be careful to avoid use-after-free. Along the way, add a comment reminding ourselves of the potential of adding a middle mode disconnect. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190111194720.15671-5-eblake@redhat.com>	2019-01-14 10:09:46 -06:00
Eric Blake	702aa50d61	nbd: Only require disabled bitmap for read-only exports Our initial implementation of x-nbd-server-add-bitmap put in a restriction because of incremental backups: in that usage, we are exporting one qcow2 file (the temporary overlay target of a blockdev-backup sync:none job) and a dirty bitmap owned by a second qcow2 file (the source of the blockdev-backup, which is the backing file of the temporary). While both qcow2 files are still writable (the target in order to capture copy-on-write of old contents, and the source in order to track live guest writes in the meantime), the NBD client expects to see constant data, including the dirty bitmap. An enabled bitmap in the source would be modified by guest writes, which is at odds with the NBD export being a read-only constant view, hence the initial code choice of enforcing a disabled bitmap (the intent is that the exposed bitmap was disabled in the same transaction that started the blockdev-backup job, although we don't want to track enough state to actually enforce that). However, consider the case of a bitmap contained in a read-only node (including when the bitmap is found in a backing layer of the active image). Because the node can't be modified, the bitmap won't change due to writes, regardless of whether it is still enabled. Forbidding the export unless the bitmap is disabled is awkward, paritcularly since we can't change the bitmap to be disabled (because the node is read-only). Alternatively, consider the case of live storage migration, where management directs the destination to create a writable NBD server, then performs a drive-mirror from the source to the target, prior to doing the rest of the live migration. Since storage migration can be time-consuming, it may be wise to let the destination include a dirty bitmap to track which portions it has already received, where even if the migration is interrupted and restarted, the source can query the destination block status in order to potentially minimize re-sending data that has not changed in the meantime on a second attempt. Such code has not been written, and might not be trivial (after all, a cluster being marked dirty in the bitmap does not necessarily guarantee it has the desired contents), but it makes sense that letting an active dirty bitmap be exposed and changing alongside writes may prove useful in the future. Solve both issues by gating the restriction against a disabled bitmap to only happen when the caller has requested a read-only export, and where the BDS that owns the bitmap (whether or not it is the BDS handed to nbd_export_new() or from its backing chain) is still writable. We could drop the check altogether (if management apps are prepared to deal with a changing bitmap even on a read-only image), but for now keeping a check for the read-only case still stands a chance of preventing management errors. Update iotest 223 to show the looser behavior by leaving a bitmap enabled the whole run; note that we have to tear down and re-export a node when handling an error. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190111194720.15671-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-01-14 10:09:46 -06:00
Eric Blake	e31d802479	nbd/server: Advertise all contexts in response to bare LIST The NBD spec, and even our code comment, says that if the client asks for NBD_OPT_LIST_META_CONTEXT with 0 queries, then we should reply with (a possibly-compressed representation of) ALL contexts that we are willing to let them try. But commit `3d068aff` forgot to advertise qemu:dirty-bitmap:FOO. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20181130023232.3079982-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2018-11-30 13:55:18 -06:00
Eric Blake	3e99ebb9d3	nbd/server: Ignore write errors when replying to NBD_OPT_ABORT Commit `37ec36f6` intentionally ignores errors when trying to reply to an NBD_OPT_ABORT request for plaintext clients, but did not make the same change for a TLS server. Since NBD_OPT_ABORT is documented as being a potential for an EPIPE when the client hangs up without waiting for our reply, we don't need to pollute the server's output with that failure. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20181117223221.2198751-1-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-11-19 11:16:46 -06:00
Daniel P. Berrangé	0b0bb124bb	nbd: fix whitespace in server error message A space was missing after the option number was printed: Option 0x8not permitted before TLS becomes Option 0x8 not permitted before TLS This fixes commit `3668328303` Author: Eric Blake <eblake@redhat.com> Date: Fri Oct 14 13:33:09 2016 -0500 nbd: Send message along with server NBD_REP_ERR errors Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20181116155325.22428-2-berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: move lone space to next line] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-11-19 10:08:19 -06:00
John Snow	d9782022bd	nbd: forbid use of frozen bitmaps Whether it's "locked" or "frozen", it's in use and should not be allowed for the purposes of this operation. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20181002230218.13949-7-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-10-29 16:23:16 -04:00
Vladimir Sementsov-Ogievskiy	7f7dfe2a53	nbd/server: drop old-style negotiation After the previous commit, nbd_client_new's first parameter is always NULL. Let's drop it with all corresponding old-style negotiation code path which is unreachable now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20181003170228.95973-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: re-wrap short line] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-10-03 15:52:32 -05:00
Vladimir Sementsov-Ogievskiy	2f454defc2	nbd/server: fix NBD_CMD_CACHE We should not go to structured-read branch on CACHE command, fix that. Bug introduced in `bc37b06a5c` "nbd/server: introduce NBD_CMD_CACHE" with the whole feature and affects 3.0.0 release. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> CC: qemu-stable@nongnu.org Message-Id: <20181003144738.70670-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: commit message typo fix] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-10-03 09:58:43 -05:00
Peter Maydell	80c7c2b00d	nbd: Don't take address of fields in packed structs Taking the address of a field in a packed struct is a bad idea, because it might not be actually aligned enough for that pointer type (and thus cause a crash on dereference on some host architectures). Newer versions of clang warn about this. Avoid the bug by not using the "modify in place" byte swapping functions. This patch was produced with the following spatch script: @@ expression E; @@ -be16_to_cpus(&E); +E = be16_to_cpu(E); @@ expression E; @@ -be32_to_cpus(&E); +E = be32_to_cpu(E); @@ expression E; @@ -be64_to_cpus(&E); +E = be64_to_cpu(E); @@ expression E; @@ -cpu_to_be16s(&E); +E = cpu_to_be16(E); @@ expression E; @@ -cpu_to_be32s(&E); +E = cpu_to_be32(E); @@ expression E; @@ -cpu_to_be64s(&E); +E = cpu_to_be64(E); Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180927164200.15097-1-peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: rebase, and squash in missed changes] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-10-03 09:58:43 -05:00
Vladimir Sementsov-Ogievskiy	fb7afc797e	nbd/server: send more than one extent of base:allocation context This is necessary for efficient block-status export, for clients which support it. (qemu is not yet such a client, but could become one.) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180704112302.471456-3-vsementsov@virtuozzo.com> [eblake: grammar tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-09-26 21:37:48 -05:00
Vladimir Sementsov-Ogievskiy	6545916d52	nbd/server: fix bitmap export bitmap_to_extents function is broken: it switches dirty variable after every iteration, however it can process only part of dirty (or zero) area during one iteration in case when this area is too large for one extent. Fortunately, the bug doesn't produce wrong extent flags: it just inserts a zero-length extent between sequential extents representing large dirty (or zero) area. However, zero-length extents are forbidden by the NBD protocol. So, a careful client should consider such a reply as a server fault, while a less-careful will likely ignore zero-length extents. The bug can only be triggered by a client that requests block status for nearly 4G at once (a request of 4G and larger is impossible per the protocol, and requests smaller than 4G less the bitmap granularity cause the loop to quit iterating rather than revisit the tail of the large area); it also cannot trigger if the client used the NBD_CMD_FLAG_REQ_ONE flag. Since qemu 3.0 as client (using the x-dirty-bitmap extension) always passes the flag, it is immune; and we are not aware of other open-source clients that know how to request qemu:dirty-bitmap:FOO contexts. Clients that want to avoid the bug could cap block status requests to a smaller length, such as 2G or 3G. Fix this by more careful handling of dirty variable. Bug was introduced in `3d068aff16` "nbd/server: implement dirty bitmap export", with the whole function. and is present in v3.0.0 release. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180914165116.23182-1-vsementsov@virtuozzo.com> CC: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: improved commit message] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-09-26 10:08:55 -05:00
Vladimir Sementsov-Ogievskiy	0c0eaed147	nbd/server: fix nbd_co_send_block_status Call nbd_co_send_extents() with correct length parameter (extent.length may be smaller than original length). Also, switch length parameter type to uint32_t, to correspond with request->len and similar nbd_co_send_bitmap(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180704112302.471456-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-07-07 20:30:09 -05:00
Eric Blake	7606c99a04	nbd/server: Fix dirty bitmap logic regression In my hurry to fix a build failure, I introduced a logic bug. The assertion conditional is backwards, meaning that qemu will now abort instead of reporting dirty bitmap status. The bug can only be tickled by an NBD client using an exported dirty bitmap (which is still an experimental QMP command), so it's not the end of the world for supported usage (and neither 'make check' nor qemu-iotests fails); but it also shows that we really want qemu-io support for reading dirty bitmaps if only so that I can add iotests coverage to prevent future brown-bag-of-shame events like this one. Fixes: `45eb6fb6` Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180622153509.375130-1-eblake@redhat.com>	2018-07-02 14:27:48 -05:00
Eric Blake	45eb6fb6ce	nbd/server: Silence gcc false positive The code has a while() loop that always initialized 'end', and the loop always executes at least once (as evidenced by the assert() just prior to the loop). But some versions of gcc still complain that 'end' is used uninitialized, so silence them. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20180622125814.345274-1-eblake@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-22 14:18:36 +01:00
Vladimir Sementsov-Ogievskiy	bc37b06a5c	nbd/server: introduce NBD_CMD_CACHE Handle nbd CACHE command. Just do read, without sending read data back. Cache mechanism should be done by exported node driver chain. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180413143156.11409-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix two missing case labels in switch statements] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-06-21 09:41:39 -05:00
Vladimir Sementsov-Ogievskiy	3d068aff16	nbd/server: implement dirty bitmap export Handle a new NBD meta namespace: "qemu", and corresponding queries: "qemu:dirty-bitmap:<export bitmap name>". With the new metadata context negotiated, BLOCK_STATUS query will reply with dirty-bitmap data, converted to extents. The new public function nbd_export_bitmap selects which bitmap to export. For now, only one bitmap may be exported. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180609151758.17343-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: wording tweaks, minor cleanups, additional tracing] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-06-21 09:23:58 -05:00
Vladimir Sementsov-Ogievskiy	b0769d8f8d	nbd/server: add nbd_meta_empty_or_pattern helper Add nbd_meta_pattern() and nbd_meta_empty_or_pattern() helpers for metadata query parsing. nbd_meta_pattern() will be reused for the "qemu" namespace in following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180609151758.17343-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: comment tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-06-21 09:23:58 -05:00
Vladimir Sementsov-Ogievskiy	af736e5467	nbd/server: refactor NBDExportMetaContexts Use NBDExport pointer instead of just export name: there is no need to store a duplicated name in the struct; moreover, NBDExport will be used further. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180609151758.17343-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: commit message grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-06-21 09:23:58 -05:00
Vladimir Sementsov-Ogievskiy	dbb8b396bb	nbd/server: fix trace Return code = 1 doesn't mean that we parsed base:allocation. Use correct traces in both -parsed and -skipped cases. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180609151758.17343-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: comment tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-06-21 09:23:58 -05:00
Eric Blake	d8b20291cb	nbd/server: Reject 0-length block status request The NBD spec says that behavior is unspecified if the client requests 0 length for block status; but since the structured reply is documenting as returning a non-zero length, it's easier to just diagnose this with an EINVAL error than to figure out what to return. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180621124937.166549-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2018-06-21 09:21:34 -05:00
Eric Blake	2b53af2523	nbd: trace meta context negotiation Having a more detailed log of the interaction between client and server is invaluable in debugging how meta context negotiation actually works. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180330130950.1931229-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2018-04-02 09:10:49 -05:00
Vladimir Sementsov-Ogievskiy	e7b1948d51	nbd: BLOCK_STATUS for standard get_block_status function: server part Minimal realization: only one extent in server answer is supported. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180312152126.286890-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: tweak whitespace, move constant from .h to .c, improve logic of check_meta_export_name, simplify nbd_negotiate_options by doing more in nbd_negotiate_meta_queries] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-13 15:38:56 -05:00
Vladimir Sementsov-Ogievskiy	12296459f4	nbd/server: add nbd_read_opt_name helper Add helper to read name in format: uint32 len (<= NBD_MAX_NAME_SIZE) len bytes string (not 0-terminated) The helper will be reused in following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180312152126.286890-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar fixes, actually check error] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-13 15:38:55 -05:00
Vladimir Sementsov-Ogievskiy	2e425fd568	nbd/server: add nbd_opt_invalid helper NBD_REP_ERR_INVALID is often parameter to nbd_opt_drop and it would be used more in following patches. So, let's add a helper. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180312152126.286890-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-13 15:38:55 -05:00
Eric Blake	65529782f8	nbd/server: Honor FUA request on NBD_CMD_TRIM The NBD spec states that since trim requests can affect disk contents, then they should allow for FUA semantics just like writes for ensuring the disk has settled before returning. As bdrv_[co_]pdiscard() does not support a flags argument, we can't pass FUA down the block layer stack, and must therefore emulate it with a flush at the NBD layer. Note that in all reality, generic well-behaved clients will never send TRIM+FUA (in fact, qemu as a client never does, and we have no intention to plumb flags into bdrv_pdiscard). This is because the NBD protocol states that it is unspecified to READ a trimmed area (you might read stale data, all zeroes, or even random unrelated data) without first rewriting it, and even the experimental BLOCK_STATUS extension states that TRIM need not affect reported status. Thus, in the general case, a client cannot tell the difference between an arbitrary server that ignores TRIM, a server that had a power outage without flushing to disk, and a server that actually affected the disk before returning; so waiting for the trim actions to flush to disk makes little sense. However, for a specific client and server pair, where the client knows the server treats TRIM'd areas as guaranteed reads-zero, waiting for a flush makes sense, hence why the protocol documents that FUA is valid on trim. So, even though the NBD protocol doesn't have a way for the server to advertise what effects (if any) TRIM will actually have, and thus any client that relies on specific effects is probably in error, we can at least support a client that requests TRIM+FUA. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180307225732.155835-1-eblake@redhat.com>	2018-03-13 15:38:55 -05:00
Vladimir Sementsov-Ogievskiy	6f302e6093	nbd/server: refactor nbd_trip: split out nbd_handle_request Split out request handling logic. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180308184636.178534-6-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: touch up blank line placement] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-13 15:38:55 -05:00
Vladimir Sementsov-Ogievskiy	6a4175997b	nbd/server: refactor nbd_trip: cmd_read and generic reply nbd_trip has difficult logic when sending replies: it tries to use one code path for all replies. It is ok for simple replies, but is not comfortable for structured replies. Also, two types of error (and corresponding messages in local_err) - fatal (leading to disconnect) and not-fatal (just to be sent to the client) are difficult to follow. To make things a bit clearer, the following is done: - split CMD_READ logic to separate function. It is the most difficult command for now, and it is definitely cramped inside nbd_trip. Also, it is difficult to follow CMD_READ logic, shared between "case NBD_CMD_READ" and "if"s under "reply:" label. - create separate helper function nbd_send_generic_reply() and use it both in new nbd_do_cmd_read and for other commands in nbd_trip instead of common code-path under "reply:" label in nbd_trip. The helper supports an error message, so logic with local_err in nbd_trip is simplified. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180308184636.178534-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks and blank line placement] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-13 15:38:55 -05:00
Vladimir Sementsov-Ogievskiy	a0d7ce20a9	nbd/server: fix: check client->closing before sending reply Since the unchanged code has just set client->recv_coroutine to NULL before calling nbd_client_receive_next_request(), we are spawning a new coroutine unconditionally, but the first thing that coroutine will do is check for client->closing, making it a no-op if we have already detected that the client is going away. Furthermore, for any error other than EIO (where we disconnect, which itself sets client->closing), if the client has already gone away, we'll probably encounter EIO later in the function and attempt disconnect at that point. Logically, as soon as we know the connection is closing, there is no need to try a likely-to-fail a response or spawn a no-op coroutine. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180308184636.178534-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: squash in further reordering: hoist check before spawning next coroutine, and document rationale in commit message] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-13 15:38:55 -05:00
Vladimir Sementsov-Ogievskiy	37e02aebf8	nbd/server: fix sparse read In case of io error in nbd_co_send_sparse_read we should not "goto reply:", as it was a fatal error and the common behavior is to disconnect in this case. We should not try to send the client an additional error reply, since we already hit a channel-io error on our previous attempt to send one. Fix this by handling block-status error in nbd_co_send_sparse_read, so nbd_co_send_sparse_read fails only on io error. Then just skip common "reply:" code path in nbd_trip. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180308184636.178534-3-vsementsov@virtuozzo.com> [eblake: grammar tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-13 15:38:55 -05:00
Vladimir Sementsov-Ogievskiy	60ace2bacf	nbd/server: move nbd_co_send_structured_error up To be reused in nbd_co_send_sparse_read() in the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180308184636.178534-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-13 15:38:55 -05:00
Peter Xu	1939ccdaa6	qio: non-default context for TLS handshake A new parameter "context" is added to qio_channel_tls_handshake() is to allow the TLS to be run on a non-default context. Still, no functional change. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2018-03-06 10:19:07 +00:00
Vladimir Sementsov-Ogievskiy	28fb494f9b	nbd/client: fix error messages in nbd_handle_reply_err 1. NBD_REP_ERR_INVALID is not only about length, so, make message more general 2. hex format is not very good: it's hard to read something like "option a (set meta context)", so switch to dec. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1518702707-7077-6-git-send-email-vsementsov@virtuozzo.com> [eblake: expand scope of patch: ALL uses of nbd_opt_lookup and nbd_rep_lookup are now decimal] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-01 14:48:23 -06:00
Vladimir Sementsov-Ogievskiy	a3b0dc7582	qapi: add nbd-server-remove Add command for removing an export. It is needed for cases when we don't want to keep the export after the operation on it was completed. The other example is a temporary node, created with blockdev-add. If we want to delete it we should firstly remove any corresponding NBD export. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180119135719.24745-3-vsementsov@virtuozzo.com> [eblake: drop dead nb_clients code] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-01-26 09:37:20 -06:00
Vladimir Sementsov-Ogievskiy	1d17922a28	nbd/server: structurize option reply sending Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171122101958.17065-6-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-01-17 20:14:12 -06:00
Eric Blake	894e02804c	nbd/server: Add helper functions for parsing option payload Rather than making every callsite perform length sanity checks and error reporting, add the helper functions nbd_opt_read() and nbd_opt_drop() that use the length stored in the client struct; also add an assertion that optlen is 0 before any option (ie. any previous option was fully handled), complementing the assertion added in an earlier patch that optlen is 0 after all negotiation completes. Note that the call in nbd_negotiate_handle_export_name() does not use the new helper (in part because the server cannot reply to NBD_OPT_EXPORT_NAME - it either succeeds or the connection drops). Based on patches by Vladimir Sementsov-Ogievskiy. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180110230825.18321-6-eblake@redhat.com>	2018-01-17 20:14:12 -06:00
Eric Blake	41f5dfafbb	nbd/server: Add va_list form of nbd_negotiate_send_rep_err() This will be useful for the next patch. Based on a patch by Vladimir Sementsov-Ogievskiy Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180110230825.18321-5-eblake@redhat.com>	2018-01-17 20:14:12 -06:00
Eric Blake	32f158a635	nbd/server: Better error for NBD_OPT_EXPORT_NAME failure When a client abruptly disconnects before we've finished reading the name sent with NBD_OPT_EXPORT_NAME, we are better off logging the failure as EIO (we can't communicate with the client), rather than EINVAL (the client sent bogus data). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180110230825.18321-4-eblake@redhat.com>	2018-01-17 20:14:12 -06:00
Vladimir Sementsov-Ogievskiy	0cfae925d2	nbd/server: refactor negotiation functions parameters Instead of passing currently negotiating option and its length to many of negotiation functions let's just store them on NBDClient struct to be state-variables of negotiation phase. This unifies semantics of negotiation functions and allows tracking changes of remaining option length in future patches. Asssert that optlen is back to 0 after negotiation (including old-style connections which don't negotiate), although we need more patches before we can assert optlen is 0 between options during negotiation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171122101958.17065-2-vsementsov@virtuozzo.com> [eblake: rebase, commit message tweak, assert !optlen after negotiation completes] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-01-17 20:14:12 -06:00
Eric Blake	a16a790770	nbd/server: Hoist nbd_reject_length() earlier No semantic change, but will make it easier for an upcoming patch to refactor code without having to add forward declarations. Fix a poor comment while at it. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180110230825.18321-2-eblake@redhat.com>	2018-01-17 20:14:12 -06:00
Vladimir Sementsov-Ogievskiy	9156245ec4	nbd/server: add additional assert to nbd_export_put This place is not obvious, nbd_export_close may theoretically reduce refcount to 0. It may happen if someone calls nbd_export_put on named export not through nbd_export_set_name when refcount is 1. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20171207155102.66622-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-01-09 12:53:44 -06:00
Eric Blake	e2de3256c3	nbd/server: Optimize final chunk of sparse read If we are careful to handle 0-length read requests correctly, we can optimize our sparse read to send the NBD_REPLY_FLAG_DONE bit on our last OFFSET_DATA or OFFSET_HOLE chunk rather than needing a separate chunk. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171107030912.23930-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2018-01-08 09:12:23 -06:00
Eric Blake	418638d3e4	nbd/server: Implement sparse reads atop structured reply The reason that NBD added structured reply in the first place was to allow for efficient reads of sparse files, by allowing the reply to include chunks to quickly communicate holes to the client without sending lots of zeroes over the wire. Time to implement this in the server; our client can already read such data. We can only skip holes insofar as the block layer can query them; and only if the client is okay with a fragmented request (if a client requests NBD_CMD_FLAG_DF and the entire read is a hole, we could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but that's a fringe case not worth catering to here). Sadly, the control flow is a bit wonkier than I would have preferred, but it was minimally invasive to have a split in the action between a fragmented read (handled directly where we recognize NBD_CMD_READ with the right conditions, and sending multiple chunks) vs. a single read (handled at the end of nbd_trip, for both simple and structured replies, when we know there is only one thing being read). Likewise, I didn't make any effort to optimize the final chunk of a fragmented read to set the NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate NBD_REPLY_TYPE_NONE. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171107030912.23930-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2018-01-08 09:12:23 -06:00
Eric Blake	51ae4f8455	nbd/server: CVE-2017-15118 Stack smash on large export name Introduced in commit `f37708f6b8` (2.10). The NBD spec says a client can request export names up to 4096 bytes in length, even though they should not expect success on names longer than 256. However, qemu hard-codes the limit of 256, and fails to filter out a client that probes for a longer name; the result is a stack smash that can potentially give an attacker arbitrary control over the qemu process. The smash can be easily demonstrated with this client: $ qemu-io f raw nbd://localhost:10809/$(printf %3000d 1 \| tr ' ' a) If the qemu NBD server binary (whether the standalone qemu-nbd, or the builtin server of QMP nbd-server-start) was compiled with -fstack-protector-strong, the ability to exploit the stack smash into arbitrary execution is a lot more difficult (but still theoretically possible to a determined attacker, perhaps in combination with other CVEs). Still, crashing a running qemu (and losing the VM) is bad enough, even if the attacker did not obtain full execution control. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com>	2017-11-28 06:58:01 -06:00
Eric Blake	fdad35ef6c	nbd/server: CVE-2017-15119 Reject options larger than 32M The NBD spec gives us permission to abruptly disconnect on clients that send outrageously large option requests, rather than having to spend the time reading to the end of the option. No real option request requires that much data anyways; and meanwhile, we already have the practice of abruptly dropping the connection on any client that sends NBD_CMD_WRITE with a payload larger than 32M. For comparison, nbdkit drops the connection on any request with more than 4096 bytes; however, that limit is probably too low (as the NBD spec states an export name can theoretically be up to 4096 bytes, which means a valid NBD_OPT_INFO could be even longer) - even if qemu doesn't permit exports longer than 256 bytes. It could be argued that a malicious client trying to get us to read nearly 4G of data on a bad request is a form of denial of service. In particular, if the server requires TLS, but a client that does not know the TLS credentials sends any option (other than NBD_OPT_STARTTLS or NBD_OPT_EXPORT_NAME) with a stated payload of nearly 4G, then the server was keeping the connection alive trying to read all the payload, tying up resources that it would rather be spending on a client that can get past the TLS handshake. Hence, this warranted a CVE. Present since at least 2.5 when handling known options, and made worse in 2.6 when fixing support for NBD_FLAG_C_FIXED_NEWSTYLE to handle unknown options. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com>	2017-11-28 06:42:26 -06:00
Eric Blake	fed5f8f820	nbd/server: Fix error reporting for bad requests The NBD spec says an attempt to NBD_CMD_TRIM on a read-only export should fail with EPERM, as a trim has the potential to change disk contents, but we were relying on the block layer to catch that for us, which might not always give the right error (and even if it does, it does not let us pass back a sane message for structured replies). The NBD spec says an attempt to NBD_CMD_WRITE_ZEROES out of bounds should fail with ENOSPC, not EINVAL. Our check for u64 offset + u32 length wraparound up front is pointless; nothing uses offset until after the second round of sanity checks, and we can just as easily ensure there is no wraparound by checking whether offset is in bounds (since a disk size cannot exceed off_t which is 63 bits, adding a 32-bit number for a valid offset can't overflow). Bonus: dropping the up-front check lets us keep the connection alive after NBD_CMD_WRITE, whereas before we would drop the connection (of course, any client sending a packet that would trigger the failure is already buggy, so it's also okay to drop the connection, but better quality-of-implementation never hurts). Solve all of these issues by some code motion and improved request validation. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171115213557.3548-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2017-11-17 08:38:38 -06:00
Eric Blake	ef8c887ee0	nbd/server: Fix structured read of length 0 The NBD spec was recently clarified to state that a read of length 0 should not be attempted by a compliant client; but that a server must still handle it correctly in an unspecified manner (that is, either a successful no-op or an error reply, but not a crash) [1]. However, it also implies that NBD_REPLY_TYPE_OFFSET_DATA must have a non-zero payload length, but our existing code was replying with a chunk that a picky client could reject as invalid because it was missing a payload (our own client implementation was recently patched to be that picky, after first fixing it to not send 0-length requests). We are already doing successful no-ops for 0-length writes and for non-structured reads; so for consistency, we want structured reply reads to also be a no-op. The easiest way to do this is to return a NBD_REPLY_TYPE_NONE chunk; this is best done via a new helper function (especially since future patches for other structured replies may benefit from using the same helper). [1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-8-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2017-11-09 10:25:11 -06:00
Eric Blake	efdc0c103d	nbd: Fix struct name for structured reads A closer read of the NBD spec shows that a structured reply chunk for a hole is not quite identical to the prefix of a data chunk, because the hole has to also send a 32-bit size field. Although we do not yet send holes, we should fix the misleading information in our header and make it easier for a future patch to support sparse reads. Messed up in commit `bae245d1`. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-5-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2017-11-09 10:17:12 -06:00
Vladimir Sementsov-Ogievskiy	46321d6b5f	nbd/server: fix nbd_negotiate_handle_info namelen should be here, length is unrelated, and always 0 at this point. Broken in introduction in commit `f37708f6`, but mostly harmless (replying with '' as the name does not violate protocol, and does not confuse qemu as the nbd client since our implementation does not ask for the name; but might confuse some other client that does ask for the name especially if the default export is different than the export name being queried). Adding an assert makes it obvious that we are not skipping any bytes in the client's message, as well as making it obvious that we were using the wrong variable. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> CC: qemu-stable@nongnu.org Message-Id: <20171101154204.27146-1-vsementsov@virtuozzo.com> [eblake: improve commit message, squash in assert addition] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-11-08 16:32:26 -06:00
Eric Blake	a57f6dea02	nbd/server: Include human-readable message in structured errors The NBD spec permits including a human-readable error string if structured replies are in force, so we might as well send the client the message that we logged on any error. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171027104037.8319-9-eblake@redhat.com>	2017-10-30 21:48:11 +01:00
Vladimir Sementsov-Ogievskiy	5c54e7fa71	nbd: Minimal structured read for server Minimal implementation of structured read: one structured reply chunk, no segmentation. Minimal structured error implementation: no text message. Support DF flag, but just ignore it, as there is no segmentation any way. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171027104037.8319-8-eblake@redhat.com>	2017-10-30 21:48:06 +01:00
Eric Blake	e68c35cfb8	nbd/server: Refactor zero-length option check Consolidate the response for a non-zero-length option payload into a new function, nbd_reject_length(). This check will also be used when introducing support for structured replies. Note that STARTTLS response differs based on time: if the connection is still unencrypted, we set fatal to true (a client that can't request TLS correctly may still think that we are ready to start the TLS handshake, so we must disconnect); while if the connection is already encrypted, the client is sending a bogus request but is no longer at risk of being confused by continuing the connection. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171027104037.8319-7-eblake@redhat.com> [eblake: correct return value on STARTTLS] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2017-10-30 21:47:18 +01:00
Eric Blake	8cbee49ed7	nbd/server: Simplify nbd_negotiate_options loop Instead of making each caller check whether a transmission error occurred, we can sink a common error check to the end of the loop. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171027104037.8319-6-eblake@redhat.com> [eblake: squash in compiler warning fix] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2017-10-30 21:07:59 +01:00
Eric Blake	8fb48b8b38	nbd/server: Report error for write to read-only export When the server is read-only, we were already reporting an error message for NBD_CMD_WRITE_ZEROES, but failed to set errp for a similar NBD_CMD_WRITE. This will matter more once structured replies allow the server to propagate the errp information back to the client. While at it, use an error message that makes a bit more sense if viewed on the client side. Note that when using qemu-io to test qemu-nbd behavior, it is rather difficult to convince qemu-io to send protocol violations (such as a read beyond bounds), because we have a lot of active checking on the client side that a qemu-io request makes sense before it ever goes over the wire to the server. The case of a client attempting a write when the server is started as 'qemu-nbd -r' is one of the few places where we can easily test error path handling, without having to resort to hacking in known temporary bugs to either the server or client. [Maybe we want a future patch to the client to do up-front checking on writes to a read-only export, the way it does up-front bounds checking; but I don't see anything in the NBD spec that points to a protocol violation in our current behavior.] Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171027104037.8319-5-eblake@redhat.com>	2017-10-30 21:07:44 +01:00
Eric Blake	bae245d19a	nbd: Expose constants and structs for structured read Upcoming patches will implement the NBD structured reply extension [1] for both client and server roles. Declare the constants, structs, and lookup routines that will be valuable whether the server or client code is backported in isolation. This includes moving one constant from an internal header to the public header, as part of the structured read processing will be done in block/nbd-client.c rather than nbd/client.c. [1]https://github.com/NetworkBlockDevice/nbd/blob/extension-structured-reply/doc/proto.md Based on patches from Vladimir Sementsov-Ogievskiy. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171027104037.8319-4-eblake@redhat.com>	2017-10-30 21:07:21 +01:00
Eric Blake	e7a78d0eff	nbd: Include error names in trace messages NBD errors were originally sent over the wire based on Linux errno values; but not all the world is Linux, and not all platforms share the same values. Since a number isn't very easy to decipher on all platforms, update the trace messages to include the name of NBD errors being sent/received over the wire. Tweak the trace messages to be at the point where we are using the NBD error, not the translation to the host errno values. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171027104037.8319-2-eblake@redhat.com>	2017-10-30 21:07:21 +01:00
Vladimir Sementsov-Ogievskiy	de79bfc36f	nbd/server: simplify reply transmission Send qiov via qio_channel_writev_all instead of calling nbd_write twice with a cork. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171012095319.136610-8-vsementsov@virtuozzo.com> [eblake: rebase to tweaks earlier in series] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-10-13 08:05:16 -05:00
Vladimir Sementsov-Ogievskiy	978df1b6bf	nbd/server: refactor nbd_co_send_simple_reply parameters Pass client and buffer (*data) parameters directly, to make the function consistent with further structured reply sending functions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171012095319.136610-7-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-10-13 08:05:14 -05:00
Vladimir Sementsov-Ogievskiy	14cea41d39	nbd/server: do not use NBDReply structure NBDReply structure will be upgraded in future patches to handle both simple and structured replies and will be used only in the client Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171012095319.136610-6-vsementsov@virtuozzo.com> [eblake: rebase to tweaks earlier in series] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-10-13 08:05:11 -05:00
Vladimir Sementsov-Ogievskiy	caad53845a	nbd/server: structurize simple reply header sending Use packed structure instead of pointer arithmetics. Also, merge two redundant traces into one. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171012095319.136610-5-vsementsov@virtuozzo.com> [eblake: tweak and mention impact on traces, fix errp usage] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-10-12 16:53:15 -05:00
Vladimir Sementsov-Ogievskiy	7b3158f951	nbd: rename some simple-request related objects to be _simple_ To be consistent when their _structured_ analogs will be introduced. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171012095319.136610-4-vsementsov@virtuozzo.com> [eblake: also tweak trace message contents] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-10-12 16:27:34 -05:00
Marc-André Lureau	e8d3eb74bf	NBD: use g_new() family of functions Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20171006235023.11952-22-f4bug@amsat.org> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-10-12 15:56:06 -05:00

1 2 3 4 5

233 Commits