Closing export is somewhat convoluted because nbd_export_close() and
nbd_export_put() call each other and the ways they actually end up being
nested is not necessarily obvious.
However, it is not really necessary to call nbd_export_close() from
nbd_export_put() when putting the last reference because it only does
three things:
1. Close all clients. We're going to refcount 0 and all clients hold a
reference, so we know there is no active client any more.
2. Close the user reference (represented by exp->name being non-NULL).
The same argument applies: If the export were still named, we would
still have a reference.
3. Freeing exp->description. This is really cleanup work to be done when
the export is finally freed. There is no reason to already clear it
while clients are still in the process of shutting down.
So after moving the cleanup of exp->description, the code can be
simplified so that only nbd_export_close() calls nbd_export_put(), but
never the other way around.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-13-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The export close callback is unused by the built-in NBD server. qemu-nbd
uses it only during shutdown to wait for the unrefed export to actually
go away. It can just use nbd_export_close_all() instead and do without
the callback.
This removes the close callback from nbd_export_new() and makes both
callers of it more similar.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-11-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
nbd-server-add tries to be convenient and adds two questionable
features that we don't want to share in block-export-add, even for NBD
exports:
1. When requesting a writable export of a read-only device, the export
is silently downgraded to read-only. This should be an error in the
context of block-export-add.
2. When using a BlockBackend name, unplugging the device from the guest
will automatically stop the NBD server, too. This may sometimes be
what you want, but it could also be very surprising. Let's keep
things explicit with block-export-add. If the user wants to stop the
export, they should tell us so.
Move these things into the nbd-server-add QMP command handler so that
they apply only there.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-8-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of implementing qemu-nbd --offset in the NBD code, just put a
raw block node with the requested offset on top of the user image and
rely on that doing the job.
This does not only simplify the nbd_export_new() interface and bring it
closer to the set of options that the nbd-server-add QMP command offers,
but in fact it also eliminates a potential source for bugs in the NBD
code which previously had to add the offset manually in all relevant
places.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-7-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We want to have a common set of commands for all types of block exports.
Currently, this is only NBD, but we're going to add more types.
This patch adds the basic BlockExport and BlockExportDriver structs and
a QMP command block-export-add that creates a new export based on the
given BlockExportOptions.
qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-5-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-2-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When looking for a dirty bitmap to share, we should handle filters by
just including them in the search (so they do not break backing chains).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Meson doesn't enjoy the same flexibility we have with Make in choosing
the include path. In particular the tracing headers are using
$(build_root)/$(<D).
In order to keep the include directives unchanged,
the simplest solution is to generate headers with patterns like
"trace/trace-audio.h" and place forwarding headers in the source tree
such that for example "audio/trace.h" includes "trace/trace-audio.h".
This patch is too ugly to be applied to the Makefiles now. It's only
a way to separate the changes to the tracing header files from the
Meson rewrite of the tracing logic.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Although qemu as NBD client limits requests to <2G, the NBD protocol
allows clients to send requests almost all the way up to 4G. But
because our block layer is not yet 64-bit clean, we accidentally wrap
such requests into a negative size, and fail with EIO instead of
performing the intended operation.
The bug is visible in modern systems with something as simple as:
$ qemu-img create -f qcow2 /tmp/image.img 5G
$ sudo qemu-nbd --connect=/dev/nbd0 /tmp/image.img
$ sudo blkdiscard /dev/nbd0
or with user-space only:
$ truncate --size=3G file
$ qemu-nbd -f raw file
$ nbdsh -u nbd://localhost:10809 -c 'h.trim(3*1024*1024*1024,0)'
Although both blk_co_pdiscard and blk_pwrite_zeroes currently return 0
on success, this is also a good time to fix our code to a more robust
paradigm that treats all non-negative values as success.
Alas, our iotests do not currently make it easy to add external
dependencies on blkdiscard or nbdsh, so we have to rely on manual
testing for now.
This patch can be reverted when we later improve the overall block
layer to be 64-bit clean, but for now, a minimal fix was deemed less
risky prior to release.
CC: qemu-stable@nongnu.org
Fixes: 1f4d6d18ed
Fixes: 1c6c4bb7f0
Fixes: https://github.com/systemd/systemd/issues/16242
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200722212231.535072-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: rework success tests to use >=0]
Consider nbd_export_close_all(). The call-stack looks like this:
nbd_export_close_all() -> nbd_export_close -> call client_close() for
each client.
client_close() doesn't guarantee that client is closed: nbd_trip()
keeps reference to it. So, nbd_export_close_all() just reduce
reference counter on export and removes it from the list, but doesn't
guarantee that nbd_trip() finished neither export actually removed.
Let's wait for all exports actually removed.
Without this fix, the following crash is possible:
- export bitmap through internal Qemu NBD server
- connect a client
- shutdown Qemu
On shutdown nbd_export_close_all is called, but it actually don't wait
for nbd_trip() to finish and to release its references. So, export is
not release, and exported bitmap remains busy, and on try to remove the
bitmap (which is part of bdrv_close()) the assertion fails:
bdrv_release_dirty_bitmap_locked: Assertion `!bdrv_dirty_bitmap_busy(bitmap)' failed
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200714162234.13113-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If we want to check error after errp-function call, we need to
introduce local_err and then propagate it to errp. Instead, use
the ERRP_GUARD() macro, benefits are:
1. No need of explicit error_propagate call
2. No need of explicit local_err variable: use errp directly
3. ERRP_GUARD() leaves errp as is if it's not NULL or
&error_fatal, this means that we don't break error_abort
(we'll abort on error_set, not on error_propagate)
If we want to add some info to errp (by error_prepend() or
error_append_hint()), we must use the ERRP_GUARD() macro.
Otherwise, this info will not be added when errp == &error_fatal
(the program will exit prior to the error_append_hint() or
error_prepend() call). Fix several such cases, e.g. in nbd_read().
This commit is generated by command
sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \
MAINTAINERS | \
xargs git ls-files | grep '\.[hc]$' | \
xargs spatch \
--sp-file scripts/coccinelle/errp-guard.cocci \
--macro-file scripts/cocci-macro-file.h \
--in-place --no-show-diff --max-width 80
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200707165037.1026246-8-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and
auto-propagated-errp.cocci to errp-guard.cocci. Commit message
tweaked again.]
Ever since commit 36683283 (v2.8), the server code asserts that error
strings sent to the client are well-formed per the protocol by not
exceeding the maximum string length of 4096. At the time the server
first started sending error messages, the assertion could not be
triggered, because messages were completely under our control.
However, over the years, we have added latent scenarios where a client
could trigger the server to attempt an error message that would
include the client's information if it passed other checks first:
- requesting NBD_OPT_INFO/GO on an export name that is not present
(commit 0cfae925 in v2.12 echoes the name)
- requesting NBD_OPT_LIST/SET_META_CONTEXT on an export name that is
not present (commit e7b1948d in v2.12 echoes the name)
At the time, those were still safe because we flagged names larger
than 256 bytes with a different message; but that changed in commit
93676c88 (v4.2) when we raised the name limit to 4096 to match the NBD
string limit. (That commit also failed to change the magic number
4096 in nbd_negotiate_send_rep_err to the just-introduced named
constant.) So with that commit, long client names appended to server
text can now trigger the assertion, and thus be used as a denial of
service attack against a server. As a mitigating factor, if the
server requires TLS, the client cannot trigger the problematic paths
unless it first supplies TLS credentials, and such trusted clients are
less likely to try to intentionally crash the server.
We may later want to further sanitize the user-supplied strings we
place into our error messages, such as scrubbing out control
characters, but that is less important to the CVE fix, so it can be a
later patch to the new nbd_sanitize_name.
Consideration was given to changing the assertion in
nbd_negotiate_send_rep_verr to instead merely log a server error and
truncate the message, to avoid leaving a latent path that could
trigger a future CVE DoS on any new error message. However, this
merely complicates the code for something that is already (correctly)
flagging coding errors, and now that we are aware of the long message
pitfall, we are less likely to introduce such errors in the future,
which would make such error handling dead code.
Reported-by: Xueqiang Wei <xuwei@redhat.com>
CC: qemu-stable@nongnu.org
Fixes: https://bugzilla.redhat.com/1843684 CVE-2020-10761
Fixes: 93676c88d7
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200610163741.3745251-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Use bdrv_dirty_bitmap_next_dirty_area for bitmap_to_extents. Since
bdrv_dirty_bitmap_next_dirty_area is very accurate in its interface,
we'll never exceed requested region with last chunk. So, we don't need
dont_fragment, and bitmap_to_extents() interface becomes clean enough
to not require any comment.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200205112041.6003-10-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
Introduce NBDExtentArray class, to handle extents list creation in more
controlled way and with fewer OUT parameters in functions.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200205112041.6003-9-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
We are going to introduce bdrv_dirty_bitmap_next_dirty so that same
variable may be used to store its return value and to be its parameter,
so it would int64_t.
Similarly, we are going to refactor hbitmap_next_dirty_area to use
hbitmap_next_dirty together with hbitmap_next_zero, therefore we want
hbitmap_next_zero parameter type to be int64_t too.
So, for convenience update all parameters of *_next_zero and
*_next_dirty_area to be int64_t.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-6-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
Detected by a hang in the libnbd testsuite. If a client requests
multiple meta contexts (both base:allocation and qemu:dirty-bitmap:x)
at the same time, our attempt to silence a false-positive warning
about a potential uninitialized variable introduced botched logic: we
were short-circuiting the second context, and never sending the
NBD_REPLY_FLAG_DONE. Combining two 'if' into one 'if/else' in
bdf200a55 was wrong (I'm a bit embarrassed that such a change was my
initial suggestion after the v1 patch, then I did not review the v2
patch that actually got committed). Revert that, and instead silence
the false positive warning by replacing 'return ret' with 'return 0'
(the value it always has at that point in the code, even though it
eluded the deduction abilities of the robot that reported the false
positive).
Fixes: bdf200a553
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200206173832.130004-1-eblake@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Fixes:
/mnt/sdb/qemu/nbd/server.c: In function 'nbd_handle_request':
/mnt/sdb/qemu/nbd/server.c:2313:9: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
int ret;
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20200108025132.46956-1-pannengyuan@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Qemu as server currently won't accept export names larger than 256
bytes, nor create dirty bitmap names longer than 1023 bytes, so most
uses of qemu as client or server have no reason to get anywhere near
the NBD spec maximum of a 4k limit per string.
However, we weren't actually enforcing things, ignoring when the
remote side violates the protocol on input, and also having several
code paths where we send oversize strings on output (for example,
qemu-nbd --description could easily send more than 4k). Tighten
things up as follows:
client:
- Perform bounds check on export name and dirty bitmap request prior
to handing it to server
- Validate that copied server replies are not too long (ignoring
NBD_INFO_* replies that are not copied is not too bad)
server:
- Perform bounds check on export name and description prior to
advertising it to client
- Reject client name or metadata query that is too long
- Adjust things to allow full 4k name limit rather than previous
256 byte limit
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20191114024635.11363-4-eblake@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
As long as we limit NBD names to 256 bytes (the bare minimum permitted
by the standard), stack-allocation works for parsing a name received
from the client. But as mentioned in a comment, we eventually want to
permit up to the 4k maximum of the NBD standard, which is too large
for stack allocation; so switch everything in the server to use heap
allocation. For now, there is no change in actually supported name
length.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20191114024635.11363-2-eblake@redhat.com>
[eblake: fix uninit variable compile failure]
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
When iothreads are in use, the failure to grab the aio context results
in an assertion failure when trying to unlock things during blk_unref,
when trying to unlock a mutex that was not locked. In short, all
calls to nbd_export_put need to done while within the correct aio
context. But since nbd_export_put can recursively reach itself via
nbd_export_close, and recursively grabbing the context would deadlock,
we can't do the context grab directly in those functions, but must do
so in their callers.
Hoist the use of the correct aio_context from nbd_export_new() to its
caller qmp_nbd_server_add(). Then tweak qmp_nbd_server_remove(),
nbd_eject_notifier(), and nbd_esport_close_all() to grab the right
context, so that all callers during qemu now own the context before
nbd_export_put() can call blk_unref().
Remaining uses in qemu-nbd don't matter (since that use case does not
support iothreads).
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190917023917.32226-1-eblake@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
On creation, the export's AioContext is set to the same one as the
BlockBackend, while the AioContext in the client QIOChannel is left
untouched.
As a result, when using data-plane, nbd_client_receive_next_request()
schedules coroutines in the IOThread AioContext, while the client's
QIOChannel is serviced from the main_loop, potentially triggering the
assertion at qio_channel_restart_[read|write].
To fix this, as soon we have the export corresponding to the client,
we call qio_channel_attach_aio_context() to attach the QIOChannel
context to the export's AioContext. This matches with the logic at
blk_aio_attached().
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1748253
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20190912110032.26395-1-slp@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
I received an off-list report of failure to connect to an NBD server
expecting an x509 certificate, when the client was attempting something
similar to this command line:
$ ./x86_64-softmmu/qemu-system-x86_64 -name 'blah' -machine q35 -nodefaults \
-object tls-creds-x509,id=tls0,endpoint=client,dir=$path_to_certs \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,addr=0x6 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=nbd:localhost:9000,werror=stop,rerror=stop,tls-creds=tls0 \
-device scsi-hd,id=image1,drive=drive_image1,bootindex=0
qemu-system-x86_64: -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=nbd:localhost:9000,werror=stop,rerror=stop,tls-creds=tls0: TLS negotiation required before option 7 (go)
server reported: Option 0x7 not permitted before TLS
The problem? As specified, -drive is trying to pass tls-creds to the
raw format driver instead of the nbd protocol driver, but before we
get to the point where we can detect that raw doesn't know what to do
with tls-creds, the nbd driver has already failed because the server
complained. The fix to the broken command line? Pass
'...,file.tls-creds=tls0' to ensure the tls-creds option is handed to
nbd, not raw. But since the error message was rather cryptic, I'm
trying to improve the error message.
With this patch, the error message adds a line:
qemu-system-x86_64: -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=nbd:localhost:9000,werror=stop,rerror=stop,tls-creds=tls0: TLS negotiation required before option 7 (go)
Did you forget a valid tls-creds?
server reported: Option 0x7 not permitted before TLS
And with luck, someone grepping for that error message will find this
commit message and figure out their command line mistake. Sadly, the
only mention of file.tls-creds in our docs relates to an --image-opts
use of PSK encryption with qemu-img as the client, rather than x509
certificate encryption with qemu-kvm as the client.
CC: Tingting Mao <timao@redhat.com>
CC: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190907172055.26870-1-eblake@redhat.com>
[eblake: squash in iotest 233 fix]
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
While the tracing framework does not forbid trailing newline in
events format string, using them lead to confuse output.
It is the responsibility of the backend to properly end an event
line.
Some of our formats have trailing newlines, remove them.
[Fixed typo in commit description reported by Eric Blake
<eblake@redhat.com>
--Stefan]
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190916095121.29506-2-philmd@redhat.com
Message-Id: <20190916095121.29506-2-philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The server side is fairly straightforward: we can always advertise
support for detection of fast zero, and implement it by mapping the
request to the block layer BDRV_REQ_NO_FALLBACK.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-5-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: update iotests 223, 233]
Commit fe0480d6 and friends added BDRV_REQ_NO_FALLBACK as a way to
avoid wasting time on a preliminary write-zero request that will later
be rewritten by actual data, if it is known that the write-zero
request will use a slow fallback; but in doing so, could not optimize
for NBD. The NBD specification is now considering an extension that
will allow passing on those semantics; this patch updates the new
protocol bits and 'qemu-nbd --list' output to recognize the bit, as
well as the new errno value possible when using the new flag; while
upcoming patches will improve the client to use the feature when
present, and the server to advertise support for it.
The NBD spec recommends (but not requires) that ENOTSUP be avoided for
all but failures of a fast zero (the only time it is mandatory to
avoid an ENOTSUP failure is when fast zero is supported but not
requested during write zeroes; the questionable use is for ENOTSUP to
other actions like a normal write request). However, clients that get
an unexpected ENOTSUP will either already be treating it the same as
EINVAL, or may appreciate the extra bit of information. We were
equally loose for returning EOVERFLOW in more situations than
recommended by the spec, so if it turns out to be a problem in
practice, a later patch can tighten handling for both error codes.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: tweak commit message, also handle EOPNOTSUPP]
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
A server may have a reason to reject a request for structured replies,
beyond just not recognizing them as a valid request; similarly, it may
have a reason for rejecting a request for a meta context. It doesn't
hurt us to continue talking to such a server; otherwise 'qemu-nbd
--list' of such a server fails to display all available details about
the export.
Encountered when temporarily tweaking nbdkit to reply with
NBD_REP_ERR_POLICY. Present since structured reply support was first
added (commit d795299b reused starttls handling, but starttls is
different in that we can't fall back to other behavior on any error).
Note that for an unencrypted client trying to connect to a server that
requires encryption, this defers the point of failure to when we
finally execute a strict command (such as NBD_OPT_GO or NBD_OPT_LIST),
now that the intermediate NBD_OPT_STRUCTURED_REPLY does not diagnose
NBD_REP_ERR_TLS_REQD as fatal; but as the protocol eventually gets us
to a command where we can't continue onwards, the changed error
message doesn't cause any security concerns.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190824172813.29720-3-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: fix iotest 233]
Thanks to our recent move to use glib's g_autofree, I can join the
bandwagon. Getting rid of gotos is fun ;)
There are probably more places where we could register cleanup
functions and get rid of more gotos; this patch just focuses on the
labels that existed merely to call g_free.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190824172813.29720-2-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
advertised when the server promises cache consistency between
simultaneous clients (basically, rules that determine what FUA and
flush from one client are able to guarantee for reads from another
client). When we don't permit simultaneous clients (such as qemu-nbd
without -e), the bit makes no sense; and for writable images, we
probably have a lot more work before we can declare that actions from
one client are cache-consistent with actions from another. But for
read-only images, where flush isn't changing any data, we might as
well advertise multi-conn support. What's more, advertisement of the
bit makes it easier for clients to determine if 'qemu-nbd -e' was in
use, where a second connection will succeed rather than hang until the
first client goes away.
This patch affects qemu as server in advertising the bit. We may want
to consider patches to qemu as client to attempt parallel connections
for higher throughput by spreading the load over those connections
when a server advertises multi-conn, but for now sticking to one
connection per nbd:// BDS is okay.
See also: https://bugzilla.redhat.com/1708300
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190815185024.7010-1-eblake@redhat.com>
[eblake: tweak blockdev-nbd.c to not request shared when writable,
fix iotest 233]
Reviewed-by: John Snow <jsnow@redhat.com>
Add a public interface for get. While we're at it,
rename "bdrv_get_dirty_bitmap_locked" to "bdrv_dirty_bitmap_get_locked".
(There are more functions to rename to the bdrv_dirty_bitmap_VERB form,
but they will wait until the conclusion of this series.)
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-11-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h). It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.
Include qemu/main-loop.h only where it's needed. Touching it now
recompiles only some 1700 objects. For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the
others, they shrink only slightly.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-20-armbru@redhat.com>
No reason to use blocking channel for negotiation and we'll benefit in
further reconnect feature, as qio_channel reads and writes will do
qemu_coroutine_yield while waiting for io completion.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190618114328.55249-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
This helps to avoid extra io, allocations and memory copying.
We assume here that CMD_CACHE is always used with copy-on-read, as
otherwise it's a noop.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190725100550.33801-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit 3d068aff (3.0) introduced NBD_MAX_BITMAP_EXTENTS as a limit on
how large we would allow a reply to NBD_CMD_BLOCK_STATUS to grow when
it is visiting a qemu:dirty-bitmap: context. Later, commit fb7afc79
(3.1) reused the constant to limit base:allocation context replies,
although the name is now less appropriate in that situation.
Rename things, and improve the macro to use units.h for better
legibility. Then reformat the comment to comply with checkpatch rules
added in the meantime. No semantic change.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190510151735.29687-1-eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
This adds a new parameter to blk_new() which requires its callers to
declare from which AioContext this BlockBackend is going to be used (or
the locks of which AioContext need to be taken anyway).
The given context is only stored and kept up to date when changing
AioContexts. Actually applying the stored AioContext to the root node
is saved for another commit.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The NBD server uses an AioContext notifier, so it can tolerate that its
BlockBackend is switched to a different AioContext. Before we start
actually calling bdrv_try_set_aio_context(), which checks for
consistency, outside of test cases, we need to make sure that the NBD
server actually allows this.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Add a missing space to the error message used when giving up on a
server that insists on an alignment which renders the last few bytes
of the export unreadable.
Fixes: 3add3ab78
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190404145226.32649-1-eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
In commit 0c1d50bd, I added a couple of TODO comments about whether we
consult bl.request_alignment when responding to NBD_OPT_INFO. At the
time, qemu as server was hard-coding an advertised alignment of 512 to
clients that promised to obey constraints, and there was no function
for getting at a device's preferred alignment. But in hindsight,
advertising 512 when the block device prefers 1 caused other
compliance problems, and commit b0245d64 changed one of the two TODO
comments to advertise a more accurate alignment. Time to fix the other
TODO. Doesn't really impact qemu as client (our normal client doesn't
use NBD_OPT_INFO, and qemu-nbd --list promises to obey block sizes),
but it might prove useful to other clients.
Fixes: b0245d64
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190403030526.12258-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
We've recently added traces for clients to flag server non-compliance;
let's do the same for servers to flag client non-compliance. According
to the spec, if the client requests NBD_INFO_BLOCK_SIZE, it is
promising to send all requests aligned to those boundaries. Of
course, if the client does not request NBD_INFO_BLOCK_SIZE, then it
made no promises so we shouldn't flag anything; and because we are
willing to handle clients that made no promises (the spec allows us to
use NBD_REP_ERR_BLOCK_SIZE_REQD if we had been unwilling), we already
have to handle unaligned requests (which the block layer already does
on our behalf). So even though the spec allows us to return EINVAL
for clients that promised to behave, it's easier to always answer
unaligned requests. Still, flagging non-compliance can be useful in
debugging a client that is trying to be maximally portable.
Qemu as client used to have one spot where it sent non-compliant
requests: if the server sends an unaligned reply to
NBD_CMD_BLOCK_STATUS, and the client was iterating over the entire
disk, the next request would start at that unaligned point; this was
fixed in commit a39286dd when the client was taught to work around
server non-compliance; but is equally fixed if the server is patched
to not send unaligned replies in the first place (yes, qemu 4.0 as
server still has few such bugs, although they will be patched in
4.1). Fortunately, I did not find any more spots where qemu as client
was non-compliant. I was able to test the patch by using the following
hack to convince qemu-io to run various unaligned commands, coupled
with serving 512-byte alignment by intentionally omitting '-f raw' on
the server while viewing server traces.
| diff --git i/nbd/client.c w/nbd/client.c
| index 427980bdd22..1858b2aac35 100644
| --- i/nbd/client.c
| +++ w/nbd/client.c
| @@ -449,6 +449,7 @@ static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt,
| nbd_send_opt_abort(ioc);
| return -1;
| }
| + info->min_block = 1;//hack
| if (!is_power_of_2(info->min_block)) {
| error_setg(errp, "server minimum block size %" PRIu32
| " is not a power of two", info->min_block);
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190403030526.12258-3-eblake@redhat.com>
[eblake: address minor review nits]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Don't increment remaining_bytes until we know that we will actually be
including the current block status extent in the reply; otherwise, the
value traced will include a bytes value that is oversized by the
length of the next block status extent which did not get sent because
it instead ended the loop.
Fixes: fb7afc79
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190403030526.12258-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split their
reply according to bdrv_block_status() boundaries. If the block device
has a request_alignment smaller than 512, but we advertise a block
alignment of 512 to the client, then this can result in the server
reply violating client expectations by reporting a smaller region of
the export than what the client is permitted to address (although this
is less of an issue for qemu 4.0 clients, given recent client patches
to overlook our non-compliance at EOF). Since it's always better to
be strict in what we send, it is worth advertising the actual minimum
block limit rather than blindly rounding it up to 512.
Note that this patch is not foolproof - it is still possible to
provoke non-compliant server behavior using:
$ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file
That is arguably a bug in the blkdebug driver (it should never pass
back block status smaller than its alignment, even if it has to make
multiple bdrv_get_status calls and determine the
least-common-denominator status among the group to return). It may
also be possible to observe issues with a backing layer with smaller
alignment than the active layer, although so far I have been unable to
write a reliable iotest for that scenario (but again, an issue like
that could be argued to be a bug in the block layer, or something
where we need a flag to bdrv_block_status() to state whether the
result must be aligned to the current layer's limits or can be
subdivided for accuracy when chasing backing files).
Anyways, as blkdebug is not normally used, and as this patch makes our
server more interoperable with qemu 3.1 clients, it is worth applying
now, even while we still work on a larger patch series for the 4.1
timeframe to have byte-accurate file lengths.
Note that the iotests output changes - for 223 and 233, we can see the
server's better granularity advertisement; and for 241, the three test
cases have the following effects:
- natural alignment: the server's smaller alignment is now advertised,
and the hole reported at EOF is now the right result; we've gotten rid
of the server's non-compliance
- forced server alignment: the server still advertises 512 bytes, but
still sends a mid-sector hole. This is still a server compliance bug,
which needs to be fixed in the block layer in a later patch; output
does not change because the client is already being tolerant of the
non-compliance
- forced client alignment: the server's smaller alignment means that
the client now sees the server's status change mid-sector without any
protocol violations, but the fact that the map shows an unaligned
mid-sector hole is evidence of the block layer problems with aligned
block status, to be fixed in a later patch
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-7-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: rebase to enhanced iotest 241 coverage]
The NBD spec suggests that a server should never advertise a size
inconsistent with its minimum block alignment, as that tail is
effectively inaccessible to a compliant client obeying those block
constraints. Since we have a habit of rounding up rather than
truncating, to avoid losing the last few bytes of user input, and we
cannot access the tail when the server advertises bogus block sizing,
abort the connection to alert the server to fix their bug. And
rejecting such servers matches what we already did for a min_block
that was not a power of 2 or which was larger than max_block.
Does not impact either qemu (which always sends properly aligned
sizes) or nbdkit (which does not send minimum block requirements yet);
so this is mostly aimed at new NBD server implementations, and ensures
that the rest of our code can assume the size is aligned.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190330155704.24191-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tracked down with cleanup-trace-events.pl. Funnies requiring manual
post-processing:
* block.c and blockdev.c trace points are in block/trace-events.
* hw/block/nvme.c uses the preprocessor to hide its trace point use
from cleanup-trace-events.pl.
* include/hw/xen/xen_common.h trace points are in hw/xen/trace-events.
* net/colo-compare and net/filter-rewriter.c use pseudo trace points
colo_compare_udp_miscompare and colo_filter_rewriter_debug to guard
debug code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190314180929.27722-5-armbru@redhat.com
Message-Id: <20190314180929.27722-5-armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We spell out sub/dir/ in sub/dir/trace-events' comments pointing to
source files. That's because when trace-events got split up, the
comments were moved verbatim.
Delete the sub/dir/ part from these comments. Gets rid of several
misspellings.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190314180929.27722-3-armbru@redhat.com
Message-Id: <20190314180929.27722-3-armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Almost all trace-events point to docs/devel/tracing.txt in a comment
right at the beginning. Touch up the ones that don't.
[Updated with Markus' new commit description wording.
--Stefan]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190314180929.27722-2-armbru@redhat.com
Message-Id: <20190314180929.27722-2-armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Instead of checking against busy, inconsistent, or read only directly,
use a check function with permissions bits that let us streamline the
checks without reproducing them in many places.
Included in this patch are permissions changes that simply add the
inconsistent check to existing permissions call spots, without
addressing existing bugs.
In general, this means that busy+readonly checks become BDRV_BITMAP_DEFAULT,
which checks against all three conditions. busy-only checks become
BDRV_BITMAP_ALLOW_RO.
Notably, remove allows inconsistent bitmaps, so it doesn't follow the pattern.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190301191545.8728-4-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
These mean the same thing now. Unify them and rename the merged call
bdrv_dirty_bitmap_busy to indicate semantically what we are describing,
as well as help disambiguate from the various _locked and _unlocked
versions of bitmap helpers that refer to mutex locks.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-8-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Check that the bitmap is not in use prior to it checking if it is
not enabled/recording guest writes. The bitmap being busy was likely
at the behest of the user, so this error has a greater chance of being
understood by the user.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-6-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Currently any client which can complete the TLS handshake is able to use
the NBD server. The server admin can turn on the 'verify-peer' option
for the x509 creds to require the client to provide a x509 certificate.
This means the client will have to acquire a certificate from the CA
before they are permitted to use the NBD server. This is still a fairly
low bar to cross.
This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
takes the ID of a previously added 'QAuthZ' object instance. This will
be used to validate the client's x509 distinguished name. Clients
failing the authorization check will not be permitted to use the NBD
server.
For example to setup authorization that only allows connection from a client
whose x509 certificate distinguished name is
CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB
escape the commas in the name and use:
qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
endpoint=server,verify-peer=yes \
--object 'authz-simple,id=auth0,identity=CN=laptop.example.com,,\
O=Example Org,,L=London,,ST=London,,C=GB' \
--tls-creds tls0 \
--tls-authz authz0 \
....other qemu-nbd args...
NB: a real shell command line would not have leading whitespace after
the line continuation, it is just included here for clarity.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20190227162035.18543-2-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: split long line in --help text, tweak 233 to show that whitespace
after ,, in identity= portion is actually okay]
Signed-off-by: Eric Blake <eblake@redhat.com>
Instead of using the convenience wrapper qio_channel_read_all_eof(), use
the lower level QIOChannel API. This means duplicating some code, but
we'll need this because this coroutine yield is special: We want it to
be interruptible so that nbd_client_attach_aio_context() can correctly
reenter the coroutine.
This moves the bdrv_dec/inc_in_flight() pair into nbd_read_eof(), so
that connection_co will always sit in this exact qio_channel_yield()
call when bdrv_drain() returns.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The only caller of nbd_read_eof() is nbd_receive_reply(), so it doesn't
have to live in the header file, but can move next to its caller.
Also add the missing coroutine_fn to the function and its caller.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
lgtm.com pointed out that commit 678ba275 introduced a shadowed
declaration of local variable 'bs'; thankfully, the inner 'bs'
obtained by 'blk_bs(blk)' matches the outer one given that we had
'blk_insert_bs(blk, bs, errp)' a few lines earlier, and there are
no later uses of 'bs' beyond the scope of the 'if (bitmap)' to
care if we change the value stored in 'bs' while traveling the
backing chain to find a bitmap. So simply get rid of the extra
declaration.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190207191357.6665-1-eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We generally do very similar things around nbd_read: error_prepend
specifying what we have tried to read, and be_to_cpu conversion of
integers.
So, it seems reasonable to move common things to helper functions,
which:
1. simplify code a bit
2. generalize nbd_read error descriptions, all starting with
"Failed to read"
3. make it more difficult to forget to convert things from BE
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190128165830.165170-1-vsementsov@virtuozzo.com>
[eblake: rename macro to DEF_NBD_READ_N and formatting tweaks;
checkpatch has false positive complaint]
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit 3d068aff forgot to advertise available qemu: contexts
when the client requests a list with 0 queries. Furthermore,
3.0 shipped with a qemu-img hack of x-dirty-bitmap (commit
216ee365) that _silently_ acts as though the entire image is
clean if a requested bitmap is not present. Both bugs have
been recently fixed, so that a modern qemu server gives full
context output right away, and the client refuses a
connection if a requested x-dirty-bitmap was not found.
Still, it is likely that there will be users that have to
work with a mix of old and new qemu versions, depending on
which features get backported where, at which point being
able to rely on 'qemu-img --list' output to know for sure
whether a given NBD export has the desired dirty bitmap is
much nicer than blindly connecting and risking that the
entire image may appear clean. We can make our --list code
smart enough to work around buggy servers by tracking
whether we've seen any qemu: replies in the original 0-query
list; if not, repeat with a single query on "qemu:" (which
may still have no replies, but then we know for sure we
didn't trip up on the server bug).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190117193658.16413-21-eblake@redhat.com>
We want to be able to detect whether a given qemu NBD server is
exposing the right export(s) and dirty bitmaps, at least for
regression testing. We could use 'nbd-client -l' from the upstream
NBD project to list exports, but it's annoying to rely on
out-of-tree binaries; furthermore, nbd-client doesn't necessarily
know about all of the qemu NBD extensions. Thus, we plan on adding
a new mode to qemu-nbd that merely sniffs all possible information
from the server during handshake phase, then disconnects and dumps
the information.
This patch continues the work of the previous patch, by adding the
ability to track the list of available meta contexts into
NBDExportInfo. It benefits from the recent refactoring patches
with a new nbd_list_meta_contexts() that reuses much of the same
framework as setting a meta context.
Note: a malicious server could exhaust memory of a client by feeding
an unending loop of contexts; perhaps we could place a limit on how
many we are willing to receive. But this is no different from our
earlier analysis on a server sending an unending list of exports,
and the death of a client due to memory exhaustion when the client
was going to exit soon anyways is not really a denial of service
attack.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190117193658.16413-19-eblake@redhat.com>
We want to be able to detect whether a given qemu NBD server is
exposing the right export(s) and dirty bitmaps, at least for
regression testing. We could use 'nbd-client -l' from the upstream
NBD project to list exports, but it's annoying to rely on
out-of-tree binaries; furthermore, nbd-client doesn't necessarily
know about all of the qemu NBD extensions. Thus, we plan on adding
a new mode to qemu-nbd that merely sniffs all possible information
from the server during handshake phase, then disconnects and dumps
the information.
This patch adds the low-level client code for grabbing the list
of exports. It benefits from the recent refactoring patches, in
order to share as much code as possible when it comes to doing
validation of server replies. The resulting information is stored
in an array of NBDExportInfo which has been expanded to any
description string, along with a convenience function for freeing
the list.
Note: a malicious server could exhaust memory of a client by feeding
an unending loop of exports; perhaps we should place a limit on how
many we are willing to receive. But note that a server could
reasonably be serving an export for every file in a large directory,
where an arbitrary limit in the client means we can't list anything
from such a server; the same happens if we just run until the client
fails to malloc() and thus dies by an abort(), where the limit is
no longer arbitrary but determined by available memory. Since the
client is already planning on being short-lived, it's hard to call
this a denial of service attack that would starve off other uses,
so it does not appear to be a security issue.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20190117193658.16413-18-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Rename the function to nbd_opt_info_or_go() with an added parameter
and slight changes to comments and trace messages, in order to
reuse the function for NBD_OPT_INFO.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190117193658.16413-17-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Another refactoring creating nbd_negotiate_finish_oldstyle()
for further reuse during 'qemu-nbd --list'.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20190117193658.16413-16-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
An upcoming patch will add the ability for qemu-nbd to list
the services provided by an NBD server. Share the common
code of the TLS handshake by splitting the initial exchange
into a separate function, leaving only the export handling
in the original function. Functionally, there should be no
change in behavior in this patch, although some of the code
motion may be difficult to follow due to indentation changes
(view with 'git diff -w' for a smaller changeset).
I considered an enum for the return code coordinating state
between the two functions, but in the end just settled with
ample comments.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190117193658.16413-15-eblake@redhat.com>
The function could only ever return 0 or -EINVAL; make this
clearer by dropping a useless 'fail:' label.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190117193658.16413-14-eblake@redhat.com>
Extract portions of nbd_negotiate_simple_meta_context() to
a new function nbd_receive_one_meta_context() that copies the
pattern of nbd_receive_list() for performing the argument
validation of one reply. The error message when the server
replies with more than one context changes slightly, but
that shouldn't happen in the common case.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190117193658.16413-13-eblake@redhat.com>
Refactor nbd_negotiate_simple_meta_context() to pull out the
code that can be reused to send a LIST request for 0 or 1 query.
No semantic change. The old comment about 'sizeof(uint32_t)'
being equivalent to '/* number of queries */' is no longer
needed, now that we are computing 'sizeof(queries)' instead.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20190117193658.16413-12-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Pass 'info' instead of three separate parameters related to info,
when requesting the server to set the meta context. Update the
NBDExportInfo struct to rename the received id field to match the
fact that we are currently overloading the field to match whatever
context the user supplied through the x-dirty-bitmap hack, as well
as adding a TODO comment to remind future patches about a desire
to request two contexts at once.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190117193658.16413-11-eblake@redhat.com>
Refactor the 'name' parameter of nbd_receive_negotiate() from
being a separate parameter into being part of the in-out 'info'.
This also spills over to a simplification of nbd_opt_go().
The main driver for this refactoring is that an upcoming patch
would like to add support to qemu-nbd to list information about
all exports available on a server, where the name(s) will be
provided by the server instead of the client. But another benefit
is that we can now allow the client to explicitly specify the
empty export name "" even when connecting to an oldstyle server
(even if qemu is no longer such a server after commit 7f7dfe2a).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190117193658.16413-10-eblake@redhat.com>
Right now, nbd_receive_list() is only called by
nbd_receive_query_exports(), which in turn is only called if the
server lacks NBD_OPT_GO but has working option negotiation, and is
merely used as a quality-of-implementation trick since servers
can't give decent errors for NBD_OPT_EXPORT_NAME. However, servers
that lack NBD_OPT_GO are becoming increasingly rare (nbdkit was a
latecomer, in Aug 2018, but qemu has been such a server since commit
f37708f6 in July 2017 and released in 2.10), so it no longer makes
sense to micro-optimize that function for performance.
Furthermore, when debugging a server's implementation, tracing the
full reply (both names and descriptions) is useful, not to mention
that upcoming patches adding 'qemu-nbd --list' will want to collect
that data. And when you consider that a server can send an export
name up to the NBD protocol length limit of 4k; but our current
NBD_MAX_NAME_SIZE is only 256, we can't trace all valid server
names without more storage, but 4k is large enough that the heap
is better than the stack for long names.
Thus, I'm changing the division of labor, with nbd_receive_list()
now always malloc'ing a result on success (the malloc is bounded
by the fact that we reject servers with a reply length larger
than 32M), and moving the comparison to 'wantname' to the caller.
There is a minor change in behavior where a server with 0 exports
(an immediate NBD_REP_ACK reply) is now no longer distinguished
from a server without LIST support (NBD_REP_ERR_UNSUP); this
information could be preserved with a complication to the calling
contract to provide a bit more information, but I didn't see the
point. After all, the worst that can happen if our guess at a
match is wrong is that the caller will get a cryptic disconnect
when NBD_OPT_EXPORT_NAME fails (which is no different from what
would happen if we had not tried LIST), while treating an empty
list as immediate failure would prevent connecting to really old
servers that really did lack LIST. Besides, NBD servers with 0
exports are rare (qemu can do it when using QMP nbd-server-start
without nbd-server-add - but qemu understands NBD_OPT_GO and
thus won't tickle this change in behavior).
Fix the spelling of foundExport to match coding standards while
in the area.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190117193658.16413-9-eblake@redhat.com>
Although our compile-time environment is set up so that we always
support long files with 64-bit off_t, we have no guarantee whether
off_t is the same type as int64_t. This requires casts when
printing values, and prevents us from directly using qemu_strtoi64()
(which will be done in the next patch). Let's just flip to uint64_t
where possible, and stick to int64_t for detecting failure of
blk_getlength(); we also keep the assertions added in the previous
patch that the resulting values fit in 63 bits. The overflow check
in nbd_co_receive_request() was already sane (request->from is
validated to fit in 63 bits, and request->len is 32 bits, so the
addition can't overflow 64 bits), but rewrite it in a form easier
to recognize as a typical overflow check.
Rename the variable 'description' to keep line lengths reasonable.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190117193658.16413-7-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
We only had two callers to nbd_export_new; qemu-nbd.c always
passed a valid offset/length pair (because it already checked
the file length, to ensure that offset was in bounds), while
blockdev-nbd.c always passed 0/-1. Then nbd_export_new reduces
the size to a multiple of BDRV_SECTOR_SIZE (can only happen
when offset is not sector-aligned, since bdrv_getlength()
currently rounds up) (someday, it would be nice to have
byte-accurate lengths - but not today).
However, I'm finding it easier to work with the code if we are
consistent on having both callers pass in a valid length, and
just assert that things are sane in nbd_export_new, meaning
that no negative values were passed, and that offset+size does
not exceed 63 bits (as that really is a fundamental limit to
later operations, whether we use off_t or uint64_t).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190117193658.16413-6-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
We only have one caller that wants to export a bitmap name,
which it does right after creation of the export. But there is
still a brief window of time where an NBD client could see the
export but not the dirty bitmap, which a robust client would
have to interpret as meaning the entire image should be treated
as dirty. Better is to eliminate the window entirely, by
inlining nbd_export_bitmap() into nbd_export_new(), and refusing
to create the bitmap in the first place if the requested bitmap
can't be located.
We also no longer need logic for setting a different bitmap
name compared to the bitmap being exported.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190111194720.15671-8-eblake@redhat.com>
The existing NBD code had a weird split where nbd_export_new()
created an export but did not add it to the list of exported
names until a later nbd_export_set_name() came along and grabbed
a second reference on the object; later, the first call to
nbd_export_close() drops the second reference while removing
the export from the list. This is in part because the QAPI
NbdServerRemoveNode enum documents the possibility of adding a
mode where we could do a soft disconnect: preventing new clients,
but waiting for existing clients to gracefully quit, based on
the mode used when calling nbd_export_close().
But in spite of all that, note that we never change the name of
an NBD export while it is exposed, which means it is easier to
just inline the process of setting the name as part of creating
the export.
Inline the contents of nbd_export_set_name() and
nbd_export_set_description() into the two points in an export
lifecycle where they matter, then adjust both callers to pass
the name up front. Note that for creation, all callers pass a
non-NULL name, (passing NULL at creation was for old style
servers, but we removed support for that in commit 7f7dfe2a),
so we can add an assert and do things unconditionally; but for
cleanup, because of the dual nature of nbd_export_close(), we
still have to be careful to avoid use-after-free. Along the
way, add a comment reminding ourselves of the potential of
adding a middle mode disconnect.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190111194720.15671-5-eblake@redhat.com>
Our initial implementation of x-nbd-server-add-bitmap put
in a restriction because of incremental backups: in that
usage, we are exporting one qcow2 file (the temporary overlay
target of a blockdev-backup sync:none job) and a dirty bitmap
owned by a second qcow2 file (the source of the
blockdev-backup, which is the backing file of the temporary).
While both qcow2 files are still writable (the target in
order to capture copy-on-write of old contents, and the
source in order to track live guest writes in the meantime),
the NBD client expects to see constant data, including the
dirty bitmap. An enabled bitmap in the source would be
modified by guest writes, which is at odds with the NBD
export being a read-only constant view, hence the initial
code choice of enforcing a disabled bitmap (the intent is
that the exposed bitmap was disabled in the same transaction
that started the blockdev-backup job, although we don't want
to track enough state to actually enforce that).
However, consider the case of a bitmap contained in a read-only
node (including when the bitmap is found in a backing layer of
the active image). Because the node can't be modified, the
bitmap won't change due to writes, regardless of whether it is
still enabled. Forbidding the export unless the bitmap is
disabled is awkward, paritcularly since we can't change the
bitmap to be disabled (because the node is read-only).
Alternatively, consider the case of live storage migration,
where management directs the destination to create a writable
NBD server, then performs a drive-mirror from the source to
the target, prior to doing the rest of the live migration.
Since storage migration can be time-consuming, it may be wise
to let the destination include a dirty bitmap to track which
portions it has already received, where even if the migration
is interrupted and restarted, the source can query the
destination block status in order to potentially minimize
re-sending data that has not changed in the meantime on a
second attempt. Such code has not been written, and might not
be trivial (after all, a cluster being marked dirty in the
bitmap does not necessarily guarantee it has the desired
contents), but it makes sense that letting an active dirty
bitmap be exposed and changing alongside writes may prove
useful in the future.
Solve both issues by gating the restriction against a
disabled bitmap to only happen when the caller has requested
a read-only export, and where the BDS that owns the bitmap
(whether or not it is the BDS handed to nbd_export_new() or
from its backing chain) is still writable. We could drop
the check altogether (if management apps are prepared to
deal with a changing bitmap even on a read-only image), but
for now keeping a check for the read-only case still stands
a chance of preventing management errors.
Update iotest 223 to show the looser behavior by leaving
a bitmap enabled the whole run; note that we have to tear
down and re-export a node when handling an error.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190111194720.15671-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
There's no need to read into a temporary buffer (oversized
since commit 7d3123e1) followed by a byteswap into a uint64_t
to check for a magic number via memcmp(), when the code
immediately below demonstrates reading into the uint64_t then
byteswapping in place and checking for a magic number via
integer math. What's more, having a different error message
when the server's first reply byte is 0 is unusual - it's no
different from any other wrong magic number, and we already
detected short reads. That whole strlen() issue has been
present and useless since commit 1d45f8b5 in 2010; perhaps it
was leftover debugging (since the correct magic number happens
to be ASCII)? Make the error messages more consistent and
detailed while touching things.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20181215135324.152629-9-eblake@redhat.com>
Connecting to a /dev/nbdN device is a Linux-specific action.
We were already masking -c and -d from 'qemu-nbd --help' on
non-linux. However, while -d fails with a sensible error
message, it took hunting through a couple of files to prove
that. What's more, the code for -c doesn't fail until after
it has created a pthread and tried to open a device - possibly
even printing an error message with %m on a non-Linux platform
in spite of the comment that %m is glibc-specific. Make the
failure happen sooner, then get rid of stubs that are no
longer needed because of the early exits.
While at it: tweak the blank newlines in --help output to be
consistent, whether or not built on Linux.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181215135324.152629-7-eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Consolidate on using decimal (not hex), on outputting the
option reply name (not just value), and a consistent comma between
clauses, when the client reports protocol discrepancies from the
server. While it won't affect normal operation, it makes
debugging additions easier.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20181215135324.152629-6-eblake@redhat.com>
Not all servers send free-form text alongside option error replies, but
for servers that do (such as qemu), we pass the server's message as a
hint alongside our own error reporting. However, it would also be
useful to trace such server messages, since we can't guarantee how the
hint may be consumed.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181218225714.284495-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
These functions are used for formatting pretty trace points. We are
going to add some in block/nbd-client, so, let's publish all these
functions at once. Note, that nbd_reply_type_lookup is already
published, and constants, "named" by these functions live in
include/block/nbd.h too.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20181102151152.288399-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The NBD spec, and even our code comment, says that if the client
asks for NBD_OPT_LIST_META_CONTEXT with 0 queries, then we should
reply with (a possibly-compressed representation of) ALL contexts
that we are willing to let them try. But commit 3d068aff forgot
to advertise qemu:dirty-bitmap:FOO.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181130023232.3079982-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Commit 37ec36f6 intentionally ignores errors when trying to reply
to an NBD_OPT_ABORT request for plaintext clients, but did not make
the same change for a TLS server. Since NBD_OPT_ABORT is
documented as being a potential for an EPIPE when the client hangs
up without waiting for our reply, we don't need to pollute the
server's output with that failure.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181117223221.2198751-1-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
A space was missing after the option number was printed:
Option 0x8not permitted before TLS
becomes
Option 0x8 not permitted before TLS
This fixes
commit 3668328303
Author: Eric Blake <eblake@redhat.com>
Date: Fri Oct 14 13:33:09 2016 -0500
nbd: Send message along with server NBD_REP_ERR errors
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20181116155325.22428-2-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: move lone space to next line]
Signed-off-by: Eric Blake <eblake@redhat.com>
Whether it's "locked" or "frozen", it's in use and should
not be allowed for the purposes of this operation.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20181002230218.13949-7-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
After the previous commit, nbd_client_new's first parameter is always
NULL. Let's drop it with all corresponding old-style negotiation code
path which is unreachable now.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20181003170228.95973-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: re-wrap short line]
Signed-off-by: Eric Blake <eblake@redhat.com>
We should not go to structured-read branch on CACHE command, fix that.
Bug introduced in bc37b06a5c "nbd/server: introduce NBD_CMD_CACHE"
with the whole feature and affects 3.0.0 release.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: qemu-stable@nongnu.org
Message-Id: <20181003144738.70670-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: commit message typo fix]
Signed-off-by: Eric Blake <eblake@redhat.com>
Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this. Avoid the bug by not using the
"modify in place" byte swapping functions.
This patch was produced with the following spatch script:
@@
expression E;
@@
-be16_to_cpus(&E);
+E = be16_to_cpu(E);
@@
expression E;
@@
-be32_to_cpus(&E);
+E = be32_to_cpu(E);
@@
expression E;
@@
-be64_to_cpus(&E);
+E = be64_to_cpu(E);
@@
expression E;
@@
-cpu_to_be16s(&E);
+E = cpu_to_be16(E);
@@
expression E;
@@
-cpu_to_be32s(&E);
+E = cpu_to_be32(E);
@@
expression E;
@@
-cpu_to_be64s(&E);
+E = cpu_to_be64(E);
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180927164200.15097-1-peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: rebase, and squash in missed changes]
Signed-off-by: Eric Blake <eblake@redhat.com>
This is necessary for efficient block-status export, for clients which
support it. (qemu is not yet such a client, but could become one.)
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180704112302.471456-3-vsementsov@virtuozzo.com>
[eblake: grammar tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
bitmap_to_extents function is broken: it switches dirty variable after
every iteration, however it can process only part of dirty (or zero)
area during one iteration in case when this area is too large for one
extent.
Fortunately, the bug doesn't produce wrong extent flags: it just inserts
a zero-length extent between sequential extents representing large dirty
(or zero) area. However, zero-length extents are forbidden by the NBD
protocol. So, a careful client should consider such a reply as a server
fault, while a less-careful will likely ignore zero-length extents.
The bug can only be triggered by a client that requests block status
for nearly 4G at once (a request of 4G and larger is impossible per
the protocol, and requests smaller than 4G less the bitmap granularity
cause the loop to quit iterating rather than revisit the tail of the
large area); it also cannot trigger if the client used the
NBD_CMD_FLAG_REQ_ONE flag. Since qemu 3.0 as client (using the
x-dirty-bitmap extension) always passes the flag, it is immune; and
we are not aware of other open-source clients that know how to request
qemu:dirty-bitmap:FOO contexts. Clients that want to avoid the bug
could cap block status requests to a smaller length, such as 2G or 3G.
Fix this by more careful handling of dirty variable.
Bug was introduced in 3d068aff16
"nbd/server: implement dirty bitmap export", with the whole function.
and is present in v3.0.0 release.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180914165116.23182-1-vsementsov@virtuozzo.com>
CC: qemu-stable@nongnu.org
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: improved commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
Call nbd_co_send_extents() with correct length parameter
(extent.length may be smaller than original length).
Also, switch length parameter type to uint32_t, to correspond with
request->len and similar nbd_co_send_bitmap().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180704112302.471456-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
In order to test that the NBD server is properly advertising
dirty bitmaps, we need a bare minimum client that can request
and read the context. Since feature freeze for 3.0 is imminent,
this is the smallest workable patch, which replaces the qemu
block status report with the results of the NBD server's dirty
bitmap (making it very easy to use 'qemu-img map --output=json'
to learn where the dirty portions are). Note that the NBD
protocol defines a dirty section with the same bit but opposite
sense that normal "base:allocation" uses to report an allocated
section; so in qemu-img map output, "data":true corresponds to
clean, "data":false corresponds to dirty.
A more complete solution that allows dirty bitmaps to be queried
at the same time as normal block status will be required before
this addition can lose the x- prefix. Until then, the fact that
this replaces normal status with dirty status means actions
like 'qemu-img convert' will likely misbehave due to treating
dirty regions of the file as if they are unallocated.
The next patch adds an iotest to exercise this new code.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180702191458.28741-2-eblake@redhat.com>
In my hurry to fix a build failure, I introduced a logic bug.
The assertion conditional is backwards, meaning that qemu will
now abort instead of reporting dirty bitmap status.
The bug can only be tickled by an NBD client using an exported
dirty bitmap (which is still an experimental QMP command), so
it's not the end of the world for supported usage (and neither
'make check' nor qemu-iotests fails); but it also shows that we
really want qemu-io support for reading dirty bitmaps if only
so that I can add iotests coverage to prevent future
brown-bag-of-shame events like this one.
Fixes: 45eb6fb6
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180622153509.375130-1-eblake@redhat.com>
The code has a while() loop that always initialized 'end', and
the loop always executes at least once (as evidenced by the assert()
just prior to the loop). But some versions of gcc still complain
that 'end' is used uninitialized, so silence them.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180622125814.345274-1-eblake@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Handle nbd CACHE command. Just do read, without sending read data back.
Cache mechanism should be done by exported node driver chain.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180413143156.11409-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: fix two missing case labels in switch statements]
Signed-off-by: Eric Blake <eblake@redhat.com>
Handle a new NBD meta namespace: "qemu", and corresponding queries:
"qemu:dirty-bitmap:<export bitmap name>".
With the new metadata context negotiated, BLOCK_STATUS query will reply
with dirty-bitmap data, converted to extents. The new public function
nbd_export_bitmap selects which bitmap to export. For now, only one bitmap
may be exported.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180609151758.17343-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: wording tweaks, minor cleanups, additional tracing]
Signed-off-by: Eric Blake <eblake@redhat.com>
Add nbd_meta_pattern() and nbd_meta_empty_or_pattern() helpers for
metadata query parsing. nbd_meta_pattern() will be reused for the
"qemu" namespace in following patches.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180609151758.17343-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: comment tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
Use NBDExport pointer instead of just export name: there is no need to
store a duplicated name in the struct; moreover, NBDExport will be used
further.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180609151758.17343-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: commit message grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
Return code = 1 doesn't mean that we parsed base:allocation. Use
correct traces in both -parsed and -skipped cases.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180609151758.17343-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: comment tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
The NBD spec says that behavior is unspecified if the client
requests 0 length for block status; but since the structured
reply is documenting as returning a non-zero length, it's
easier to just diagnose this with an EINVAL error than to
figure out what to return.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180621124937.166549-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
A missing space makes for poor error messages, and sizes can't
go negative. Also, we missed diagnosing a server that sends
a maximum block size less than the minimum.
Fixes: 081dd1fe
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180501154654.943782-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Initialize received variable. Otherwise, is is possible for server to
answer without any contexts, but we will set context_id to something
random (received_id is not initialized too) and return 1, which is
wrong.
To solve it, just initialize received to false. Initialize received_id
too, just to make all possible checkers happy.
Bug was introduced in 78a33ab587 "nbd: BLOCK_STATUS for
standard get_block_status function: client part" with the whole
function.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180427142002.21930-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Having a more detailed log of the interaction between client and
server is invaluable in debugging how meta context negotiation
actually works.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180330130950.1931229-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
It's never a good idea to blindly read for size bytes as
returned by the server without first validating that the size
is within bounds; a malicious or buggy server could cause us
to hang or get out of sync from reading further messages.
It may be smarter to try and teach the client to cope with
unexpected context ids by silently ignoring them instead of
hanging up on the server, but for now, if the server doesn't
reply with exactly the one context we expect, it's easier to
just give up - however, if we give up for any reason other
than an I/O failure, we might as well try to politely tell
the server we are quitting rather than continuing.
Fix some typos in the process.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180329231837.1914680-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Minimal realization: only one extent in server answer is supported.
Flag NBD_CMD_FLAG_REQ_ONE is used to force this behavior.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180312152126.286890-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweaks, fix min_block check and 32-bit cap, use -1
instead of errno on failure in nbd_negotiate_simple_meta_context,
ensure that block status makes progress on success]
Signed-off-by: Eric Blake <eblake@redhat.com>
Minimal realization: only one extent in server answer is supported.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180312152126.286890-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: tweak whitespace, move constant from .h to .c, improve
logic of check_meta_export_name, simplify nbd_negotiate_options
by doing more in nbd_negotiate_meta_queries]
Signed-off-by: Eric Blake <eblake@redhat.com>
Add helper to read name in format:
uint32 len (<= NBD_MAX_NAME_SIZE)
len bytes string (not 0-terminated)
The helper will be reused in following patch.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180312152126.286890-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar fixes, actually check error]
Signed-off-by: Eric Blake <eblake@redhat.com>
NBD_REP_ERR_INVALID is often parameter to nbd_opt_drop and it would
be used more in following patches. So, let's add a helper.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180312152126.286890-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The NBD spec states that since trim requests can affect disk contents,
then they should allow for FUA semantics just like writes for ensuring
the disk has settled before returning. As bdrv_[co_]pdiscard() does
not support a flags argument, we can't pass FUA down the block layer
stack, and must therefore emulate it with a flush at the NBD layer.
Note that in all reality, generic well-behaved clients will never
send TRIM+FUA (in fact, qemu as a client never does, and we have no
intention to plumb flags into bdrv_pdiscard). This is because the
NBD protocol states that it is unspecified to READ a trimmed area
(you might read stale data, all zeroes, or even random unrelated
data) without first rewriting it, and even the experimental
BLOCK_STATUS extension states that TRIM need not affect reported
status. Thus, in the general case, a client cannot tell the
difference between an arbitrary server that ignores TRIM, a server
that had a power outage without flushing to disk, and a server that
actually affected the disk before returning; so waiting for the
trim actions to flush to disk makes little sense. However, for a
specific client and server pair, where the client knows the server
treats TRIM'd areas as guaranteed reads-zero, waiting for a flush
makes sense, hence why the protocol documents that FUA is valid on
trim. So, even though the NBD protocol doesn't have a way for the
server to advertise what effects (if any) TRIM will actually have,
and thus any client that relies on specific effects is probably
in error, we can at least support a client that requests TRIM+FUA.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180307225732.155835-1-eblake@redhat.com>
Split out request handling logic.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: touch up blank line placement]
Signed-off-by: Eric Blake <eblake@redhat.com>
nbd_trip has difficult logic when sending replies: it tries to use one
code path for all replies. It is ok for simple replies, but is not
comfortable for structured replies. Also, two types of error (and
corresponding messages in local_err) - fatal (leading to disconnect)
and not-fatal (just to be sent to the client) are difficult to follow.
To make things a bit clearer, the following is done:
- split CMD_READ logic to separate function. It is the most difficult
command for now, and it is definitely cramped inside nbd_trip. Also,
it is difficult to follow CMD_READ logic, shared between
"case NBD_CMD_READ" and "if"s under "reply:" label.
- create separate helper function nbd_send_generic_reply() and use it
both in new nbd_do_cmd_read and for other commands in nbd_trip instead
of common code-path under "reply:" label in nbd_trip. The helper
supports an error message, so logic with local_err in nbd_trip is
simplified.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweaks and blank line placement]
Signed-off-by: Eric Blake <eblake@redhat.com>
Since the unchanged code has just set client->recv_coroutine to
NULL before calling nbd_client_receive_next_request(), we are
spawning a new coroutine unconditionally, but the first thing
that coroutine will do is check for client->closing, making it
a no-op if we have already detected that the client is going
away. Furthermore, for any error other than EIO (where we
disconnect, which itself sets client->closing), if the client
has already gone away, we'll probably encounter EIO later
in the function and attempt disconnect at that point. Logically,
as soon as we know the connection is closing, there is no need
to try a likely-to-fail a response or spawn a no-op coroutine.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: squash in further reordering: hoist check before spawning
next coroutine, and document rationale in commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
In case of io error in nbd_co_send_sparse_read we should not
"goto reply:", as it was a fatal error and the common behavior
is to disconnect in this case. We should not try to send the
client an additional error reply, since we already hit a
channel-io error on our previous attempt to send one.
Fix this by handling block-status error in nbd_co_send_sparse_read,
so nbd_co_send_sparse_read fails only on io error. Then just skip
common "reply:" code path in nbd_trip.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-3-vsementsov@virtuozzo.com>
[eblake: grammar tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
To be reused in nbd_co_send_sparse_read() in the following patch.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
A new parameter "context" is added to qio_channel_tls_handshake() is to
allow the TLS to be run on a non-default context. Still, no functional
change.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
1. NBD_REP_ERR_INVALID is not only about length, so, make message more
general
2. hex format is not very good: it's hard to read something like
"option a (set meta context)", so switch to dec.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <1518702707-7077-6-git-send-email-vsementsov@virtuozzo.com>
[eblake: expand scope of patch: ALL uses of nbd_opt_lookup and
nbd_rep_lookup are now decimal]
Signed-off-by: Eric Blake <eblake@redhat.com>
Expose the new constants and structs that will be used by both
server and client implementations of NBD_CMD_BLOCK_STATUS (the
command is currently experimental at
https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md
but will hopefully be stabilized soon).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <1518702707-7077-4-git-send-email-vsementsov@virtuozzo.com>
[eblake: split from larger patch on server implementation]
Signed-off-by: Eric Blake <eblake@redhat.com>
This cleanup makes the number of objects depending on qapi/error.h
drop from 1910 (out of 4743) to 1612 in my "build everything" tree.
While there, separate #include from file comment with a blank line,
and drop a useless comment on why qemu/osdep.h is included first.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-5-armbru@redhat.com>
[Semantic conflict with commit 34e304e975 resolved, OSX breakage fixed]
Add command for removing an export. It is needed for cases when we
don't want to keep the export after the operation on it was completed.
The other example is a temporary node, created with blockdev-add.
If we want to delete it we should firstly remove any corresponding
NBD export.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180119135719.24745-3-vsementsov@virtuozzo.com>
[eblake: drop dead nb_clients code]
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171122101958.17065-6-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Rather than making every callsite perform length sanity checks
and error reporting, add the helper functions nbd_opt_read()
and nbd_opt_drop() that use the length stored in the client
struct; also add an assertion that optlen is 0 before any
option (ie. any previous option was fully handled), complementing
the assertion added in an earlier patch that optlen is 0 after
all negotiation completes.
Note that the call in nbd_negotiate_handle_export_name() does
not use the new helper (in part because the server cannot
reply to NBD_OPT_EXPORT_NAME - it either succeeds or the
connection drops).
Based on patches by Vladimir Sementsov-Ogievskiy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180110230825.18321-6-eblake@redhat.com>
This will be useful for the next patch.
Based on a patch by Vladimir Sementsov-Ogievskiy
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180110230825.18321-5-eblake@redhat.com>
When a client abruptly disconnects before we've finished reading
the name sent with NBD_OPT_EXPORT_NAME, we are better off logging
the failure as EIO (we can't communicate with the client), rather
than EINVAL (the client sent bogus data).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180110230825.18321-4-eblake@redhat.com>
Instead of passing currently negotiating option and its length to
many of negotiation functions let's just store them on NBDClient
struct to be state-variables of negotiation phase.
This unifies semantics of negotiation functions and allows
tracking changes of remaining option length in future patches.
Asssert that optlen is back to 0 after negotiation (including
old-style connections which don't negotiate), although we need
more patches before we can assert optlen is 0 between options
during negotiation.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171122101958.17065-2-vsementsov@virtuozzo.com>
[eblake: rebase, commit message tweak, assert !optlen after
negotiation completes]
Signed-off-by: Eric Blake <eblake@redhat.com>
No semantic change, but will make it easier for an upcoming patch
to refactor code without having to add forward declarations. Fix
a poor comment while at it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180110230825.18321-2-eblake@redhat.com>
Rename nbd_option and nbd_opt_reply to NBDOption and NBDOptionReply
to correspond to Qemu coding style and other structures here.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171122101958.17065-5-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
This place is not obvious, nbd_export_close may theoretically reduce
refcount to 0. It may happen if someone calls nbd_export_put on named
export not through nbd_export_set_name when refcount is 1.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20171207155102.66622-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
If we are careful to handle 0-length read requests correctly,
we can optimize our sparse read to send the NBD_REPLY_FLAG_DONE
bit on our last OFFSET_DATA or OFFSET_HOLE chunk rather than
needing a separate chunk.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Introduced in commit f37708f6b8 (2.10). The NBD spec says a client
can request export names up to 4096 bytes in length, even though
they should not expect success on names longer than 256. However,
qemu hard-codes the limit of 256, and fails to filter out a client
that probes for a longer name; the result is a stack smash that can
potentially give an attacker arbitrary control over the qemu
process.
The smash can be easily demonstrated with this client:
$ qemu-io f raw nbd://localhost:10809/$(printf %3000d 1 | tr ' ' a)
If the qemu NBD server binary (whether the standalone qemu-nbd, or
the builtin server of QMP nbd-server-start) was compiled with
-fstack-protector-strong, the ability to exploit the stack smash
into arbitrary execution is a lot more difficult (but still
theoretically possible to a determined attacker, perhaps in
combination with other CVEs). Still, crashing a running qemu (and
losing the VM) is bad enough, even if the attacker did not obtain
full execution control.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
The NBD spec gives us permission to abruptly disconnect on clients
that send outrageously large option requests, rather than having
to spend the time reading to the end of the option. No real
option request requires that much data anyways; and meanwhile, we
already have the practice of abruptly dropping the connection on
any client that sends NBD_CMD_WRITE with a payload larger than 32M.
For comparison, nbdkit drops the connection on any request with
more than 4096 bytes; however, that limit is probably too low
(as the NBD spec states an export name can theoretically be up
to 4096 bytes, which means a valid NBD_OPT_INFO could be even
longer) - even if qemu doesn't permit exports longer than 256
bytes.
It could be argued that a malicious client trying to get us to
read nearly 4G of data on a bad request is a form of denial of
service. In particular, if the server requires TLS, but a client
that does not know the TLS credentials sends any option (other
than NBD_OPT_STARTTLS or NBD_OPT_EXPORT_NAME) with a stated
payload of nearly 4G, then the server was keeping the connection
alive trying to read all the payload, tying up resources that it
would rather be spending on a client that can get past the TLS
handshake. Hence, this warranted a CVE.
Present since at least 2.5 when handling known options, and made
worse in 2.6 when fixing support for NBD_FLAG_C_FIXED_NEWSTYLE
to handle unknown options.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
The NBD spec says an attempt to NBD_CMD_TRIM on a read-only
export should fail with EPERM, as a trim has the potential
to change disk contents, but we were relying on the block
layer to catch that for us, which might not always give the
right error (and even if it does, it does not let us pass
back a sane message for structured replies).
The NBD spec says an attempt to NBD_CMD_WRITE_ZEROES out of
bounds should fail with ENOSPC, not EINVAL.
Our check for u64 offset + u32 length wraparound up front is
pointless; nothing uses offset until after the second round
of sanity checks, and we can just as easily ensure there is
no wraparound by checking whether offset is in bounds (since
a disk size cannot exceed off_t which is 63 bits, adding a
32-bit number for a valid offset can't overflow). Bonus:
dropping the up-front check lets us keep the connection alive
after NBD_CMD_WRITE, whereas before we would drop the
connection (of course, any client sending a packet that would
trigger the failure is already buggy, so it's also okay to
drop the connection, but better quality-of-implementation
never hurts).
Solve all of these issues by some code motion and improved
request validation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171115213557.3548-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
The NBD spec says that a server may fail any transmission request
with ESHUTDOWN when it is apparent that no further request from
the client can be successfully honored. The client is supposed
to then initiate a soft shutdown (wait for all remaining in-flight
requests to be answered, then send NBD_CMD_DISC). However, since
qemu's server never uses ESHUTDOWN errors, this code was mostly
untested since its introduction in commit b6f5d3b5.
More recently, I learned that nbdkit as the NBD server is able to
send ESHUTDOWN errors, so I finally tested this code, and noticed
that our client was special-casing ESHUTDOWN to cause a hard
shutdown (immediate disconnect, with no NBD_CMD_DISC), but only
if the server sends this error as a simple reply. Further
investigation found that commit d2febedb introduced a regression
where structured replies behave differently than simple replies -
but that the structured reply behavior is more in line with the
spec (even if we still lack code in nbd-client.c to properly quit
sending further requests). So this patch reverts the portion of
b6f5d3b5 that introduced an improper hard-disconnect special-case
at the lower level, and leaves the future enhancement of a nicer
soft-disconnect at the higher level for another day.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171113194857.13933-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
When using error prepend(), it is necessary to end with a space
in the format string; otherwise, messages come out incorrectly,
such as when connecting to a socket that hangs up immediately:
can't open device nbd://localhost:10809/: Failed to read dataUnexpected end-of-file before all bytes were read
Originally botched in commit e44ed99d, then several more instances
added in the meantime.
Pre-existing and not fixed here: we are inconsistent on capitalization;
some of our messages start with lower case, and others start with upper,
although the use of error_prepend() is much nicer to read when all
fragments consistently start with lower.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171113152424.25381-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
The NBD spec was recently clarified to state that a read of length 0
should not be attempted by a compliant client; but that a server must
still handle it correctly in an unspecified manner (that is, either
a successful no-op or an error reply, but not a crash) [1]. However,
it also implies that NBD_REPLY_TYPE_OFFSET_DATA must have a non-zero
payload length, but our existing code was replying with a chunk
that a picky client could reject as invalid because it was missing
a payload (our own client implementation was recently patched to be
that picky, after first fixing it to not send 0-length requests).
We are already doing successful no-ops for 0-length writes and for
non-structured reads; so for consistency, we want structured reply
reads to also be a no-op. The easiest way to do this is to return
a NBD_REPLY_TYPE_NONE chunk; this is best done via a new helper
function (especially since future patches for other structured
replies may benefit from using the same helper).
[1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-8-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
A closer read of the NBD spec shows that a structured reply chunk
for a hole is not quite identical to the prefix of a data chunk,
because the hole has to also send a 32-bit size field. Although
we do not yet send holes, we should fix the misleading information
in our header and make it easier for a future patch to support
sparse reads. Messed up in commit bae245d1.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-5-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
It's useful to know which structured reply chunk is being processed.
Missed in commit d2febedb.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
namelen should be here, length is unrelated, and always 0 at this
point. Broken in introduction in commit f37708f6, but mostly
harmless (replying with '' as the name does not violate protocol,
and does not confuse qemu as the nbd client since our implementation
does not ask for the name; but might confuse some other client that
does ask for the name especially if the default export is different
than the export name being queried).
Adding an assert makes it obvious that we are not skipping any bytes
in the client's message, as well as making it obvious that we were
using the wrong variable.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: qemu-stable@nongnu.org
Message-Id: <20171101154204.27146-1-vsementsov@virtuozzo.com>
[eblake: improve commit message, squash in assert addition]
Signed-off-by: Eric Blake <eblake@redhat.com>
Minimal implementation: for structured error only error_report error
message.
Note that test 83 is now more verbose, because the implementation
prints more warnings about unexpected communication errors; perhaps
future patches should tone things down by using trace messages
instead of traces, but the common case of successful communication
is no noisier than before.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-13-eblake@redhat.com>
An upcoming change to block/nbd-client.c will want to read the
tail of a structured reply chunk directly from the wire. Move
this function to make it easier.
Based on a patch from Vladimir Sementsov-Ogievskiy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-12-eblake@redhat.com>
In following patch nbd_receive_reply will be used both for simple
and structured reply header receiving.
NBDReply is altered into union of simple reply header and structured
reply chunk header, simple error translation moved to block/nbd-client
to be consistent with further structured reply error translation.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-11-eblake@redhat.com>
Split out nbd_request_simple_option to be reused for structured reply
option.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-10-eblake@redhat.com>
The NBD spec permits including a human-readable error string if
structured replies are in force, so we might as well send the
client the message that we logged on any error.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-9-eblake@redhat.com>
Minimal implementation of structured read: one structured reply chunk,
no segmentation.
Minimal structured error implementation: no text message.
Support DF flag, but just ignore it, as there is no segmentation any
way.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-8-eblake@redhat.com>
Consolidate the response for a non-zero-length option payload
into a new function, nbd_reject_length(). This check will
also be used when introducing support for structured replies.
Note that STARTTLS response differs based on time: if the connection
is still unencrypted, we set fatal to true (a client that can't
request TLS correctly may still think that we are ready to start
the TLS handshake, so we must disconnect); while if the connection
is already encrypted, the client is sending a bogus request but
is no longer at risk of being confused by continuing the connection.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-7-eblake@redhat.com>
[eblake: correct return value on STARTTLS]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Instead of making each caller check whether a transmission error
occurred, we can sink a common error check to the end of the loop.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-6-eblake@redhat.com>
[eblake: squash in compiler warning fix]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
When the server is read-only, we were already reporting an error
message for NBD_CMD_WRITE_ZEROES, but failed to set errp for a
similar NBD_CMD_WRITE. This will matter more once structured
replies allow the server to propagate the errp information back
to the client. While at it, use an error message that makes a
bit more sense if viewed on the client side.
Note that when using qemu-io to test qemu-nbd behavior, it is
rather difficult to convince qemu-io to send protocol violations
(such as a read beyond bounds), because we have a lot of active
checking on the client side that a qemu-io request makes sense
before it ever goes over the wire to the server. The case of a
client attempting a write when the server is started as
'qemu-nbd -r' is one of the few places where we can easily test
error path handling, without having to resort to hacking in known
temporary bugs to either the server or client. [Maybe we want a
future patch to the client to do up-front checking on writes to a
read-only export, the way it does up-front bounds checking; but I
don't see anything in the NBD spec that points to a protocol
violation in our current behavior.]
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-5-eblake@redhat.com>
Upcoming patches will implement the NBD structured reply
extension [1] for both client and server roles. Declare the
constants, structs, and lookup routines that will be valuable
whether the server or client code is backported in isolation.
This includes moving one constant from an internal header to
the public header, as part of the structured read processing
will be done in block/nbd-client.c rather than nbd/client.c.
[1]https://github.com/NetworkBlockDevice/nbd/blob/extension-structured-reply/doc/proto.md
Based on patches from Vladimir Sementsov-Ogievskiy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-4-eblake@redhat.com>
This is needed in preparation for structured reply handling,
as we will be performing the translation from NBD error to
system errno value higher in the stack at block/nbd-client.c.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-3-eblake@redhat.com>
NBD errors were originally sent over the wire based on Linux errno
values; but not all the world is Linux, and not all platforms share
the same values. Since a number isn't very easy to decipher on all
platforms, update the trace messages to include the name of NBD
errors being sent/received over the wire. Tweak the trace messages
to be at the point where we are using the NBD error, not the
translation to the host errno values.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-2-eblake@redhat.com>
Prepare indenting for the following commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171012095319.136610-9-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Send qiov via qio_channel_writev_all instead of calling nbd_write twice
with a cork.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171012095319.136610-8-vsementsov@virtuozzo.com>
[eblake: rebase to tweaks earlier in series]
Signed-off-by: Eric Blake <eblake@redhat.com>