If we are careful to handle 0-length read requests correctly,
we can optimize our sparse read to send the NBD_REPLY_FLAG_DONE
bit on our last OFFSET_DATA or OFFSET_HOLE chunk rather than
needing a separate chunk.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Introduced in commit f37708f6b8 (2.10). The NBD spec says a client
can request export names up to 4096 bytes in length, even though
they should not expect success on names longer than 256. However,
qemu hard-codes the limit of 256, and fails to filter out a client
that probes for a longer name; the result is a stack smash that can
potentially give an attacker arbitrary control over the qemu
process.
The smash can be easily demonstrated with this client:
$ qemu-io f raw nbd://localhost:10809/$(printf %3000d 1 | tr ' ' a)
If the qemu NBD server binary (whether the standalone qemu-nbd, or
the builtin server of QMP nbd-server-start) was compiled with
-fstack-protector-strong, the ability to exploit the stack smash
into arbitrary execution is a lot more difficult (but still
theoretically possible to a determined attacker, perhaps in
combination with other CVEs). Still, crashing a running qemu (and
losing the VM) is bad enough, even if the attacker did not obtain
full execution control.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
The NBD spec gives us permission to abruptly disconnect on clients
that send outrageously large option requests, rather than having
to spend the time reading to the end of the option. No real
option request requires that much data anyways; and meanwhile, we
already have the practice of abruptly dropping the connection on
any client that sends NBD_CMD_WRITE with a payload larger than 32M.
For comparison, nbdkit drops the connection on any request with
more than 4096 bytes; however, that limit is probably too low
(as the NBD spec states an export name can theoretically be up
to 4096 bytes, which means a valid NBD_OPT_INFO could be even
longer) - even if qemu doesn't permit exports longer than 256
bytes.
It could be argued that a malicious client trying to get us to
read nearly 4G of data on a bad request is a form of denial of
service. In particular, if the server requires TLS, but a client
that does not know the TLS credentials sends any option (other
than NBD_OPT_STARTTLS or NBD_OPT_EXPORT_NAME) with a stated
payload of nearly 4G, then the server was keeping the connection
alive trying to read all the payload, tying up resources that it
would rather be spending on a client that can get past the TLS
handshake. Hence, this warranted a CVE.
Present since at least 2.5 when handling known options, and made
worse in 2.6 when fixing support for NBD_FLAG_C_FIXED_NEWSTYLE
to handle unknown options.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
The NBD spec says an attempt to NBD_CMD_TRIM on a read-only
export should fail with EPERM, as a trim has the potential
to change disk contents, but we were relying on the block
layer to catch that for us, which might not always give the
right error (and even if it does, it does not let us pass
back a sane message for structured replies).
The NBD spec says an attempt to NBD_CMD_WRITE_ZEROES out of
bounds should fail with ENOSPC, not EINVAL.
Our check for u64 offset + u32 length wraparound up front is
pointless; nothing uses offset until after the second round
of sanity checks, and we can just as easily ensure there is
no wraparound by checking whether offset is in bounds (since
a disk size cannot exceed off_t which is 63 bits, adding a
32-bit number for a valid offset can't overflow). Bonus:
dropping the up-front check lets us keep the connection alive
after NBD_CMD_WRITE, whereas before we would drop the
connection (of course, any client sending a packet that would
trigger the failure is already buggy, so it's also okay to
drop the connection, but better quality-of-implementation
never hurts).
Solve all of these issues by some code motion and improved
request validation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171115213557.3548-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
The NBD spec says that a server may fail any transmission request
with ESHUTDOWN when it is apparent that no further request from
the client can be successfully honored. The client is supposed
to then initiate a soft shutdown (wait for all remaining in-flight
requests to be answered, then send NBD_CMD_DISC). However, since
qemu's server never uses ESHUTDOWN errors, this code was mostly
untested since its introduction in commit b6f5d3b5.
More recently, I learned that nbdkit as the NBD server is able to
send ESHUTDOWN errors, so I finally tested this code, and noticed
that our client was special-casing ESHUTDOWN to cause a hard
shutdown (immediate disconnect, with no NBD_CMD_DISC), but only
if the server sends this error as a simple reply. Further
investigation found that commit d2febedb introduced a regression
where structured replies behave differently than simple replies -
but that the structured reply behavior is more in line with the
spec (even if we still lack code in nbd-client.c to properly quit
sending further requests). So this patch reverts the portion of
b6f5d3b5 that introduced an improper hard-disconnect special-case
at the lower level, and leaves the future enhancement of a nicer
soft-disconnect at the higher level for another day.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171113194857.13933-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
When using error prepend(), it is necessary to end with a space
in the format string; otherwise, messages come out incorrectly,
such as when connecting to a socket that hangs up immediately:
can't open device nbd://localhost:10809/: Failed to read dataUnexpected end-of-file before all bytes were read
Originally botched in commit e44ed99d, then several more instances
added in the meantime.
Pre-existing and not fixed here: we are inconsistent on capitalization;
some of our messages start with lower case, and others start with upper,
although the use of error_prepend() is much nicer to read when all
fragments consistently start with lower.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171113152424.25381-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
The NBD spec was recently clarified to state that a read of length 0
should not be attempted by a compliant client; but that a server must
still handle it correctly in an unspecified manner (that is, either
a successful no-op or an error reply, but not a crash) [1]. However,
it also implies that NBD_REPLY_TYPE_OFFSET_DATA must have a non-zero
payload length, but our existing code was replying with a chunk
that a picky client could reject as invalid because it was missing
a payload (our own client implementation was recently patched to be
that picky, after first fixing it to not send 0-length requests).
We are already doing successful no-ops for 0-length writes and for
non-structured reads; so for consistency, we want structured reply
reads to also be a no-op. The easiest way to do this is to return
a NBD_REPLY_TYPE_NONE chunk; this is best done via a new helper
function (especially since future patches for other structured
replies may benefit from using the same helper).
[1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-8-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
A closer read of the NBD spec shows that a structured reply chunk
for a hole is not quite identical to the prefix of a data chunk,
because the hole has to also send a 32-bit size field. Although
we do not yet send holes, we should fix the misleading information
in our header and make it easier for a future patch to support
sparse reads. Messed up in commit bae245d1.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-5-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
It's useful to know which structured reply chunk is being processed.
Missed in commit d2febedb.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
namelen should be here, length is unrelated, and always 0 at this
point. Broken in introduction in commit f37708f6, but mostly
harmless (replying with '' as the name does not violate protocol,
and does not confuse qemu as the nbd client since our implementation
does not ask for the name; but might confuse some other client that
does ask for the name especially if the default export is different
than the export name being queried).
Adding an assert makes it obvious that we are not skipping any bytes
in the client's message, as well as making it obvious that we were
using the wrong variable.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: qemu-stable@nongnu.org
Message-Id: <20171101154204.27146-1-vsementsov@virtuozzo.com>
[eblake: improve commit message, squash in assert addition]
Signed-off-by: Eric Blake <eblake@redhat.com>
Minimal implementation: for structured error only error_report error
message.
Note that test 83 is now more verbose, because the implementation
prints more warnings about unexpected communication errors; perhaps
future patches should tone things down by using trace messages
instead of traces, but the common case of successful communication
is no noisier than before.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-13-eblake@redhat.com>
An upcoming change to block/nbd-client.c will want to read the
tail of a structured reply chunk directly from the wire. Move
this function to make it easier.
Based on a patch from Vladimir Sementsov-Ogievskiy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-12-eblake@redhat.com>
In following patch nbd_receive_reply will be used both for simple
and structured reply header receiving.
NBDReply is altered into union of simple reply header and structured
reply chunk header, simple error translation moved to block/nbd-client
to be consistent with further structured reply error translation.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-11-eblake@redhat.com>
Split out nbd_request_simple_option to be reused for structured reply
option.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-10-eblake@redhat.com>
The NBD spec permits including a human-readable error string if
structured replies are in force, so we might as well send the
client the message that we logged on any error.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-9-eblake@redhat.com>
Minimal implementation of structured read: one structured reply chunk,
no segmentation.
Minimal structured error implementation: no text message.
Support DF flag, but just ignore it, as there is no segmentation any
way.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-8-eblake@redhat.com>
Consolidate the response for a non-zero-length option payload
into a new function, nbd_reject_length(). This check will
also be used when introducing support for structured replies.
Note that STARTTLS response differs based on time: if the connection
is still unencrypted, we set fatal to true (a client that can't
request TLS correctly may still think that we are ready to start
the TLS handshake, so we must disconnect); while if the connection
is already encrypted, the client is sending a bogus request but
is no longer at risk of being confused by continuing the connection.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-7-eblake@redhat.com>
[eblake: correct return value on STARTTLS]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Instead of making each caller check whether a transmission error
occurred, we can sink a common error check to the end of the loop.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-6-eblake@redhat.com>
[eblake: squash in compiler warning fix]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
When the server is read-only, we were already reporting an error
message for NBD_CMD_WRITE_ZEROES, but failed to set errp for a
similar NBD_CMD_WRITE. This will matter more once structured
replies allow the server to propagate the errp information back
to the client. While at it, use an error message that makes a
bit more sense if viewed on the client side.
Note that when using qemu-io to test qemu-nbd behavior, it is
rather difficult to convince qemu-io to send protocol violations
(such as a read beyond bounds), because we have a lot of active
checking on the client side that a qemu-io request makes sense
before it ever goes over the wire to the server. The case of a
client attempting a write when the server is started as
'qemu-nbd -r' is one of the few places where we can easily test
error path handling, without having to resort to hacking in known
temporary bugs to either the server or client. [Maybe we want a
future patch to the client to do up-front checking on writes to a
read-only export, the way it does up-front bounds checking; but I
don't see anything in the NBD spec that points to a protocol
violation in our current behavior.]
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-5-eblake@redhat.com>
Upcoming patches will implement the NBD structured reply
extension [1] for both client and server roles. Declare the
constants, structs, and lookup routines that will be valuable
whether the server or client code is backported in isolation.
This includes moving one constant from an internal header to
the public header, as part of the structured read processing
will be done in block/nbd-client.c rather than nbd/client.c.
[1]https://github.com/NetworkBlockDevice/nbd/blob/extension-structured-reply/doc/proto.md
Based on patches from Vladimir Sementsov-Ogievskiy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-4-eblake@redhat.com>
This is needed in preparation for structured reply handling,
as we will be performing the translation from NBD error to
system errno value higher in the stack at block/nbd-client.c.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-3-eblake@redhat.com>
NBD errors were originally sent over the wire based on Linux errno
values; but not all the world is Linux, and not all platforms share
the same values. Since a number isn't very easy to decipher on all
platforms, update the trace messages to include the name of NBD
errors being sent/received over the wire. Tweak the trace messages
to be at the point where we are using the NBD error, not the
translation to the host errno values.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171027104037.8319-2-eblake@redhat.com>
Prepare indenting for the following commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171012095319.136610-9-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Send qiov via qio_channel_writev_all instead of calling nbd_write twice
with a cork.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171012095319.136610-8-vsementsov@virtuozzo.com>
[eblake: rebase to tweaks earlier in series]
Signed-off-by: Eric Blake <eblake@redhat.com>
Pass client and buffer (*data) parameters directly, to make the function
consistent with further structured reply sending functions.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171012095319.136610-7-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
NBDReply structure will be upgraded in future patches to handle both
simple and structured replies and will be used only in the client
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171012095319.136610-6-vsementsov@virtuozzo.com>
[eblake: rebase to tweaks earlier in series]
Signed-off-by: Eric Blake <eblake@redhat.com>
Use packed structure instead of pointer arithmetics.
Also, merge two redundant traces into one.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20171012095319.136610-5-vsementsov@virtuozzo.com>
[eblake: tweak and mention impact on traces, fix errp usage]
Signed-off-by: Eric Blake <eblake@redhat.com>
To be consistent when their _structured_ analogs will be introduced.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171012095319.136610-4-vsementsov@virtuozzo.com>
[eblake: also tweak trace message contents]
Signed-off-by: Eric Blake <eblake@redhat.com>
Rather than open-coding our own read/write-all functions, we
can make use of the recently-added qio code. It slightly
changes the error message in one of the iotests.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170905191114.5959-4-eblake@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Fix nbd_send_request to return int, as it returns a return value
of nbd_write (which is int), and the only user of nbd_send_request's
return value (nbd_co_send_request) consider it as int too.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170804151440.320927-5-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Refactor nbd_receive_reply to return 1 on success, 0 on eof, when no
data was read and <0 for other cases, because returned size of read
data is not actually used.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170804151440.320927-4-vsementsov@virtuozzo.com>
[eblake: tweak function comments]
Signed-off-by: Eric Blake <eblake@redhat.com>
Refactor nbd_read_eof to return 1 on success, 0 on eof, when no
data was read and <0 for other cases, because returned size of
read data is not actually used.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170804151440.320927-3-vsementsov@virtuozzo.com>
[eblake: tweak function comments, rebase to test 083 enhancements]
Signed-off-by: Eric Blake <eblake@redhat.com>
Do not send NBD_OPT_ABORT to the broken server. After sending
NBD_REP_ACK on NBD_OPT_GO server is most probably in transmission
phase, when option sending is finished.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170804151440.320927-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The "inactive" state of BDS affects whether the permissions can be
granted, we must call bdrv_invalidate_cache before bdrv_set_perm to
support "-incoming defer" case.
Reported-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170815130740.31229-3-famz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
NBD_CMD_DISC is a disconnect request, not a data discard request.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170811015749.20365-1-eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The only exception are groups of numers separated by symbols
'.', ' ', ':', '/', like 'ab.09.7d'.
This patch is made by the following:
> find . -name trace-events | xargs python script.py
where script.py is the following python script:
=========================
#!/usr/bin/env python
import sys
import re
import fileinput
rhex = '%[-+ *.0-9]*(?:[hljztL]|ll|hh)?(?:x|X|"\s*PRI[xX][^"]*"?)'
rgroup = re.compile('((?:' + rhex + '[.:/ ])+' + rhex + ')')
rbad = re.compile('(?<!0x)' + rhex)
files = sys.argv[1:]
for fname in files:
for line in fileinput.input(fname, inplace=True):
arr = re.split(rgroup, line)
for i in range(0, len(arr), 2):
arr[i] = re.sub(rbad, '0x\g<0>', arr[i])
sys.stdout.write(''.join(arr))
=========================
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Message-id: 20170731160135.12101-5-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
nbd/client.c:385:12: warning: Potential leak of memory pointed to by 'buf'
Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170727024224.22900-5-f4bug@amsat.org>
[introduced in commit 8ecaeae8]
Signed-off-by: Eric Blake <eblake@redhat.com>
A typo in commit 23e099c set the size of buf[] used in response
to NBD_OPT_EXPORT_NAME according to the length needed for old-style
negotiation (4 bytes of flag information) instead of the intended
2 bytes used in new style. If the client doesn't enable
NBD_FLAG_C_NO_ZEROES, then the server sends two bytes too many,
and is then out of sync in response to the client's next command
(the bug is masked when modern qemu is the client, since we enable
the no zeroes flag).
While touching this code, add some more defines to nbd_internal.h
rather than having quite so many magic numbers in the .c; also,
use "" initialization rather than memset(), and tweak the oldstyle
negotiation to better match the spec description of the layout
(since the spec is big-endian, skipping two bytes as 0 followed by
writing a 2-byte flag is the same as writing a zero-extended 4-byte
flag), to make it a bit easier to follow compared to the spec.
[checkpatch.pl has some false positives in the comments]
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170717192635.17880-3-eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Make the client trace slightly more legible by including the name
of the command being sent.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-Id: <20170717192635.17880-2-eblake@redhat.com>
Commit 8ecaeae8 changed the way the client requests an NBD export,
and in the process also changed the resulting error message when
the export is not present, breaking a couple of iotests. The error
message is now directly given by the server (a failed NBD_OPT_GO)
instead of implied by the client (after exhausting NBD_OPT_LIST),
but looking at the testsuite changes, it proves worthwhile to
reword the error message to be slightly less verbose (as this is
one particular error message likely to be hit by a user).
Note that the error message is now sensitive to which binary is
running the server as well as the client (since the expected
output is replaying a message received from the server - for that
matter, it depends on a server new enough to understand NBD_OPT_GO);
in general iotests are run on client and server from the same source
code base so the default setup will pass; but if it proves
problematic for people overriding QEMU_PROG, QEMU_IMG_PROG,
QEMU_IO_PROG, and QEMU_NBD_PROG to point across multiple builds for
cross-version integration testing, we may have to later tweak or
sanitize the output somehow.
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170717142310.17048-1-eblake@redhat.com>
Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server whether it intends to
obey block sizes.
When using the block layer as the client, we will obey block
sizes; but when used as 'qemu-nbd -c' to hand off to the
kernel nbd module as the client, we are still waiting for the
kernel to implement a way for us to learn if it will honor
block sizes (perhaps by an addition to sysfs, rather than an
ioctl), as well as any way to tell the kernel what additional
block sizes to obey (NBD_SET_BLKSIZE appears to be accurate
for the minimum size, but preferred and maximum sizes would
probably be new ioctl()s), so until then, we need to make our
request for block sizes conditional.
When using ioctl(NBD_SET_BLKSIZE) to hand off to the kernel,
use the minimum block size as the sector size if it is larger
than 512, which also has the nice effect of cooperating with
(non-qemu) servers that don't do read-modify-write when
exposing a block device with 4k sectors; it might also allow
us to visit a file larger than 2T on a 32-bit kernel.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-10-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.
Thanks to a recent fix (commit df7b97ff), our real minimum
transfer size is always 1 (the block layer takes care of
read-modify-write on our behalf), but we're still more efficient
if we advertise 512 when the client supports it, as follows:
- OPT_INFO, but no NBD_INFO_BLOCK_SIZE: advertise 512, then
fail with NBD_REP_ERR_BLOCK_SIZE_REQD; client is free to try
something else since we don't disconnect
- OPT_INFO with NBD_INFO_BLOCK_SIZE: advertise 512
- OPT_GO, but no NBD_INFO_BLOCK_SIZE: advertise 1
- OPT_GO with NBD_INFO_BLOCK_SIZE: advertise 512
We can also advertise the optimum block size (presumably the
cluster size, when exporting a qcow2 file), and our absolute
maximum transfer size of 32M, to help newer clients avoid
EINVAL failures or abrupt disconnects on oversize requests.
We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-9-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
NBD_OPT_EXPORT_NAME is lousy: per the NBD protocol, any failure
requires the server to close the connection rather than report an
error to us. Therefore, upstream NBD recently added NBD_OPT_GO as
the improved version of the option that does what we want [1]: it
reports sane errors on failures, and on success provides at least
as much info as NBD_OPT_EXPORT_NAME.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-info/doc/proto.md
This is a first cut at use of the information types. Note that we
do not need to use NBD_OPT_INFO, and that use of NBD_OPT_GO means
we no longer have to use NBD_OPT_LIST to learn whether a server
requires TLS (this requires servers that gracefully handle unknown
NBD_OPT, many servers prior to qemu 2.5 were buggy, but I have patched
qemu, upstream nbd, and nbdkit in the meantime, in part because of
interoperability testing with this patch). We still fall back to
NBD_OPT_LIST when NBD_OPT_GO is not supported on the server, as it
is still one last chance for a nicer error message. Later patches
will use further info, like NBD_INFO_BLOCK_SIZE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-8-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
NBD_OPT_EXPORT_NAME is lousy: per the NBD protocol, any failure
requires us to close the connection rather than report an error.
Therefore, upstream NBD recently added NBD_OPT_GO as the improved
version of the option that does what we want [1], along with
NBD_OPT_INFO that returns the same information but does not
transition to transmission phase.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-info/doc/proto.md
This is a first cut at the information types, and only passes the
same information already available through NBD_OPT_LIST and
NBD_OPT_EXPORT_NAME; items like NBD_INFO_BLOCK_SIZE (and thus any
use of NBD_REP_ERR_BLOCK_SIZE_REQD) are intentionally left for
later patches.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-7-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reply directly in nbd_negotiate_handle_export_name(), rather than
waiting until nbd_negotiate_options() completes. This will make it
easier to implement NBD_OPT_GO. Pass additional parameters around,
rather than stashing things inside NBDClient.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-6-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Simplify the tracing of client flags in the server, and return -EINVAL
instead of -EIO if we successfully read but don't like those flags.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-5-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The NBD protocol has several constants defined in various extensions
that we are about to implement. Expose them to the code, along with
an easy way to map various constants to strings during diagnostic
messages.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-4-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We really don't care if our spec-compliant reply to NBD_OPT_ABORT
was received, so shave off some lines of code by not even tracing it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-3-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The NBD Protocol is introducing some additional information
about exports, such as minimum request size and alignment, as
well as an advertised maximum request size. It will be easier
to feed this information back to the block layer if we gather
all the information into a struct, rather than adding yet more
pointer parameters during negotiation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-2-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let NBD use the trace mechanisms already present in qemu. Now you can
use the -trace optino of qemu, or the -T/--trace option of qemu-img,
qemu-io, and qemu-nbd, to select nbd traces. For qemu, the QMP commands
trace-event-{get,set}-state can also toggle tracing on the fly.
Example:
qemu-nbd --trace 'nbd_*' <image file> # enables all nbd traces
Recompilation with CFLAGS=-DDEBUG_NBD is no more needed, furthermore,
DEBUG_NBD macro is removed from the code.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170707152918.23086-11-vsementsov@virtuozzo.com>
[eblake: minor tweaks to a couple of traces]
Signed-off-by: Eric Blake <eblake@redhat.com>
Reorganize traces: move, reword, add information, drop extra ones.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170707152918.23086-10-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Rename 'clientflags' to just 'option'. This variable has nothing to do
with flags, but is a single integer representing the option requested
by the client.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170707152918.23086-9-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Fix wrong order of TRACE arguments.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170707152918.23086-8-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are going to switch from TRACE macro to trace points,
this TRACE complicates things, this patch simplifies it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170707152918.23086-7-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Error is propagated to the caller, TRACE is not needed.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707152918.23086-6-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707152918.23086-5-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Move to modern errp scheme from just LOGging errors.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170707152918.23086-4-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Combine two successive "if (oldStyle) {...} else {...}" into one.
Block "if (client->tlscreds)" under "if (oldStyle)" is unreachable,
as we have "oldStyle = client->exp != NULL && !client->tlscreds;".
So, delete this block.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170707152918.23086-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Separate the case when a client sends NBD_OPT_ABORT from all other
errors. It will be needed for the following patch, where errors will be
reported.
This particular case is not actually an error - it honestly follows the
NBD protocol. Therefore it should not be reported like an error.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707152918.23086-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
- do not use 'goto error_reply' outside a switch to jump into the
middle of the switch's default case label
- reduce code duplication
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-13-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For consistency use 'ret' name for saving return code everywhere
in the file.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-12-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
"goto fail" error handling scheme is not needed for just returning
error code. Better is return it immediately.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-11-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current code will return 0 on this nbd_write fail, as rc is 0
after successful nbd_negotiate_options. Fix this.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-10-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
"co" field of NBDClientNewData has never been used, all the way back to
its declaration in commit 1a6245a5. So let's just use client pointer
instead of extra structure.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-9-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move function tail, about receiving next request out of the function.
Error path is simplified and nbd_co_receive_request becomes more
corresponding to its name.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-8-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For now nbd_read never returns EAGAIN. So, don't handle it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-7-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As nbd_write never returns value > 0, we can get rid of extra ret.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-6-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now nbd_read and friends return int, so get rid of ssize_t.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-5-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Functions nbd_negotiate_{read,write,drop_sync} were introduced in
1a6245a5b, when nbd_rwv (was nbd_wr_sync) was working through
qemu_co_sendv_recvv (the path is nbd_wr_sync -> qemu_co_{recv/send} ->
qemu_co_send_recv -> qemu_co_sendv_recvv), which just yields, without
setting any handlers. But starting from ff82911cd nbd_rwv (was
nbd_wr_syncv) works through qio_channel_yield() which sets handlers, so
watchers are redundant in nbd_negotiate_{read,write,drop_sync}, then,
let's just use nbd_{read,write,drop} functions.
Functions nbd_{read,write,drop} has errp parameter, which is unused in
this patch. This will be fixed later.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170602150150.258222-4-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Following commit will reuse it for nbd server too.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170602150150.258222-3-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rename
nbd_wr_syncv -> nbd_rwv
read_sync -> nbd_read
read_sync_eof -> nbd_read_eof
write_sync -> nbd_write
drop_sync -> nbd_drop
1. nbd_ prefix
read_sync and write_sync are already shared, so it is good to have a
namespace prefix. drop_sync will be shared, and read_sync_eof is
related to read_sync, so let's rename them all.
2. _sync suffix
_sync is related to the fact that nbd_wr_syncv doesn't return if a
write to socket returns EAGAIN. The first implementation of
nbd_wr_syncv (was wr_sync in 7a5ca8648b) just loops while getting
EAGAIN, the current implementation yields in this case.
Why we want to get rid of it:
- it is normal for r/w functions to be synchronous, so having an
additional suffix for it looks redundant (contrariwise, we have
_aio suffix for async functions)
- _sync suffix in block layer is used when function does flush (so
using it for other thing is confusing a bit)
- keep function names short after adding nbd_ prefix
3. for nbd_wr_syncv let's use more common notation 'rw'
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170602150150.258222-2-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If a non-NBD client connects to qemu-nbd, we would end up with
a SIGSEGV in nbd_client_put() because we were trying to
unregister the client's association to the export, even though
we skipped inserting the client into that list. Easy trigger
in two terminals:
$ qemu-nbd -p 30001 --format=raw file
$ nmap 127.0.0.1 -p 30001
nmap claims that it thinks it connected to a pago-services1
server (which probably means nmap could be updated to learn the
NBD protocol and give a more accurate diagnosis of the open
port - but that's not our problem), then terminates immediately,
so our call to nbd_negotiate() fails. The fix is to reorder
nbd_co_client_start() to ensure that all initialization occurs
before we ever try talking to a client in nbd_negotiate(), so
that the teardown sequence on negotiation failure doesn't fault
while dereferencing a half-initialized object.
While debugging this, I also noticed that nbd_update_server_watch()
called by nbd_client_closed() was still adding a channel to accept
the next client, even when the state was no longer RUNNING. That
is fixed by making nbd_can_accept() pay attention to the current
state.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170527030421.28366-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move to modern errp scheme from just LOGging errors.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170526110913.89098-1-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There a lot of calls of these functions, which already have errp, which
they are filling themselves. On the other hand, nbd_wr_syncv has errp
parameter too, so it would be great to connect them.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170516094533.6160-5-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Will be used in following patch to provide actual error message in
some cases.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170516094533.6160-4-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
functions read_sync, drop_sync, write_sync, and also
nbd_negotiate_write, nbd_negotiate_read, nbd_negotiate_drop_sync
returns number of processed bytes. But what this number can be,
except requested number of bytes?
Actually, underlying nbd_wr_syncv function returns a value >= 0 and
!= requested_bytes only on eof on read operation. So, firstly, it is
impossible on write (let's add an assert) and on read it actually
means, that communication is broken (except nbd_receive_reply, see
below).
Most of callers operate like this:
if (func(..., size) != size) {
/* error path */
}
, i.e.:
1. They are not interested in partial success
2. Extra duplications in code (especially bad are duplications of
magic numbers)
3. User doesn't see actual error message, as return code is lost.
(this patch doesn't fix this point, but it makes fixing easier)
Several callers handles ret >= 0 and != requested-size separately, by
just returning EINVAL in this case. This patch makes read_sync and
friends return EINVAL in this case, so final behavior is the same.
And only one caller - nbd_receive_reply() does something not so
obvious. It returns EINVAL for ret > 0 and != requested-size, like
previous group, but for ret == 0 it returns 0. The only caller of
nbd_receive_reply() - nbd_read_reply_entry() handles ret == 0 in the
same way as ret < 0, so for now it doesn't matter. However, in
following commits error path handling will be improved and we'll need
to distinguish success from fail in this case too. So, this patch adds
separate helper for this case - read_sync_eof.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170516094533.6160-3-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
nbd_wr_syncv is called either from coroutine or from client negotiation
code, when socket is in blocking mode. So, -EAGAIN is impossible.
Furthermore, EAGAIN is confusing, as, what to read/write again? With
EAGAIN as a return code we don't know how much data is already
read or written by the function, so in case of EAGAIN the whole
communication is broken.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170516094533.6160-2-vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After the switch to reading replies in a coroutine, nothing is
reentering pending receive coroutines if the connection hangs.
Move nbd_recv_coroutines_enter_all to the reply read coroutine,
which is the place where hangups are detected. nbd_teardown_connection
can simply wait for the reply read coroutine to detect the hangup
and clean up after itself.
This wouldn't be enough though because nbd_receive_reply returns 0
(rather than -EPIPE or similar) when reading from a hung connection.
Fix the return value check in nbd_read_reply_entry.
This fixes qemu-iotests 083.
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170314111157.14464-1-pbonzini@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
NBD can't cope with device size changes, so resize must be forbidden,
but otherwise we can tolerate anything. Depending on whether the export
is writable or not, we only require consistent reads and writes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Now that blk_insert_bs() requests the BlockBackend permissions for the
node it attaches to, it can fail. Instead of aborting, pass the errors
to the callers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
We want every user to be specific about the permissions it needs, so
we'll pass the initial permissions as parameters to blk_new(). A user
only needs to call blk_set_perm() if it wants to change the permissions
after the fact.
The permissions are stored in the BlockBackend and applied whenever a
BlockDriverState should be attached in blk_insert_bs().
This does not include actually choosing the right set of permissions
everywhere yet. Instead, the usual FIXME comment is added to each place
and will be addressed in individual patches.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
In the client, read the reply headers from a coroutine, switching the
read side between the "read header" coroutine and the I/O coroutine that
reads the body of the reply.
In the server, if the server can read more requests it will create a new
"read request" coroutine as soon as a request has been read. Otherwise,
the new coroutine is created in nbd_request_put.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-8-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Currently the QIOTaskFunc signature takes an Object * for
the source, and an Error * for any error. We also need to
be able to provide a result pointer. Rather than continue
to add parameters to QIOTaskFunc, remove the existing
ones and simply pass the QIOTask object instead. This
has methods to access all the other data items required
in the callback impl.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The new AioPollFn io_poll() argument to aio_set_fd_handler() and
aio_set_event_handler() is used in the next patch.
Keep this code change separate due to the number of files it touches.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161201192652.9509-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Commit 7d3123e converted a single read_sync() into a while loop
that assumed that read_sync() would either make progress or give
an error. But when the server hangs up early, the client sees
EOF (a read_sync() of 0) and never makes progress, which in turn
caused qemu-iotest './check -nbd 83' to go into an infinite loop.
Rework the loop to accomodate reads cut short by EOF.
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1478551093-32757-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Upstream NBD protocol recently added the ability to efficiently
write zeroes without having to send the zeroes over the wire,
along with a flag to control whether the client wants to allow
a hole.
Note that when it comes to requiring full allocation, vs.
permitting optimizations, the NBD spec intentionally picked a
different sense for the flag; the rules in qemu are:
MAY_UNMAP == 0: must write zeroes
MAY_UNMAP == 1: may use holes if reads will see zeroes
while in NBD, the rules are:
FLAG_NO_HOLE == 1: must write zeroes
FLAG_NO_HOLE == 0: may use holes if reads will see zeroes
In all cases, the 'may use holes' scenario is optional (the
server need not use a hole, and must not use a hole if
subsequent reads would not see zeroes).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-16-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
NBD commit 6d34500b clarified how clients and servers are supposed
to behave before closing a connection. It added NBD_REP_ERR_SHUTDOWN
(for the server to announce it is about to go away during option
haggling, so the client should quit sending NBD_OPT_* other than
NBD_OPT_ABORT) and ESHUTDOWN (for the server to announce it is about
to go away during transmission, so the client should quit sending
NBD_CMD_* other than NBD_CMD_DISC). It also clarified that
NBD_OPT_ABORT gets a reply, while NBD_CMD_DISC does not.
This patch merely adds the missing reply to NBD_OPT_ABORT and teaches
the client to recognize server errors. Actually teaching the server
to send NBD_REP_ERR_SHUTDOWN or ESHUTDOWN would require knowing that
the server has been requested to shut down soon (maybe we could do
that by installing a SIGINT handler in qemu-nbd, which transitions
from RUNNING to a new state that waits for the client to react,
rather than just out-right quitting - but that's a bigger task for
another day).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-15-git-send-email-eblake@redhat.com>
[Move dummy ESHUTDOWN to include/qemu/osdep.h. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Checkpatch complains that 'return EINVAL' is usually wrong
(since we tend to favor 'return -EINVAL'). But it is a
false positive for nbd_errno_to_system_errno(). Since NBD
may add future defined wire values, refactor the code to
keep checkpatch happy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-14-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The NBD Protocol allows the server and client to mutually agree
on a shorter handshake (omit the 124 bytes of reserved 0), via
the server advertising NBD_FLAG_NO_ZEROES and the client
acknowledging with NBD_FLAG_C_NO_ZEROES (only possible in
newstyle, whether or not it is fixed newstyle). It doesn't
shave much off the wire, but we might as well implement it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Message-Id: <1476469998-28592-13-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since we know that the maximum name we are willing to accept
is small enough to stack-allocate, rework the iteration over
NBD_OPT_LIST responses to reuse a stack buffer rather than
allocating every time. Furthermore, we don't even have to
allocate if we know the server's length doesn't match what
we are searching for.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-12-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The server has a nice helper function nbd_negotiate_drop_sync()
which lets it easily ignore fluff from the client (such as the
payload to an unknown option request). We can't quite make it
common, since it depends on nbd_negotiate_read() which handles
coroutine magic, but we can copy the idea into the client where
we have places where we want to ignore data (such as the
description tacked on the end of NBD_REP_SERVER).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-11-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The NBD spec says that a client should send NBD_OPT_ABORT
rather than just dropping the connection, if the client doesn't
like something the server sent during option negotiation. This
is a best-effort attempt only, and can only be done in places
where we know the server is still in sync with what we've sent,
whether or not we've read everything the server has sent.
Technically, the server then has to reply with NBD_REP_ACK, but
it's not worth complicating the client to wait around for that
reply.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-10-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rather than open-coding each option request, it's easier to
have common helper functions do the work. That in turn requires
having convenient packed types for handling option requests
and replies.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-9-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The NBD Protocol allows us to send human-readable messages
along with any NBD_REP_ERR error during option negotiation;
make use of this fact for clients that know what to do with
our message.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-8-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rather than open-coding NBD_REP_SERVER, reuse the code we
already have by adding a length parameter. Additionally,
the refactoring will make adding NBD_OPT_GO in a later patch
easier.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-7-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Our coding convention prefers CamelCase names, and we already
have other existing structs with NBDFoo naming. Let's be
consistent, before later patches add even more structs.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-6-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have both 'struct NBDRequest' and 'struct nbd_request'; making
it confusing to see which does what. Furthermore, we want to
rename nbd_request to align with our normal CamelCase naming
conventions. So, rename the struct which is used to associate
the data received during request callbacks, while leaving the
shorter name for the description of the request sent over the
wire in the NBD protocol.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-4-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current upstream NBD documents that requests have a 16-bit flags,
followed by a 16-bit type integer; although older versions mentioned
only a 32-bit field with masking to find flags. Since the protocol
is in network order (big-endian over the wire), the ABI is unchanged;
but dealing with the flags as a separate field rather than masking
will make it easier to add support for upcoming NBD extensions that
increase the number of both flags and commands.
Improve some comments in nbd.h based on the current upstream
NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
and touch some nearby code to keep checkpatch.pl happy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-3-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The NBD protocol allows servers to advertise a human-readable
description alongside an export name during NBD_OPT_LIST. Add
an option to pass through the user's string to the NBD client.
Doing this also makes it easier to test commit 200650d4, which
is the client counterpart of receiving the description.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-2-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ensure that all I/O channels created for NBD are given names
to distinguish their respective roles.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The builtin NBD server uses its own BlockBackend now instead of reusing
the monitor/guest device one.
This means that it has its own writethrough setting now. The builtin
NBD server always uses writeback caching now regardless of whether the
guest device has WCE enabled. qemu-nbd respects the cache mode given on
the command line.
We still need to keep a reference to the monitor BB because we put an
eject notifier on it, but we don't use it for any I/O.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Rather than asserting that nbdflags is within range, just give
it the correct type to begin with :) nbdflags corresponds to
the per-export portion of NBD Protocol "transmission flags", which
is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.
Furthermore, upstream NBD has never passed the global flags to
the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first
introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually
tried to OR the global flags with the transmission flags, with
the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9
caused all earlier NBD 3.x clients to treat every export as
read-only; NBD 3.10 and later intentionally clip things to 16
bits to pass only transmission flags). Qemu should follow suit,
since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE
and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior
during transmission.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-3-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit ab7c548e added a check for invalid flags, but used an
early return on error instead of properly going through the
cleanup label.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-2-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change sector-based blk_discard(), blk_co_discard(), and
blk_aio_discard() to instead be byte-based blk_pdiscard(),
blk_co_pdiscard(), and blk_aio_pdiscard(). NBD gets a lot
simpler now that ignoring the unaligned portion of a
byte-based discard request is handled under the hood by
the block layer.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-6-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that NBD relies on the block layer to fragment things, we no
longer need to track an offset argument for which fragment of
a request we are actually servicing.
While at it, use true and false instead of 0 and 1 for a bool
parameter.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-6-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In practice the entry argument is always known at creation time, and
it is confusing that sometimes qemu_coroutine_enter is used with a
non-NULL argument to re-enter a coroutine (this happens in
block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value
at creation time, for consistency with e.g. aio_bh_new.
Mostly done with the following semantic patch:
@ entry1 @
expression entry, arg, co;
@@
- co = qemu_coroutine_create(entry);
+ co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry2 @
expression entry, arg;
identifier co;
@@
- Coroutine *co = qemu_coroutine_create(entry);
+ Coroutine *co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry3 @
expression entry, arg;
@@
- qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
+ qemu_coroutine_enter(qemu_coroutine_create(entry, arg));
@ reentry @
expression co;
@@
- qemu_coroutine_enter(co, NULL);
+ qemu_coroutine_enter(co);
except for the aforementioned few places where the semantic patch
stumbled (as expected) and for test_co_queue, which would otherwise
produce an uninitialized variable warning.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The trace format string in nbd_send_request uses PRIu16 for
request->type, but request->type is a uint32_t. This provokes
compiler warnings on the OSX clang. Use PRIu32 instead.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1466167331-17063-1-git-send-email-peter.maydell@linaro.org
Declare a constant and use that when determining if an export
name fits within the constraints we are willing to support.
Note that upstream NBD recently documented that clients MUST
support export names of 256 bytes (not including trailing NUL),
and SHOULD support names up to 4096 bytes. 4096 is a bit big
(we would lose benefits of stack-allocation of a name array),
and we already have other limits in place (for example, qcow2
snapshot names are clamped around 1024). So for now, just
stick to the required minimum, as that's easier to audit than
a full-scale support for larger names.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-12-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add some debugging to flag servers that are not compliant to
the NBD protocol. This would have flagged the server bug
fixed in commit c0301fcc.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Message-Id: <1463006384-7734-11-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The kernel ioctl() interface into NBD is limited to 'unsigned long';
we MUST pass in input with that type (and not int or size_t, as
there may be platform ABIs where the wrong types promote incorrectly
through var-args). Furthermore, on 32-bit platforms, the kernel
is limited to a maximum export size of 2T (our BLKSIZE of 512 times
a SIZE_BLOCKS constrained by 32 bit unsigned long).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-8-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
NBD ioctl()s are used to manage an NBD client session where
initial handshake is done in userspace, but then the transmission
phase is handed off to the kernel through a /dev/nbdX device.
As such, all ioctls sent to the kernel on the /dev/nbdX fd belong
in client.c; nbd_disconnect() was out-of-place in server.c.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-7-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The NBD protocol says that clients should not send a command flag
that has not been negotiated (whether by the client requesting an
option during a handshake, or because we advertise support for the
flag in response to NBD_OPT_EXPORT_NAME), and that servers should
reject invalid flags with EINVAL. We were silently ignoring the
flags instead. The client can't rely on our behavior, since it is
their fault for passing the bad flag in the first place, but it's
better to be robust up front than to possibly behave differently
than the client was expecting with the attempted flag.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Message-Id: <1463006384-7734-6-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have a few bugs in how we handle invalid client commands:
- A client can send an NBD_CMD_DISC where from + len overflows,
convincing us to reply with an error and stay connected, even
though the protocol requires us to silently disconnect. Fix by
hoisting the special case sooner.
- A client can send an NBD_CMD_WRITE where from + len overflows,
where we reply to the client with EINVAL without consuming the
payload; this will normally cause us to fail if the next thing
read is not the right magic, but in rare cases, could cause us
to interpret the data payload as valid commands and do things
not requested by the client. Fix by adding a complete flag to
track whether we are in sync or must disconnect.
Furthermore, we have split the checks for bogus from/len across
two functions, when it is easier to do it all at once.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-5-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We should never ignore failure from nbd_negotiate_send_rep(); if
we are unable to write to the client, then it is not worth trying
to continue the negotiation. Fortunately, the problem is not
too severe - chances are that the errors being ignored here (mainly
inability to write the reply to the client) are indications of
a closed connection or something similar, which will also affect
the next attempt to interact with the client and eventually reach
a point where the errors are detected to end the loop.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-4-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clean up some debug message oddities missed earlier; this includes
some typos, and recognizing that %d is not necessarily compatible
with uint32_t. Also add a couple messages that I found useful
while debugging things.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-3-git-send-email-eblake@redhat.com>
[Do not use PRIx16, clang complains. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rather than always flushing ourselves, let the block layer
forward the FUA on to the underlying device - where all
underlying layers also understand FUA, we are now more
efficient; and where any underlying layer doesn't understand
it, now the block layer takes care of the full flush fallback
on our behalf.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-2-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The cpu_to_*w() functions just compose a pointer dereference
with a byteswap. Instead use st*_p(), which handles potential
pointer misalignment and avoids the need to cast the pointer.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1465575342-12146-1-git-send-email-peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The *_to_cpup() functions are not very useful, as they simply do
a pointer dereference and then a *_to_cpu(). Instead use either:
* ld*_*_p(), if the data is at an address that might not be
correctly aligned for the load
* a local dereference and *_to_cpu(), if the pointer is
the correct type and known to be correctly aligned
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1465570836-22211-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Similar to commit df7b97ff, we are mishandling clients that
give an unaligned NBD_CMD_TRIM request, and potentially
trimming bytes that occur before their request; which in turn
can cause potential unintended data loss (unlikely in
practice, since most clients are sane and issue aligned trim
requests). However, while we fixed read and write by switching
to the byte interfaces of blk_, we don't yet have a byte
interface for discard. On the other hand, trim is advisory, so
rounding the user's request to simply ignore the first and last
unaligned sectors (or the entire request, if it is sub-sector
in length) is just fine.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1464173965-9694-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move it to the actual users. There are still a few includes of
qemu/bswap.h in headers; removing them is left for future work.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have several block drivers that understand BDRV_REQ_FUA,
and emulate it in the block layer for the rest by a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.
This patch just wires up a flags argument; followup patches
will actually make use of it in the NBD driver and in qemu-io.
Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The NBD protocol does not (yet) force any alignment constraints
on clients. Even though qemu NBD clients always send requests
that are aligned to 512 bytes, we must be prepared for non-qemu
clients that don't care about alignment (even if it means they
are less efficient). Our use of blk_read() and blk_write() was
silently operating on the wrong file offsets when the client
made an unaligned request, corrupting the client's data (but
as the client already has control over the file we are serving,
I don't think it is a security hole, per se, just a data
corruption bug).
Note that in the case of NBD_CMD_READ, an unaligned length could
cause us to return up to 511 bytes of uninitialized trailing
garbage from blk_try_blockalign() - hopefully nothing sensitive
from the heap's prior usage is ever leaked in that manner.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1461249750-31928-1-git-send-email-eblake@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Upstream NBD documents (as of commit 4feebc95) that servers MAY
choose to operate in a conditional mode, where it is up to the
client whether to use TLS. For qemu's case, we want to always be
in FORCEDTLS mode, because of the risk of man-in-the-middle
attacks, and since we never export more than one device; likewise,
the qemu client will ALWAYS send NBD_OPT_STARTTLS as its first
option. But now that SELECTIVETLS servers exist, it is feasible
to encounter a (non-qemu) client that is programmed to talk to
such a server, and does not do NBD_OPT_STARTTLS first, but rather
wants to probe if it can use a non-encrypted export.
The NBD protocol documents that we should let such a client
continue trying, on the grounds that maybe the client will get the
hint to send NBD_OPT_STARTTLS, rather than immediately dropping
the connection.
Note that NBD_OPT_EXPORT_NAME is a special case: since it is the
only option request that can't have an error return, we have to
(continue to) drop the connection on that one; rather, what we are
fixing here is that all other replies prior to TLS initiation tell
the client NBD_REP_ERR_TLS_REQD, but keep the connection alive.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1460671343-18485-1-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
The NBD Protocol states that NBD_REP_SERVER may set
'length > sizeof(namelen) + namelen'; in which case the rest
of the packet is a UTF-8 description of the export. While we
don't know of any NBD servers that send this description yet,
we had better consume the data so we don't choke when we start
to talk to such a server.
Also, a (buggy/malicious) server that replies with length <
sizeof(namelen) would cause us to block waiting for bytes that
the server is not sending, and one that replies with super-huge
lengths could cause us to temporarily allocate up to 4G memory.
Sanity check things before blindly reading incorrectly.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1460077777-31004-1-git-send-email-eblake@redhat.com
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Until commit 1c778ef7 ("nbd: convert to using I/O channels for actual
socket I/O", 2016-02-16), nbd_wr_sync returned -EAGAIN this scenario.
nbd_reply_ready required these semantics because it has two conflicting
requirements:
1) if a reply can be received on the socket, nbd_reply_ready needs
to read the header outside coroutine context to identify _which_
coroutine to enter to process the rest of the reply
2) on the other hand, nbd_reply_ready can find a false positive if
another thread (e.g. a VCPU thread running aio_poll) sneaks in and
calls nbd_reply_ready too. In this case nbd_reply_ready does nothing
and expects nbd_wr_syncv to return -EAGAIN.
Currently, the solution to the first requirement is to wait in the very
rare case of a read() that doesn't retrieve the reply header in its
entirety; this is what nbd_wr_syncv does by calling qio_channel_wait().
However, the unconditional call to qio_channel_wait() breaks the second
requirement. To fix this, the patch makes nbd_wr_syncv return -EAGAIN
if done is zero, similar to the code before commit 1c778ef7.
This is okay because NBD client-side negotiation is the only other case
that calls nbd_wr_syncv outside a coroutine, and it places the socket
in blocking mode. On the other hand, it is a bit unpleasant to put
this in nbd_wr_syncv(), because the function is used by both client
and server.
The full fix would be to add a counter to NbdClientSession for how
many bytes have been filled in s->reply. Then a reply can be filled
by multiple separate invocations of nbd_reply_ready and the
qio_channel_wait() call can be removed completely. Something to
consider for 2.7...
Reported-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
nbd-server.c currently fails to handle unsupported options properly.
If during option haggling the client sends an unknown request, the
server kills the connection instead of letting the client try to
fall back to something older. This is precisely what advertising
NBD_FLAG_FIXED_NEWSTYLE was supposed to fix.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459982918-32229-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
nbd-client.c currently fails to handle unsupported options properly.
If during option haggling the server finds an option that is
unsupported, it returns an NBD_REP_ERR_UNSUP reply.
According to nbd's proto.md, the format for such a reply
should be:
S: 64 bits, 0x3e889045565a9 (magic number for replies)
S: 32 bits, the option as sent by the client to which this is a reply
S: 32 bits, reply type (e.g., NBD_REP_ACK for successful completion,
or NBD_REP_ERR_UNSUP to mark use of an option not known by this server
S: 32 bits, length of the reply. This may be zero for some replies,
in which case the next field is not sent
S: any data as required by the reply (e.g., an export name in the case
of NBD_REP_SERVER, or optional UTF-8 message for NBD_REP_ERR_*)
However, in nbd-client.c, the reply type was being read, and if it
contained an error, it was bailing out and issuing the next option
request without first reading the length. This meant that the
next option / handshake read had an extra 4 or more bytes of data in it.
In practice, this makes Qemu incompatible with servers that do not
support NBD_OPT_LIST.
To verify this isn't an error in the specification or my reading of
it, replies are sent by the reference implementation here:
https://github.com/yoe/nbd/blob/66dfb35/nbd-server.c#L1232
and as is evident it always sends a 'datasize' (aka length) 32 bit
word. Unsupported elements are replied to here:
https://github.com/yoe/nbd/blob/66dfb35/nbd-server.c#L1371
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Message-Id: <1459882500-24316-1-git-send-email-alex@alex.org.uk>
[rework to ALWAYS consume an optional UTF-8 message from the server]
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459961962-18771-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Print debug tracing messages while data is still in native
ordering, rather than after we've potentially swapped it into
network order for transmission. Also, it's nice if the server
mentions what it is replying, to correlate it to with what the
client says it is receiving.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459913704-19949-4-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The compiler is smart enough to optimize out 'if (0)', but won't
type-check our printfs if they are hidden behind #if.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459913704-19949-3-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The NBD Protocol requires that servers should send EPERM for
attempts to write (or trim) a read-only export. We were
correct for TRIM (blk_co_discard() gave EPERM); but were
manually setting EROFS which then got mapped to EINVAL over
the wire on writes.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459913704-19949-2-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The client sends messages to the server, not itself.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459459222-8637-3-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef. Since then, we've moved to include qemu/osdep.h
everywhere. Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h. That's in excess of
100KiB of crap most .c files don't actually need.
Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h. Include qapi/error.h in .c files that need it and don't
get it now. Include qapi-types.h in qom/object.h for uint16List.
Update scripts/clean-includes accordingly. Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h
comment quoted above similarly.
This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third. Unfortunately, the number depending on
qapi-types.h shrinks only a little. More work is needed for that one.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
This extends the NBD protocol handling code so that it is capable
of negotiating TLS support during the connection setup. This involves
requesting the STARTTLS protocol option before any other NBD options.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-14-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the user does not provide an export name and the server
is running the new style protocol, where export names are
mandatory, use "" as the default export name if the user
has not specified any. "" is defined in the NBD protocol
as the default name to use in such scenarios.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-13-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With the new style protocol, the NBD client will currenetly
send NBD_OPT_EXPORT_NAME as the first (and indeed only)
option it wants. The problem is that the NBD protocol spec
does not allow for returning an error message with the
NBD_OPT_EXPORT_NAME option. So if the server mandates use
of TLS, the client will simply see an immediate connection
close after issuing NBD_OPT_EXPORT_NAME which is not user
friendly.
To improve this situation, if we have the fixed new style
protocol, we can sent NBD_OPT_LIST as the first option
to query the list of server exports. We can check for our
named export in this list and raise an error if it is not
found, instead of going ahead and sending NBD_OPT_EXPORT_NAME
with a name that we know will be rejected.
This improves the error reporting both in the case that the
server required TLS, and in the case that the client requested
export name does not exist on the server.
If the server does not support NBD_OPT_LIST, we just ignore
that and carry on with NBD_OPT_EXPORT_NAME as before.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-12-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the server advertises support for the fixed new style
negotiation, the client should in turn enable new style.
This will allow the client to negotiate further NBD
options besides the export name.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-10-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the client does not request the fixed new style protocol,
then we should only accept NBD_OPT_EXPORT_NAME. All other
options are only valid when fixed new style has been activated.
The qemu-nbd client doesn't currently request fixed new style
protocol, but this change won't break qemu-nbd, because it
fortunately only ever uses NBD_OPT_EXPORT_NAME, so was never
triggering the non-compliant server behaviour.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-9-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The nbd_receive_negotiate() method takes different code
paths based on whether 'name == NULL', and then checks
the expected protocol version in each branch.
This patch inverts the logic, so that it takes different
code paths based on what protocol version it receives and
then checks if name is NULL or not as needed.
This facilitates later code which allows the client to
be capable of using the new style protocol regardless
of whether an export name is listed or not.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-8-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that all callers are converted to use I/O channels for
initial connection setup, it is possible to switch the core
NBD protocol handling core over to use QIOChannel APIs for
actual sockets I/O.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-7-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
cpu_to_be64w can't be used to make unaligned stores, but stq_be_p can.
Also, the st?_be_p takes a void* so it is more clearly suited to the
case where you're writing into a byte buffer.
Use the st?_be_p family of functions everywhere in nbd/server.c.
Signed-off-by: John Snow <jsnow@redhat.com>
[Changed to use st?_be_p everywhere. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org
The NBD code uses the BDS close notifier to determine when a medium is
ejected. However, now it should use the BB's BDS removal notifier for
that instead of the BDS's close notifier.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use client_close() if an error in nbd_co_client_start() occurs instead
of manually inlining parts of it. This fixes an assertion error on the
server side if nbd_negotiate() fails.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* NBD fix from Denis
* condvar fix from Dave
* kvm_stat and dump-guest-memory almost rewrite
* mem-prealloc fix from Luiz
* manpage style improvement
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJWp4mKAAoJEL/70l94x66DersH/iUfwRTL7tmGOiUX73Qm32da
QseRiC5E5OaTLOGm+Q0Aehjq6Q18zgdiz/+/wSTPjnLmOiSDn6Sr6yB/URSMwhOE
+JVX3+UOpfHpQ1KHlBesIjS/WBSS1691ND1OPcHbHHa6UYbwEUTEc00hus8nVx6J
wyeteUoBryZA177rjVNb9sH7ncNFuuiQDfkr5pmC5f5JEsDiSK9hDmlg9sFnTWrO
XIVqQb0PD+EbOuufR4z3PTLIgbZEXegEgWOsE1FLBTVY/CZAkujynccOENIujFVv
CEhHJrGWo2NU0yeVJ1UlHREQyK+suIHgsiJlQKvAW8ZyFNqpy3+sWSEo7ZBpB6U=
=bVe7
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* chardev support for TLS and leak fix
* NBD fix from Denis
* condvar fix from Dave
* kvm_stat and dump-guest-memory almost rewrite
* mem-prealloc fix from Luiz
* manpage style improvement
# gpg: Signature made Tue 26 Jan 2016 14:58:18 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
* remotes/bonzini/tags/for-upstream: (49 commits)
scripts/dump-guest-memory.py: Fix module docstring
scripts/dump-guest-memory.py: Introduce multi-arch support
scripts/dump-guest-memory.py: Cleanup functions
scripts/dump-guest-memory.py: Improve python 3 compatibility
scripts/dump-guest-memory.py: Make methods functions
scripts/dump-guest-memory.py: Move constants to the top
nbd: add missed aio_context_acquire in nbd_export_new
memory: exit when hugepage allocation fails if mem-prealloc
cpus: use broadcast on qemu_pause_cond
scripts/kvm/kvm_stat: Add optparse description
scripts/kvm/kvm_stat: Add interactive filtering
scripts/kvm/kvm_stat: Fixup filtering
scripts/kvm/kvm_stat: Fix rlimit for unprivileged users
scripts/kvm/kvm_stat: Read event values as u64
scripts/kvm/kvm_stat: Cleanup and pre-init perf_event_attr
scripts/kvm/kvm_stat: Fix output formatting
scripts/kvm/kvm_stat: Make tui function a class
scripts/kvm/kvm_stat: Remove unneeded X86_EXIT_REASONS
scripts/kvm/kvm_stat: Group arch specific data
scripts/kvm/kvm_stat: Cleanup of Event class
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
blk_invalidate_cache() can call qcow2_invalidate_cache which performs
IO inside.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1453273940-15382-3-git-send-email-den@openvz.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Instead of covering only the state of images on the migration
destination before the migration is completed, the flag will also cover
the state of images on the migration source after completion. This
common state implies that the image is technically still open, but no
writes will happen and any cached contents will be reloaded from disk if
and when the image leaves this state.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The amount of memory allocated in nbd_co_receive_request is driven by the
NBD client (possibly a virtual machine). Parallel I/O can cause the
server to allocate a large amount of memory; check for failures and
return ENOMEM in that case.
Cc: qemu-block@nongnu.org
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Only reads and writes need to allocate memory correspondent to the
request length. Other requests can be sent to the storage without
allocating any memory, and thus any request length is acceptable.
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Cc: qemu-block@nongnu.org
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Create a coroutine in nbd_client_new, so that nbd_send_negotiate doesn't
need qemu_set_block().
Handlers need to be set temporarily for csock fd in case the coroutine
yields during I/O.
With this, if the other end disappears in the middle of the negotiation,
we don't block the whole event loop.
To make the code clearer, unify all function names that belong to
negotiate, so they are less likely to be misused. This is important
because we rely on negotiation staying in main loop, as commented in
nbd_negotiate_read/write().
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1452760863-25350-4-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have NBD server code and client code, all mixed in a file. Now split
them into separate files under nbd/, and update MAINTAINERS.
filter_nbd for iotest 083 is updated to keep the log filtered out.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1452760863-25350-3-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>