nbd_drained_poll() generally runs in the main thread, not whatever
iothread the NBD server coroutine is meant to run in, so it can't
directly reenter the coroutines to wake them up.
The code seems to have the right intention, it specifies the correct
AioContext when it calls qemu_aio_coroutine_enter(). However, this
functions doesn't schedule the coroutine to run in that AioContext, but
it assumes it is already called in the home thread of the AioContext.
To fix this, add a new thread-safe qio_channel_wake_read() that can be
called in the main thread to wake up the coroutine in its AioContext,
and use this in nbd_drained_poll().
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230517152834.277483-3-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There should be no paths from a coroutine_fn to aio_poll, however in
practice coroutine_mixed_fn will call aio_poll in the !qemu_in_coroutine()
path. By marking mixed functions, we can track accurately the call paths
that execute entirely in coroutine context, and find more missing
coroutine_fn markers. This results in more accurate checks that
coroutine code does not end up blocking.
If the marking were extended transitively to all functions that call
these ones, static analysis could be done much more efficiently.
However, this is a start and makes it possible to use vrc's path-based
searches to find potential bugs where coroutine_fns call blocking functions.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MSG_PEEK peeks at the channel, The data is treated as unread and
the next read shall still return this data. This support is
currently added only for socket class. Extra parameter 'flags'
is added to io_readv calls to pass extra read flags like MSG_PEEK.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: manish.mishra <manish.mishra@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
qemu/coroutine.h and qemu/lockable.h include each other.
They need each other only in macro expansions, so we could simply drop
both inclusions to break the loop, and add suitable includes to files
that expand the macros.
Instead, move a part of qemu/coroutine.h to new qemu/coroutine-core.h
so that qemu/coroutine-core.h doesn't need qemu/lockable.h, and
qemu/lockable.h only needs qemu/coroutine-core.h. Result:
qemu/coroutine.h includes qemu/lockable.h includes
qemu/coroutine-core.h.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221221131435.3851212-5-armbru@redhat.com>
[Semantic rebase conflict with 7c10cb38cc "accel/tcg: Add debuginfo
support" resolved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221121085054.683122-11-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The initial implementation was changing the pipe state created by GLib
to PIPE_NOWAIT, but it turns out it doesn't work (read/write returns an
error). Since reading may return less than the requested amount, it
seems to be non-blocking already. However, the IO operation may block
until the FD is ready, I can't find good sources of information, to be
safe we can just poll for readiness before.
Alternatively, we could setup the FDs ourself, and use UNIX sockets on
Windows, which can be used in blocking/non-blocking mode. I haven't
tried it, as I am not sure it is necessary.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20221006113657.2656108-6-marcandre.lureau@redhat.com>
Simplify qio_channel_command_new_spawn() with GSpawn API. This will
allow to build for WIN32 in the following patches.
As pointed out by Daniel Berrangé: there is a change in semantics here
too. The current code only touches stdin/stdout/stderr. Any other FDs
which do NOT have O_CLOEXEC set will be inherited. With the new code,
all FDs except stdin/out/err will be explicitly closed, because we don't
set the flag G_SPAWN_LEAVE_DESCRIPTORS_OPEN. The only place we use
QIOChannelCommand today is the migration exec: protocol, and that is
only declared to use stdin/stdout.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20221006113657.2656108-5-marcandre.lureau@redhat.com>
This is for code which needs a portable equivalent to a QIOChannelFile
connected to /dev/null.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
For CONFIG_LINUX, implement the new zero copy flag and the optional callback
io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
feature is available in the host kernel, which is checked on
qio_channel_socket_connect_sync()
qio_channel_socket_flush() was implemented by counting how many times
sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
socket's error queue, in order to find how many of them finished sending.
Flush will loop until those counters are the same, or until some error occurs.
Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
1: Buffer
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
some caution is necessary to avoid overwriting any buffer before it's sent.
If something like this happen, a newer version of the buffer may be sent instead.
- If this is a problem, it's recommended to call qio_channel_flush() before freeing
or re-using the buffer.
2: Locked memory
- When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
unlocked after it's sent.
- Depending on the size of each buffer, and how often it's sent, it may require
a larger amount of locked memory than usually available to non-root user.
- If the required amount of locked memory is not available, writev_zero_copy
will return an error, which can abort an operation like migration,
- Because of this, when an user code wants to add zero copy as a feature, it
requires a mechanism to disable it, so it can still be accessible to less
privileged users.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220513062836.965425-4-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Add flags to io_writev and introduce io_flush as optional callback to
QIOChannelClass, allowing the implementation of zero copy writes by
subclasses.
How to use them:
- Write data using qio_channel_writev*(...,QIO_CHANNEL_WRITE_FLAG_ZERO_COPY),
- Wait write completion with qio_channel_flush().
Notes:
As some zero copy write implementations work asynchronously, it's
recommended to keep the write buffer untouched until the return of
qio_channel_flush(), to avoid the risk of sending an updated buffer
instead of the buffer state during write.
As io_flush callback is optional, if a subclass does not implement it, then:
- io_flush will return 0 without changing anything.
Also, some functions like qio_channel_writev_full_all() were adapted to
receive a flag parameter. That allows shared code between zero copy and
non-zero copy writev, and also an easier implementation on new flags.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220513062836.965425-3-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The function isn't used outside of qio_channel_command_new_spawn(),
which is !win32-specific.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Adds qio_channel_readv_full_all_eof() and qio_channel_readv_full_all()
to read both data and FDs. Refactors existing code to use these helpers.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: b059c4cc0fb741e794d644c144cc21372cad877d.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Adds qio_channel_writev_full_all() to transmit both data and FDs.
Refactors existing code to use this helper.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 480fbf1fe4152495d60596c9b665124549b426a5.1611938319.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Migration and yank code assume that qio_channel_shutdown is thread
-safe and can be called from qmp oob handler. Document this after
checking the code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <32b8c27e256da043f0f00db05bd7ab8fbc506070.1609167865.git.lukasstraub2@web.de>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.
Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20201014134033.14095-1-chetan4windows@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
One of the goals of having less boilerplate on QOM declarations
is to avoid human error. Requiring an extra argument that is
never used is an opportunity for mistakes.
Remove the unused argument from OBJECT_DECLARE_TYPE and
OBJECT_DECLARE_SIMPLE_TYPE.
Coccinelle patch used to convert all users of the macros:
@@
declarer name OBJECT_DECLARE_TYPE;
identifier InstanceType, ClassType, lowercase, UPPERCASE;
@@
OBJECT_DECLARE_TYPE(InstanceType, ClassType,
- lowercase,
UPPERCASE);
@@
declarer name OBJECT_DECLARE_SIMPLE_TYPE;
identifier InstanceType, lowercase, UPPERCASE;
@@
OBJECT_DECLARE_SIMPLE_TYPE(InstanceType,
- lowercase,
UPPERCASE);
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Paul Durrant <paul@xen.org>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20200916182519.415636-4-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The requirement to specify the parent class type makes the macro
harder to use and easy to misuse (silent bugs can be introduced
if the wrong struct type is specified).
Simplify the macro by just not declaring any class struct,
allowing us to remove the class_size field from the TypeInfo
variables for those types.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200916182519.415636-3-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Some typedefs and macros are defined after the type check macros.
This makes it difficult to automatically replace their
definitions with OBJECT_DECLARE_TYPE.
Patch generated using:
$ ./scripts/codeconverter/converter.py -i \
--pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]')
which will split "typdef struct { ... } TypedefName"
declarations.
Followed by:
$ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \
$(git grep -l '' -- '*.[ch]')
which will:
- move the typedefs and #defines above the type check macros
- add missing #include "qom/object.h" lines if necessary
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-9-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-10-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-11-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We need "qom/object.h" to call object_ref()/object_unref(),
and to test the TYPE_DUMMY.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200504115656.6045-3-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20191204093625.14836-4-armbru@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
Similar to how qemu_co_sleep_ns() allows preemption from an external
coroutine entry, allow reentering qio_channel_yield() early.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add the ability for a caller to wait for completion of the
background thread to synchronously dispatch its result, without
needing to wait for the main loop to run the idle callback.
This method needs very careful usage to avoid a dangerous
race condition with the free'ing of the task. The completion
callback is normally invoked from an idle callback registered
with the main loop context. The qio_task_wait_thread method
must only be called if the completion callback has not yet
run. The only safe way to achieve this is to run the
qio_task_wait_thread method from the thread that executes
the main loop.
It is generally a bad idea to use this method since it will
block execution of the main loop, however, the design of
the character devices and its usage from vhostuser already
requires blocking execution.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20190211182442.8542-3-berrange@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
GNUTLS takes a paranoid approach when seeing 0 bytes returned by the
underlying OS read() function. It will consider this an error and
return GNUTLS_E_PREMATURE_TERMINATION instead of propagating the 0
return value. It expects apps to arrange for clean termination at
the protocol level and not rely on seeing EOF from a read call to
detect shutdown. This is to harden apps against a malicious 3rd party
causing termination of the sockets layer.
This is unhelpful for the QEMU NBD code which does have a clean
protocol level shutdown, but still relies on seeing 0 from the I/O
channel read in the coroutine handling incoming replies.
The upshot is that when using a plain NBD connection shutdown is
silent, but when using TLS, the client spams the console with
Cannot read from TLS channel: Broken pipe
The NBD connection has, however, called qio_channel_shutdown()
at this point to indicate that it is done with I/O. This gives
the opportunity to optimize the code such that when the channel
has been shutdown in the read direction, the error code
GNUTLS_E_PREMATURE_TERMINATION gets turned into a '0' return
instead of an error.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20181119134228.11031-1-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
A new parameter "context" is added to qio_channel_tls_handshake() is to
allow the TLS to be run on a non-default context. Still, no functional
change.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
We have worked on qio_task_run_in_thread() already. Further, let
all the qio channel APIs use that context.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
qio_task_run_in_thread() allows main thread to run blocking operations
in the background. However it has an assumption on that it's always
working with the default context. This patch tries to allow the threaded
QIO task framework to run with non-default gcontext.
Currently no functional change so far, so the QIOTasks are still always
running on main context.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Originally we were storing the GSources tag IDs. That'll be not enough
if we are going to support non-default gcontext for QIO code. Switch to
GSources without changing anything real. Now we still always pass in
NULL, which means the default gcontext.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Firstly, introduce an internal qio_channel_add_watch_full(), which
enhances qio_channel_add_watch() that context can be specified.
Then add a new API wrapper qio_channel_add_watch_source() to return a
GSource pointer rather than a tag ID.
Note that the _source() call will keep a reference of GSource so that
callers need to unref them explicitly when finished using the GSource.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.
The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h. Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.
To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects. The next commit will
improve it further.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
The code wrongly passes the mode to open() only if O_WRONLY is set.
Instead, the mode should be passed when O_CREAT is set (or O_TMPFILE on
Linux). Fix this by always passing the mode since open() will correctly
ignore the mode if it is not needed. Add a testcase which exercises this
bug and also change the existing testcase to check that the mode of the
created file is correct.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The existing QIOChannelSocket class provides the ability to
listen on a single socket at a time. This patch introduces
a QIONetListener class that provides a higher level API
concept around listening for network services, allowing
for listening on multiple sockets.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently most outbound I/O on the websock channel gets copied into the
rawoutput buffer, and then immediately copied again into the encoutput
buffer, with a header prepended. Now that qio_channel_websock_encode
accepts a struct iovec, we can trivially remove this bounce buffering
and write directly to encoutput.
In doing so, we also now correctly validate the encoutput size against
the QIO_CHANNEL_WEBSOCK_MAX_BUFFER limit.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We must ensure we don't get flooded with ping replies if the outbound
channel is slow. Currently we do this by keeping the ping reply in a
separate temporary buffer and only writing it if the encoutput buffer
is completely empty. This is overly pessimistic, as it is reasonable
to add a ping reply to the encoutput buffer even if it has previous
data in it, as long as that previous data doesn't include a ping
reply.
To track this better, put the ping reply directly into the encoutput
buffer, and then record the size of encoutput at this time in
pong_remain. As we write encoutput to the underlying channel, we
can decrement the pong_remain counter. Once it hits zero, we can
accept further ping replies for transmission.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add an immediate ping reply (pong) to the outgoing stream when a ping
is received. Unsolicited pongs are ignored.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Allows fragmented binary frames by saving the previous opcode. Handles
the case where an intermediary (i.e., web proxy) fragments frames
originally sent unfragmented by the client.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some callers want to distinguish between clean EOF (no bytes read)
vs. a short read (at least one byte read, but EOF encountered
before reaching the desired length), as it allows clients the
ability to do a graceful shutdown when a server shuts down at
defined safe points in the protocol, rather than treating all
shutdown scenarios as an error due to EOF. However, we don't want
to require all callers to have to check for early EOF. So add
another wrapper function that can be used by the callers that care
about the distinction.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170905191114.5959-3-eblake@redhat.com>
Acked-by: Daniel P. Berrange <berrange@redhat.com>
These functions wait until they are able to read / write the full
requested data buffer(s).
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
SocketAddressLegacy is a simple union, and simple unions are awkward:
they have their variant members wrapped in a "data" object on the
wire, and require additional indirections in C. SocketAddress is the
equivalent flat union. Convert all users of SocketAddressLegacy to
SocketAddress, except for existing external interfaces.
See also commit fce5d53..9445673 and 85a82e8..c5f1ae3.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-7-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Minor editing accident fixed, commit message and a comment tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The next commit will rename SocketAddressFlat to SocketAddress, and
the commit after that will replace most uses of SocketAddressLegacy by
SocketAddress, replacing most of this commit's renames right back.
Note that checkpatch emits a few "line over 80 characters" warnings.
The long lines are all temporary; the SocketAddressLegacy replacement
will shorten them again.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-5-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>