Move blk_exp_close_all() from bdrv_close() to qemu_cleanup(), before
bdrv_drain_all_begin().
Export drivers may have coroutines yielding at some point in the block
layer, so we need to shut them down before draining the block layer,
as otherwise they may get stuck blk_wait_while_drained().
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900505
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20210201125032.44713-3-slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some graphs may contain an indirect reference to the first BDS in the
chain that can be reached while walking it bottom->up from one its
children.
Doubling-processing of a BDS is especially problematic for the
aio_notifiers, as they might attempt to work on both the old and the
new AIO contexts.
To avoid this problem, add every child and parent to the ignore list
before actually processing them.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20210201125032.44713-2-slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Provide API for insertion a node to backing chain.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/dynUSHGFybWJydUBy
ZWRoYXQuY29tAAoJEDhwtADrkYZT3igP/3bWwsKR5vKVsDUTmMfrhcgaFvQiaYoG
F29Bond8Xy0Zd0gl7OWh/5jKL0vGlrEVPrKfYLUjMnfkeRec/pOkIB2oOmIxpnPs
9zi4kh2hQ3dEoRBuvSnnZzedetYPTuCpWMIjlztkgfxgcimqm8TPNVSxRaSApjC3
Y8108wGwBWVf2C0rhKO9E2xA51uo6khy05i1psUtqUlC+PuDQ/OwzQHM2dnWdDB6
kUwBDK17nhL6WwsYqCyKLSiDModReYfDiY8GS5MDLo74dzwXiatEefCR7+sbM4xq
eX/SBoqoeS1jLPNuCryNeGNKvNA2KAbEJTnbQA2NxBXHgZ9/1SxVZFxuPp4nDMSQ
N7BDuDI8YtJE479RjT/ZzRG65xadGBSe/HXkXM9mZwh1zitop8SVZ9fArFBHvNzw
Y5zAv3fQd54+87psffg4dYFK0wGmqTabLEEuVzM8KIVqcAdYA2yC2b2EHy+vsxuq
GMkr0WaA6Sq2gthXmzdTjmUPuHdan/NIhuV6d66SbPNH2oH31piptFxuznyFWSKV
isciFFdUrkg5QrF8DSt2nmdwMFf8QGbszqP8QIGMzhJCCS9GXIiGG8f149++q8X8
HO1lFAdLQJdrDwCYmfx36tOvi2rS/rcoTGgvg66UX3xKko1ruoxR1ZWcS54obJN6
vEQDZ+PxubDg
=vGLy
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging
QAPI patches patches for 2020-12-19
# gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT
# gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg: issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits)
qobject: Make QString immutable
block: Use GString instead of QString to build filenames
keyval: Use GString to accumulate value strings
json: Use GString instead of QString to accumulate strings
migration: Replace migration's JSON writer by the general one
qobject: Factor JSON writer out of qobject_to_json()
qobject: Factor quoted_str() out of to_json()
qobject: Drop qstring_get_try_str()
qobject: Drop qobject_get_try_str()
Revert "qobject: let object_property_get_str() use new API"
block: Avoid qobject_get_try_str()
qmp: Fix tracing of non-string command IDs
qobject: Move internals to qobject-internal.h
hw/rdma: Replace QList by GQueue
Revert "qstring: add qstring_free()"
qobject: Change qobject_to_json()'s value to GString
qobject: Use GString instead of QString to accumulate JSON
qobject: Make qobject_to_json_pretty() take a pretty argument
monitor: Use GString instead of QString for output buffer
hmp: Simplify how qmp_human_monitor_command() gets output
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
QString supports modifying its string, but it's quite limited: you can
only append. Just one caller remains:
bdrv_parse_filename_strip_prefix() uses it just for building an
initial string.
Change it to do build the initial string with GString. This is
another step towards making QString immutable.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-20-armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
I'm about to remove qobject_get_try_str(). Use qstring_get_str()
instead. Safe because the argument is known to be a QString here.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-11-armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
qobject_to_json() and qobject_to_json_pretty() build a GString, then
covert it to QString. Just one of the callers actually needs a
QString: qemu_rbd_parse_filename(). A few others need a string they
can modify: qmp_send_response(), qga's send_response(), to_json_str(),
and qmp_fd_vsend_fds(). The remainder just need a string.
Change qobject_to_json() and qobject_to_json_pretty() to return the
GString.
qemu_rbd_parse_filename() now has to convert to QString. All others
save a QString temporary. to_json_str() actually becomes a bit
simpler, because GString provides more convenient modification
functions.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-6-armbru@redhat.com>
Do generic processing even for drivers which define .bdrv_check_perm
handler. It's needed for further preallocate filter: it will need to do
additional action on bdrv_check_perm, but don't want to reimplement
generic logic.
The patch doesn't change existing behaviour: the only driver that
implements bdrv_check_perm is file-posix, but it never has any
children.
Also, bdrv_set_perm() don't stop processing if driver has
.bdrv_set_perm handler as well.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201021145859.11201-8-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The only users of this thing are:
1. bdrv_child_try_set_perm, to ignore failures on loosen restrictions
2. assertion in bdrv_replace_child
3. assertion in bdrv_inactivate_recurse
Assertions are not enough reason for overcomplication the permission
update system. So, look at bdrv_child_try_set_perm.
We are interested in tighten_restrictions only on failure. But on
failure this field is not reliable: we may fail in the middle of
permission update, some nodes are not touched and we don't know should
their permissions be tighten or not. So, we rely on the fact that if we
loose restrictions on some node (or BdrvChild), we'll not tighten
restriction in the whole subtree as part of this update (assertions 2
and 3 rely on this fact as well). And, if we rely on this fact anyway,
we can just check it on top, and don't pass additional pointer through
the whole recursive infrastructure.
Note also, that further patches will fix real bugs in permission update
system, so now is good time to simplify it, as a help for further
refactorings.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-8-vsementsov@virtuozzo.com>
[mreitz: Fixed rebase conflict]
Signed-off-by: Max Reitz <mreitz@redhat.com>
We must set the permission used for _check_. Assert that we have
backup and drop extra arguments.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-7-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We should never set permissions other than cumulative permissions of
parents. During bdrv_reopen_multiple() we _check_ for synthetic
permissions but when we do _set_ the graph is already updated.
Add an assertion to bdrv_reopen_multiple(), other cases are more
obvious.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-6-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Make separate function for common pattern.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-5-vsementsov@virtuozzo.com>
[mreitz: Squashed in
https://lists.nongnu.org/archive/html/qemu-block/2020-11/msg00299.html]
Signed-off-by: Max Reitz <mreitz@redhat.com>
We are going to modify block layer to work with 64bit requests. And
first step is moving to int64_t type for both offset and bytes
arguments in all block request related functions.
It's mostly safe (when widening signed or unsigned int to int64_t), but
switching from uint64_t is questionable.
So, let's first establish the set of requests we want to work with.
First signed int64_t should be enough, as off_t is signed anyway. Then,
obviously offset + bytes should not overflow.
And most interesting: (offset + bytes) being aligned up should not
overflow as well. Aligned to what alignment? First thing that comes in
mind is bs->bl.request_alignment, as we align up request to this
alignment. But there is another thing: look at
bdrv_mark_request_serialising(). It aligns request up to some given
alignment. And this parameter may be bdrv_get_cluster_size(), which is
often a lot greater than bs->bl.request_alignment.
Note also, that bdrv_mark_request_serialising() uses signed int64_t for
calculations. So, actually, we already depend on some restrictions.
Happily, bdrv_get_cluster_size() returns int and
bs->bl.request_alignment has 32bit unsigned type, but defined to be a
power of 2 less than INT_MAX. So, we may establish, that INT_MAX is
absolute maximum for any kind of alignment that may occur with the
request.
Note, that bdrv_get_cluster_size() is not documented to return power
of 2, still bdrv_mark_request_serialising() behaves like it is.
Also, backup uses bdi.cluster_size and is not prepared to it not being
power of 2.
So, let's establish that Qemu supports only power-of-2 clusters and
alignments.
So, alignment can't be greater than 2^30.
Finally to be safe with calculations, to not calculate different
maximums for different nodes (depending on cluster size and
request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30)
as absolute maximum bytes length for Qemu. Actually, it's not much less
than INT64_MAX.
OK, then, let's apply it to block/io.
Let's consider all block/io entry points of offset/bytes:
4 bytes/offset interface functions: bdrv_co_preadv_part(),
bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and
bdrv_co_pdiscard() and we check them all with bdrv_check_request().
We also have one entry point with only offset: bdrv_co_truncate().
Check the offset.
And one public structure: BdrvTrackedRequest. Happily, it has only
three external users:
file-posix.c: adopted by this patch
write-threshold.c: only read fields
test-write-threshold.c: sets obviously small constant values
Better is to make the structure private and add corresponding
interfaces.. Still it's not obvious what kind of interface is needed
for file-posix.c. Let's keep it public but add corresponding
assertions.
After this patch we'll convert functions in block/io.c to int64_t bytes
and offset parameters. We can assume that offset/bytes pair always
satisfy new restrictions, and make
corresponding assertions where needed. If we reach some offset/bytes
point in block/io.c missing bdrv_check_request() it is considered a
bug. As well, if block/io.c modifies a offset/bytes request, expanding
it more then aligning up to request_alignment, it's a bug too.
For all io requests except for discard we keep for now old restriction
of 32bit request length.
iotest 206 output error message changed, as now test disk size is
larger than new limit. Add one more test case with new maximum disk
size to cover too-big-L1 case.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block-export-add type=fuse allows mounting block graph nodes via FUSE on
some existing regular file. That file should then appears like a raw
disk image, and accesses to it result in accesses to the exported BDS.
Right now, we only implement the necessary block export functions to set
it up and shut it down. We do not implement any access functions, so
accessing the mount point only results in errors. This will be
addressed by a followup patch.
We keep a hash table of exported mount points, because we want to be
able to detect when users try to use a mount point twice. This is
because we invoke stat() to check whether the given mount point is a
regular file, but if that file is served by ourselves (because it is
already used as a mount point), then this stat() would have to be served
by ourselves, too, which is impossible to do while we (as the caller)
are waiting for it to settle. Therefore, keep track of mount point
paths to at least catch the most obvious instances of that problem.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
First, permission update loop tries to do iterations transactionally,
but the whole update is not transactional: nobody roll-back successful
loop iterations when some iteration fails.
Second, in the iteration we have nested permission update:
c->klass->update_filename may point to bdrv_child_cb_update_filename()
which calls bdrv_backing_update_filename(), which may do node reopen to
RW.
Permission update system is not prepared to nested updates, at least it
has intermediate permission-update state stored in BdrvChild
structures: has_backup_perm, backup_perm and backup_shared_perm.
So, let's first do bdrv_replace_node_common() (which is more
transactional than open-coded update in bdrv_drop_intermediate()) and
then call update_filename() in separate. We still do not rollback
changes in case of update_filename() failure but it's not much worse
than pre-patch behavior.
Note that bdrv_replace_node_common() does check for frozen children,
so corresponding check is dropped in bdrv_drop_intermediate().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-4-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add new parameter to bdrv_replace_node(): auto_skip. With
auto_skip=false we'll have stricter behavior: update _all_ from
parents or fail. New behaviour will be used in the following commit in
block.c, so keep original function name as public interface.
Note: new error message is a bit funny in contrast with further
"Cannot" in case of frozen child, but we'd better keep some difference
to make it possible to distinguish one from another on failure. Still,
actually we'd better refactor should_update_child() call to distinguish
also different kinds of "should not". Let's do it later.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-3-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-2-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Back in 2015, we attempted to fix error reporting for images that
claimed to have more than INT64_MAX/512 sectors, but due to the type
promotions caused by BDRV_SECTOR_SIZE being unsigned, this
inadvertently forces all negative ret values to be slammed into -EFBIG
rather than the original error. While we're at it, we can avoid the
confusing ?: by spelling the logic more directly.
Fixes: 4a9c9ea0d3
Reported-by: Guoyi Tu <tu.guoyi@h3c.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201105155122.60943-1-eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
block.c has a useful macro QAPI_LIST_ADD() for inserting at the front
of any QAPI-generated list; move it from block.c to qapi/util.h so
more places can use it, including one earlier place in block.c, and
rename it to something more obvious (since we also have a lot of
places that append, rather than prepend, to a list).
There are many more places in the codebase that can benefit from using
the macro, but converting them will be left to later patches.
In theory, all QAPI list types are child classes of GenericList; but
in practice, that relationship is not explicitly spelled out in the C
type declarations (rather, it is something that happens implicitly due
to C compatible layouts), and the macro does not actually depend on
the GenericList type. We considered moving GenericList from visitor.h
into util.h to group related code; however, such a move would be
awkward if we do not also move GenericAlternate. Unfortunately,
moving GenericAlternate would introduce its own problems of
declaration circularity (qapi-builtin-types.h needs a complete
definition of QEnumLookup from util.h, but GenericAlternate needs a
complete definition of QType from qapi-builtin-types.h).
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201027050556.269064-3-eblake@redhat.com>
[eblake: s/ADD/PREPEND/ per suggestion by Markus]
There is no need to rely on the verbosity of the gcc/clang compiler
extension of g_new(typeof(X), 1) when we can instead use the standard
g_malloc(sizeof(X)). In general, we like g_new over g_malloc for
returning type X rather than void* to let the compiler catch more
potential typing mistakes, but in this particular macro, our other use
of typeof on the same line already ensures we are getting correct
results.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
If a BDS gets deleted during blk_drain_all(), it might miss a
call to bdrv_do_drained_end(). This means missing a call to
aio_enable_external() and the AIO context remains disabled for
ever. This can cause a device to become irresponsive and to
disrupt the guest execution, ie. hang, loop forever or worse.
This scenario is quite easy to encounter with virtio-scsi
on POWER when punching multiple blockdev-create QMP commands
while the guest is booting and it is still running the SLOF
firmware. This happens because SLOF disables/re-enables PCI
devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it
tends to work with a single device at a time at various stages
like probing and running block/network bootloaders without
doing a full reset in-between. This naturally generates many
dataplane stops and starts, and thus many drain sections that
can race with blockdev_create_run(). In the end, SLOF bails
out.
It is somehow reproducible on x86 but it requires to generate
articial dataplane start/stop activity with stop/cont QMP
commands. In this case, seabios ends up looping for ever,
waiting for the virtio-scsi device to send a response to
a command it never received.
Add a helper that pairs all previously called bdrv_do_drained_begin()
with a bdrv_do_drained_end() and call it from bdrv_close().
While at it, update the "/bdrv-drain/graph-change/drain_all"
test in test-bdrv-drain so that it can catch the issue.
BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Inside of coroutine context, we can't directly use aio_context_acquire()
for the AioContext of a block node because we already own the lock of
the current AioContext and we need to avoid double locking to prevent
deadlocks.
This provides helper functions to lock the AioContext of a node only if
it's not the same as the current AioContext.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-14-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add a pair of functions to temporarily move the current coroutine to the
AioContext of a given BlockDriverState.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-13-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Use code generation implemented in previous commit to generated
coroutine wrappers in block.c and block/io.c
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-6-vsementsov@virtuozzo.com>
We are going to keep coroutine-wrappers code (structure-packing
parameters, BDRV_POLL wrapper functions) in separate auto-generated
files. So, we'll need a header with declaration of original _co_
functions, for those which are static now. As well, we'll need
declarations for wrapper functions. Do these declarations now, as a
preparation step.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-4-vsementsov@virtuozzo.com>
This is the only coroutine wrapper from block.c and block/io.c which
doesn't return a value, so let's convert it to the common behavior, to
simplify moving to generated coroutine wrappers in a further commit.
Also, bdrv_invalidate_cache is a void function, returning error only
through **errp parameter, which is considered to be bad practice, as
it forces callers to define and propagate local_err variable, so
conversion is good anyway.
This patch leaves the conversion of .bdrv_co_invalidate_cache() driver
callbacks and bdrv_invalidate_cache_all() for another day.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-2-vsementsov@virtuozzo.com>
This adds a function to shut down all block exports, and another one to
shut down the block exports of a single type. The latter is used for now
when stopping the NBD server. As soon as we implement support for
multiple NBD servers, we'll need a per-server list of exports and it
will be replaced by a function using that.
As a side effect, the BlockExport layer has a list tracking all existing
exports now. closed_exports loses its only user and can go away.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-18-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when a QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:
$ CC=clang CXX=clang++ ./configure ... && make
../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)
Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.
This patch was generated using:
$ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
sort -u >/tmp/changed_identifiers
$ for identifier in $(</tmp/changed_identifiers); do
sed -i "s%\<$identifier\>%q$identifier%g" \
$(git grep -I -l "\<$identifier\>")
done
I manually fixed line-wrap issues and misaligned rST tables.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
I found that there are many spelling errors in the comments of qemu,
so I used the spellcheck tool to check the spelling errors
and finally found some spelling errors in the folder.
Signed-off-by: zhaolichang <zhaolichang@huawei.com>
Reviewed-by: Alex Bennee <alex.bennee@linaro.org>
Message-Id: <20200917075029.313-2-zhaolichang@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay), other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Considering bs->backing->bs->filename exists, let us make it mean the
former.
Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming, everything uses relative filenames).
Before this patch:
$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
After this patch:
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.
With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file. If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.
Note that this changes the QAPI output for a node's backing_file. We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image). This
inconsistent output was effectively useless, so we have to decide one
way or the other. Considering that bs->backing_file usually at runtime
contained the path to the image relative to qemu's CWD (or absolute),
this patch changes QAPI's backing_file to always report the
bs->backing->bs->filename from now on. If you want to receive the image
header information, you have to refer to full-backing-filename.
This necessitates a change to iotest 228. The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file. Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.
Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well. This way,
ImageInfo's backing-filename and backing-filename-format fields will
represent what the image header says and nothing else.
iotest 245 changes in behavior: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file even if it used to have a backing node at some
point.
273 also changes: The base image is opened without a format layer, so
ImageInfo.backing-filename-format used to report "file" for the base
image's overlay after blockdev-snapshot. However, the image header
never says "file" anywhere, so it now reports $IMGFMT.
Signed-off-by: Max Reitz <mreitz@redhat.com>
There are two practical problems with bdrv_get_allocated_file_size()'s
default right now:
(1) For drivers with children, we should generally sum all their sizes
instead of just passing the request through to bs->file. The latter
is good for filters, but not so much for format drivers.
(2) Filters need not have bs->file, so we should actually go to the
filtered child instead of hard-coding bs->file.
Fix this by splitting the default implementation into three branches:
(1) For filter drivers: Return the size of the filtered child
(2) For protocol drivers: Return -ENOTSUP, because the default
implementation cannot make a guess
(3) For other drivers: Sum all data-bearing children's sizes
Signed-off-by: Max Reitz <mreitz@redhat.com>
When looking for a blkdebug node (which implements debug breakpoints),
use bdrv_primary_bs() to iterate through the graph, because that is
where a blkdebug node would be.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
bdrv_refresh_filename() and the kind of related bdrv_dirname() should
look to the primary child when they wish to copy the underlying file's
filename.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reopening a node's backing child needs a bit of special handling because
the "backing" child has different defaults than all other children
(among other things). Adding filter support here is a bit more
difficult than just using the child access functions. In fact, we often
have to directly use bs->backing because these functions are about the
"backing" child (which may or may not be the COW backing file).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Use child access functions when iterating through backing chains so
filters do not break the chain.
In addition, bdrv_find_overlay() will now always return the actual
overlay; that is, it will never return a filter node but only one with a
COW backing file (there may be filter nodes between that node and @bs).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Places that use patterns like
if (bs->drv->is_filter && bs->file) {
... something about bs->file->bs ...
}
should be
BlockDriverState *filtered = bdrv_filter_bs(bs);
if (filtered) {
... something about @filtered ...
}
instead.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Filters cannot compress data themselves but they have to implement
.bdrv_co_pwritev_compressed() still (or they cannot forward compressed
writes). Therefore, checking whether
bs->drv->bdrv_co_pwritev_compressed is non-NULL is not sufficient to
know whether the node can actually handle compressed writes. This
function looks down the filter chain to see whether there is a
non-filter that can actually convert the compressed writes into
compressed data (and thus normal writes).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
The original purpose of bdrv_is_encrypted() was to inquire whether a BDS
can be used without the user entering a password or not. It has not
been used for that purpose for quite some time.
Actually, it is not even fit for that purpose, because to answer that
question, it would have recursively query all of the given node's
children.
So now we have to decide in which direction we want to fix
bdrv_is_encrypted(): Recursively query all children, or drop it and just
use bs->encrypted to get the current node's status?
Nowadays, its only purpose is to report through bdrv_query_image_info()
whether the given image is encrypted or not. For this purpose, it is
probably more interesting to see whether a given node itself is
encrypted or not (otherwise, a management application cannot discern for
certain which nodes are really encrypted and which just have encrypted
children).
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
In order to make filters work in backing chains, the associated
functions must be able to deal with them and freeze both COW and filter
child links.
While at it, add some comments that note which functions require their
caller to ensure that a given child link is not frozen, and how the
callers do so.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
bdrv_set_backing_hd() is a function that explicitly cares about the
bs->backing child. Highlight that in its description and use
child_bs(bs->backing) instead of backing_bs(bs) to make it more obvious.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
bdrv_has_zero_init() should use bdrv_cow_child() if it wants to check
whether the given BDS has a COW backing file.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
There are BDS children that the general block layer code can access,
namely bs->file and bs->backing. Since the introduction of filters and
external data files, their meaning is not quite clear. bs->backing can
be a COW source, or it can be a filtered child; bs->file can be a
filtered child, it can be data and metadata storage, or it can be just
metadata storage.
This overloading really is not helpful. This patch adds functions that
retrieve the correct child for each exact purpose. Later patches in
this series will make use of them. Doing so will allow us to handle
filter nodes in a meaningful way.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Providing an empty string for the backing file parameter like so:
qemu-img create -f qcow2 -b '' /tmp/foo
allows the flow of control to reach and subsequently fail an assert
statement because passing an empty string to
bdrv_get_full_backing_filename_from_filename()
simply results in NULL being returned without an error being raised.
To fix this, let's check for an empty string when getting the value from
the opts list.
Reported-by: Attila Fazekas <afazekas@redhat.com>
Fixes: https://bugzilla.redhat.com/1809553
Signed-off-by: Connor Kuehl <ckuehl@redhat.com>
Message-Id: <20200813134722.802180-1-ckuehl@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move the create-config logic to meson.build; create a
configuration_data object and let meson handle the
quoting and output.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Unaligned requests will automatically be aligned to bl.request_alignment
and we can't extend write requests to access space beyond the end of the
image without resizing the image, so if we have the WRITE permission,
but not the RESIZE one, it's required that the image size is aligned.
Failing to meet this requirement could cause assertion failures like
this if RESIZE permissions weren't requested:
qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
This was e.g. triggered by qemu-img converting to a target image with 4k
request alignment when the image was only aligned to 512 bytes, but not
to 4k.
Turn this into a graceful error in bdrv_check_perm() so that WRITE
without RESIZE can only be taken if the image size is aligned. If a user
holds both permissions and drops only RESIZE, the function will return
an error, but bdrv_child_try_set_perm() will ignore the failure silently
if permissions are only requested to be relaxed and just keep both
permissions while returning success.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200716142601.111237-2-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Creating an image that requires format probing of the backing image is
potentially unsafe (we've had several CVEs over the years based on
probes leaking information to the guest on a subsequent boot, although
these days tools like libvirt are aware of the issue enough to prevent
the worst effects). For example, if our probing algorithm ever
changes, or if other tools like libvirt determine a different probe
result than we do, then subsequent use of that backing file under a
different format will present corrupted data to the guest.
Fortunately, the worst effects occur only when the backing image is
originally raw, and we at least prevent commit into a probed raw
backing file that would change its probed type.
Still, it is worth starting a deprecation clock so that future
qemu-img can refuse to create backing chains that would rely on
probing, to encourage clients to avoid unsafe practices. Most
warnings are intentionally emitted from bdrv_img_create() in the block
layer, but qemu-img convert uses bdrv_create() which cannot emit its
own warning without causing spurious warnings on other code paths. In
the end, all command-line image creation or backing file rewriting now
performs a check.
Furthermore, if we probe a backing file as non-raw, then it is safe to
explicitly record that result (rather than relying on future probes);
only where we probe a raw image do we care about further warnings to
the user when using such an image (for example, commits into a
probed-raw backing file are prevented), to help them improve their
tooling. But whether or not we make the probe results explicit, we
still warn the user to remind them to upgrade their workflow to supply
-F always.
iotest 114 specifically wants to create an unsafe image for later
amendment rather than defaulting to our new default of recording a
probed format, so it needs an update. While touching it, expand it to
cover all of the various warnings enabled by this patch. iotest 301
also shows a change to qcow messages.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200706203954.341758-11-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, this is a mechanical addition; all callers pass false. But
the next patch will use it to improve 'qemu-img rebase -u' when
selecting a backing file with no format.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Message-Id: <20200706203954.341758-10-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Back in commit 6e6e55f5 (Jul 2017, v2.10), we tweaked the code to warn
if the backing file could not be opened but the user gave a size,
unless the user also passes the -u option to bypass the open of the
backing file. As one common reason for failure to open the backing
file is when there is mismatch in the requested backing format in
relation to what the backing file actually contains, we actually want
to open the backing file and ensure that it has the right format in as
many cases as possible. iotest 301 for qcow demonstrates how
detecting explicit format mismatch is useful to prevent the creation
of an image that would probe differently than the user requested. Now
is the time to finally turn the warning an error, as promised.
Note that the original warning was added prior to our documentation of
an official deprecation policy (eb22aeca, also Jul 2017), and because
the warning didn't mention the word "deprecated", we never actually
remembered to document it as such. But the warning has been around
long enough that I don't see prolonging it another two releases.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200706203954.341758-7-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
During 'qemu-img create ... 2>&1', if --quiet is not in force, we can
end up with buffered I/O in stdout that was produced before failure,
but which appears in output after failure. This is confusing; the fix
is to flush stdout prior to attempting anything that might produce an
error message. Several iotests demonstrate the resulting ordering
change now that the merged outputs now reflect chronology. (An even
better fix would be to avoid printf from within block.c altogether,
but that's much more invasive...)
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200706203954.341758-2-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away, even when we need to keep error_propagate() for other
error paths.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-38-armbru@redhat.com>
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away. The previous commit did that with a Coccinelle script I
consider fairly trustworthy. This commit uses the same script with
the matching of return taken out, i.e. we convert
if (!foo(..., &err)) {
...
error_propagate(errp, err);
...
}
to
if (!foo(..., errp)) {
...
...
}
This is unsound: @err could still be read between afterwards. I don't
know how to express "no read of @err without an intervening write" in
Coccinelle. Instead, I manually double-checked for uses of @err.
Suboptimal line breaks tweaked manually. qdev_realize() simplified
further to placate scripts/checkpatch.pl.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-36-armbru@redhat.com>
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away. Convert
if (!foo(..., &err)) {
...
error_propagate(errp, err);
...
return ...
}
to
if (!foo(..., errp)) {
...
...
return ...
}
where nothing else needs @err. Coccinelle script:
@rule1 forall@
identifier fun, err, errp, lbl;
expression list args, args2;
binary operator op;
constant c1, c2;
symbol false;
@@
if (
(
- fun(args, &err, args2)
+ fun(args, errp, args2)
|
- !fun(args, &err, args2)
+ !fun(args, errp, args2)
|
- fun(args, &err, args2) op c1
+ fun(args, errp, args2) op c1
)
)
{
... when != err
when != lbl:
when strict
- error_propagate(errp, err);
... when != err
(
return;
|
return c2;
|
return false;
)
}
@rule2 forall@
identifier fun, err, errp, lbl;
expression list args, args2;
expression var;
binary operator op;
constant c1, c2;
symbol false;
@@
- var = fun(args, &err, args2);
+ var = fun(args, errp, args2);
... when != err
if (
(
var
|
!var
|
var op c1
)
)
{
... when != err
when != lbl:
when strict
- error_propagate(errp, err);
... when != err
(
return;
|
return c2;
|
return false;
|
return var;
)
}
@depends on rule1 || rule2@
identifier err;
@@
- Error *err = NULL;
... when != err
Not exactly elegant, I'm afraid.
The "when != lbl:" is necessary to avoid transforming
if (fun(args, &err)) {
goto out
}
...
out:
error_propagate(errp, err);
even though other paths to label out still need the error_propagate().
For an actual example, see sclp_realize().
Without the "when strict", Coccinelle transforms vfio_msix_setup(),
incorrectly. I don't know what exactly "when strict" does, only that
it helps here.
The match of return is narrower than what I want, but I can't figure
out how to express "return where the operand doesn't use @err". For
an example where it's too narrow, see vfio_intx_enable().
Silently fails to convert hw/arm/armsse.c, because Coccinelle gets
confused by ARMSSE being used both as typedef and function-like macro
there. Converted manually.
Line breaks tidied up manually. One nested declaration of @local_err
deleted manually. Preexisting unwanted blank line dropped in
hw/riscv/sifive_e.c.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-35-armbru@redhat.com>
When creating an image fails because the format doesn't support option
"backing_file" or "backing_fmt", bdrv_img_create() first has
qemu_opt_set() put a generic error into @local_err, then puts the real
error into @errp with error_setg(), and then propagates the former to
the latter, which throws away the generic error. A bit complicated,
but works.
Now that qemu_opt_set() returns a useful value, we can simply ignore
the generic error instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-16-armbru@redhat.com>
The function has only one user: bdrv_co_block_status(). Inline it to
simplify reviewing of the following patches, which will finally drop
unallocated_blocks_are_zero field too.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
'force' option will be used for some unsafe amend operations.
This includes things like erasing last keyslot in luks based formats
(which destroys the data, unless the master key is backed up
by external means), but that _might_ be desired result.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-4-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Commit 93676c88 relaxed our NBD client code to request export names up
to the NBD protocol maximum of 4096 bytes without NUL terminator, even
though the block layer can't store anything longer than 4096 bytes
including NUL terminator for display to the user. Since this means
there are some export names where we have to truncate things, we can
at least try to make the truncation a bit more obvious for the user.
Note that in spite of the truncated display name, we can still
communicate with an NBD server using such a long export name; this was
deemed nicer than refusing to even connect to such a server (since the
server may not be under our control, and since determining our actual
length limits gets tricky when nbd://host:port/export and
nbd+unix:///export?socket=/path are themselves variable-length
expansions beyond the export name but count towards the block layer
name length).
Reported-by: Xueqiang Wei <xuwei@redhat.com>
Fixes: https://bugzilla.redhat.com/1843684
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200610163741.3745251-3-eblake@redhat.com>
Implementations should decide the necessary permissions based on @role.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-35-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-33-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-32-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-31-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Replace child_file by child_of_bds in all remaining places (excluding
tests).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-28-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-25-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make all parents of backing files pass the appropriate BdrvChildRole.
By doing so, we can switch their BdrvChildClass over to the generic
child_of_bds, which will do the right thing when given a correct
BdrvChildRole.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-24-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-23-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This callback can be used by BDSs that use child_of_bds with the
appropriate BdrvChildRole for their children.
Also, make bdrv_format_default_perms() use it for child_of_bds children
(just a temporary solution until we can drop bdrv_format_default_perms()
altogether).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-20-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We can be less restrictive about pure data children than those with
metadata on them, so let bdrv_default_perms_for_storage() handle
metadata children differently from pure data children.
As explained in the code, the restrictions on metadata children are
strictly stricter than those for pure data children, so in theory we
just have to distinguish between pure-data and all other storage
children (pure metadata or data+metadata). In practice, that is not
obvious, though, so we have two independent code paths for metadata and
for data children, and data+metadata children will go through both
(without the path for data children doing anything meaningful).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-19-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Right now, bdrv_format_default_perms() is used by format parents
(generally). We want to switch to a model where most parents use a
single BdrvChildClass, which then decides the permissions based on the
child role. To do so, we have to split bdrv_format_default_perms() into
separate functions for each such role.
Note that bdrv_default_perms_for_storage() currently handles all DATA |
METADATA children. A follow-up patch is going to split it further into
one function for each case.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-18-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Right now, bdrv_format_default_perms() is used by format parents
(generally). We want to switch to a model where most parents use a
single BdrvChildClass, which then decides the permissions based on the
child role. To do so, we have to split bdrv_format_default_perms() into
separate functions for each such role.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-17-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_format_default_perms() has one code path for backing files, and one
for storage files. We want to pull them out into their own functions,
so make sure they are completely distinct before so the next patches
will be a bit cleaner.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-16-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Any current user of child_file, child_format, and child_backing can and
should use this generic BdrvChildClass instead, as it can handle all of
these cases. However, to be able to do so, the users must pass the
appropriate BdrvChildRole when the child is created/attached. (The
following commits will take care of that.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-15-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make bdrv_child_cb_detach() call bdrv_backing_detach() for children with
a COW role (and drop the reverse call from bdrv_backing_detach()), so it
can be used for any child (with a proper role set).
Because so far no child has a proper role set, we need a temporary new
callback for child_backing.detach that ensures bdrv_backing_detach() is
called for all COW children that do not have their role set yet.
(Also, move bdrv_child_cb_detach() down to group it with
bdrv_inherited_options() and bdrv_child_cb_attach().)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-14-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make bdrv_child_cb_attach() call bdrv_backing_attach() for children with
a COW role (and drop the reverse call from bdrv_backing_attach()), so it
can be used for any child (with a proper role set).
Because so far no child has a proper role set, we need a temporary new
callback for child_backing.attach that ensures bdrv_backing_attach() is
called for all COW children that do not have their role set yet.
(Also, move bdrv_child_cb_attach() down to group it with
bdrv_inherited_options().)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-13-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Let child_file's, child_format's, and child_backing's .inherit_options()
implementations fall back to bdrv_inherited_options() to show that it
would really work for all of these cases, if only the parents passed the
appropriate BdrvChildRole and parent_is_format values.
(Also, make bdrv_open_inherit(), the only place to explicitly call
bdrv_backing_options(), call bdrv_inherited_options() instead.)
This patch should incur only two visible changes, both for child_format
children, both of which are effectively bug fixes:
First, they no longer have discard=unmap set by default. This reason it
was set is because bdrv_inherited_fmt_options() fell through to
bdrv_protocol_options(), and that set it because "format drivers take
care to send flushes and respect unmap policy". None of the drivers
that use child_format for their children (quorum and blkverify) are
format drivers, though, so this reasoning does not apply here.
Second, they no longer have BDRV_O_NO_IO force-cleared. child_format
was used solely for children that do not store any metadata and as such
will not be accessed by their parents as long as those parents do not
receive I/O themselves. Thus, such children should inherit
BDRV_O_NO_IO.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-12-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After the series this patch belongs to, we want to have a common
BdrvChildClass that encompasses all of child_file, child_format, and
child_backing. Such a single class needs a single .inherit_options()
implementation, and this patch introduces it.
The next patch will show how the existing implementations can fall back
to it just by passing appropriate BdrvChildRole and parent_is_format
values.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-11-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The other two .inherit_options implementations specify exactly for what
case they are used in their name, so do it for this one as well.
(The actual intention behind this patch is to follow it up with a
generic bdrv_inherited_options() that works for all three cases.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-10-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We plan to unify the generic .inherit_options() functions. The
resulting common function will need to decide whether to force-enable
format probing, force-disable it, or leave it as-is. To make this
decision, it will need to know whether the parent node is a format node
or not (because we never want format probing if the parent is a format
node already (except for the backing chain)).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-9-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, all callers (effectively) pass 0 and no callee evaluates thie
value. Later patches will change both.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-8-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, all callers pass 0 and no callee evaluates this value. Later
patches will change both.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-7-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, it is always set to 0. Later patches in this series will
ensure that all callers pass an appropriate combination of flags.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-6-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This structure nearly only contains parent callbacks for child state
changes. It cannot really reflect a child's role, because different
roles may overlap (as we will see when real roles are introduced), and
because parents can have custom callbacks even when the child fulfills a
standard role.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200513110544.176672-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Right now, all users of bdrv_make_empty() call the BlockDriver method
directly. That is not only bad style, it is also wrong, unless the
caller has a BdrvChild with a WRITE or WRITE_UNCHANGED permission.
(WRITE_UNCHANGED suffices, because callers generally use this function
to clear a node with a backing file after a commit operation.)
Introduce bdrv_make_empty() that verifies that it does.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200429141126.85159-2-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that there are no clients of bdrv_has_zero_init_truncate, none of
the drivers need to worry about providing it.
What's more, this eliminates a source of some confusion: a literal
reading of the documentation as written in ceaca56f and implemented in
commit 1dcaf527 claims that a driver which returns 0 for
bdrv_has_zero_init_truncate() must not return 1 for
bdrv_has_zero_init(); this condition was violated for parallels, qcow,
and sometimes for vdi, although in practice it did not matter since
those drivers also lacked .bdrv_co_truncate.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-10-eblake@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that node level interface bdrv_truncate() supports passing request
flags to the block driver, expose this on the BlockBackend level, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The previous few commits have made this more obvious, and removed the
one exception. Time to clarify the documentation, and drop dead error
checking.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200424084338.26803-13-armbru@redhat.com>
bdrv_root_attach_child promises to drop child_bs reference on failure.
It does it on first handled failure path, but not on the second. Fix
that.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200324155921.23822-1-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of checking the .bdrv_co_create_opts to see if we need the
fallback, just implement the .bdrv_co_create_opts in the drivers that
need it.
This way we don't break various places that need to know if the
underlying protocol/format really supports image creation, and this way
we still allow some drivers to not support image creation.
Fixes: fd17146cd9
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1816007
Note that technically this driver reverts the image creation fallback
for the vxhs driver since I don't have a means to test it, and IMHO it
is better to leave it not supported as it was prior to generic image
creation patches.
Also drop iscsi_create_opts which was left accidentally.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-3-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
[mreitz: Fixed alignment, and moved bdrv_co_create_opts_simple() and
bdrv_create_opts_simple from block.h into block_int.h]
Signed-off-by: Max Reitz <mreitz@redhat.com>
This will allow the reuse of a single generic .bdrv_co_create
implementation for several drivers.
No functional changes.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
There is a use-after-free possible: bdrv_unref_child() leaves
bs->backing freed but not NULL. bdrv_attach_child may produce nested
polling loop due to drain, than access of freed pointer is possible.
I've produced the following crash on 30 iotest with modified code. It
does not reproduce on master, but still seems possible:
#0 __strcmp_avx2 () at /lib64/libc.so.6
#1 bdrv_backing_overridden (bs=0x55c9d3cc2060) at block.c:6350
#2 bdrv_refresh_filename (bs=0x55c9d3cc2060) at block.c:6404
#3 bdrv_backing_attach (c=0x55c9d48e5520) at block.c:1063
#4 bdrv_replace_child_noperm
(child=child@entry=0x55c9d48e5520,
new_bs=new_bs@entry=0x55c9d3cc2060) at block.c:2290
#5 bdrv_replace_child
(child=child@entry=0x55c9d48e5520,
new_bs=new_bs@entry=0x55c9d3cc2060) at block.c:2320
#6 bdrv_root_attach_child
(child_bs=child_bs@entry=0x55c9d3cc2060,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
ctx=<optimized out>, perm=<optimized out>, shared_perm=21,
opaque=0x55c9d3c5a3d0, errp=0x7ffd117108e0) at block.c:2424
#7 bdrv_attach_child
(parent_bs=parent_bs@entry=0x55c9d3c5a3d0,
child_bs=child_bs@entry=0x55c9d3cc2060,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
errp=errp@entry=0x7ffd117108e0) at block.c:5876
#8 in bdrv_set_backing_hd
(bs=bs@entry=0x55c9d3c5a3d0,
backing_hd=backing_hd@entry=0x55c9d3cc2060,
errp=errp@entry=0x7ffd117108e0)
at block.c:2576
#9 stream_prepare (job=0x55c9d49d84a0) at block/stream.c:150
#10 job_prepare (job=0x55c9d49d84a0) at job.c:761
#11 job_txn_apply (txn=<optimized out>, fn=<optimized out>) at
job.c:145
#12 job_do_finalize (job=0x55c9d49d84a0) at job.c:778
#13 job_completed_txn_success (job=0x55c9d49d84a0) at job.c:832
#14 job_completed (job=0x55c9d49d84a0) at job.c:845
#15 job_completed (job=0x55c9d49d84a0) at job.c:836
#16 job_exit (opaque=0x55c9d49d84a0) at job.c:864
#17 aio_bh_call (bh=0x55c9d471a160) at util/async.c:117
#18 aio_bh_poll (ctx=ctx@entry=0x55c9d3c46720) at util/async.c:117
#19 aio_poll (ctx=ctx@entry=0x55c9d3c46720,
blocking=blocking@entry=true)
at util/aio-posix.c:728
#20 bdrv_parent_drained_begin_single (poll=true, c=0x55c9d3d558f0)
at block/io.c:121
#21 bdrv_parent_drained_begin_single (c=c@entry=0x55c9d3d558f0,
poll=poll@entry=true)
at block/io.c:114
#22 bdrv_replace_child_noperm
(child=child@entry=0x55c9d3d558f0,
new_bs=new_bs@entry=0x55c9d3d27300) at block.c:2258
#23 bdrv_replace_child
(child=child@entry=0x55c9d3d558f0,
new_bs=new_bs@entry=0x55c9d3d27300) at block.c:2320
#24 bdrv_root_attach_child
(child_bs=child_bs@entry=0x55c9d3d27300,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
ctx=<optimized out>, perm=<optimized out>, shared_perm=21,
opaque=0x55c9d3cc2060, errp=0x7ffd11710c60) at block.c:2424
#25 bdrv_attach_child
(parent_bs=parent_bs@entry=0x55c9d3cc2060,
child_bs=child_bs@entry=0x55c9d3d27300,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
errp=errp@entry=0x7ffd11710c60) at block.c:5876
#26 bdrv_set_backing_hd
(bs=bs@entry=0x55c9d3cc2060,
backing_hd=backing_hd@entry=0x55c9d3d27300,
errp=errp@entry=0x7ffd11710c60)
at block.c:2576
#27 stream_prepare (job=0x55c9d495ead0) at block/stream.c:150
...
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200316060631.30052-2-vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
bdrv_do_find_format() calls strcmp() using BlockDriver::format_name
as argument, which must not be NULL. Assert this field is not null
when we register a block driver in bdrv_register().
Reported-by: Mansour Ahmadi <ManSoSec@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200318222235.23856-1-philmd@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Using the new 'bdrv_co_delete_file' interface, a pure co_routine function
'bdrv_co_delete_file' inside block.c can can be used in a way similar of
the existing bdrv_create_file to to clean up a created file.
We're creating a pure co_routine because the only caller of
'bdrv_co_delete_file' will be already in co_routine context, thus there
is no need to add all the machinery to check for qemu_in_coroutine() and
create a separated co_routine to do the job.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200130213907.2830642-3-danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
external_snapshot_prepare() tries to move the overlay to the AioContext
of the backing file (the snapshotted node). However, it's possible that
this doesn't work, but the backing file can instead be moved to the
overlay's AioContext (e.g. opening the backing chain for a mirror
target).
bdrv_append() already indirectly uses bdrv_attach_node(), which takes
care to move nodes to make sure they use the same AioContext and which
tries both directions.
So the problem has a simple fix: Just delete the unnecessary extra
bdrv_try_set_aio_context() call in external_snapshot_prepare() and
instead assert in bdrv_append() that both nodes were indeed moved to the
same AioContext.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200310113831.27293-6-kwolf@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200310113831.27293-2-kwolf@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch allows bdrv_reopen() (and therefore the x-blockdev-reopen QMP
command) to attach a node as the new backing file even if the node is in
a different AioContext than the parent if one of both nodes can be moved
to the AioContext of the other node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Message-Id: <20200306141413.30705-3-kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add another step in the reopen process where driver can execute code
after permission changes are comitted.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Message-Id: <adc02cf591c3cb34e98e33518eb1c540a0f27db1.1582893284.git.pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
@options is leaked by the first two return statements in this function.
Note that blk_new_open() takes the reference to @options even on
failure, so all we need to do to fix the leak is to move the QDict
allocation down to where we actually need it.
Reported-by: Coverity (CID 1419884)
Fixes: fd17146cd9
("block: Generic file creation fallback")
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200225155618.133412-1-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
QLIST_REMOVE() assumes the element is in a list. It also leaves the
element's linked list pointers dangling.
Introduce a safe version of QLIST_REMOVE() and convert open-coded
instances of this pattern.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20200214171712.541358-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If a protocol driver does not support image creation, we can see whether
maybe the file exists already. If so, just truncating it will be
sufficient.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200122164532.178040-3-mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
When a management application manages node names there's no reason to
recurse into backing images in the output of query-named-block-nodes.
Add a parameter to the command which will return just the top level
structs.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Message-Id: <4470f8c779abc404dcf65e375db195cd91a80651.1579509782.git.pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[mreitz: Fixed coding style]
Signed-off-by: Max Reitz <mreitz@redhat.com>
It no longer has any users.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200218103454.296704-11-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Let check_to_replace_node() use the more specialized
bdrv_recurse_can_replace() instead of
bdrv_recurse_is_first_non_filter(), which is too restrictive (or, in the
case of quorum, sometimes not restrictive enough).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200218103454.296704-10-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After a couple of follow-up patches, this function will replace
bdrv_recurse_is_first_non_filter() in check_to_replace_node().
bdrv_recurse_is_first_non_filter() is both not sufficiently specific for
check_to_replace_node() (it allows cases that should not be allowed,
like replacing child nodes of quorum with dissenting data that have more
parents than just quorum), and it is too restrictive (it is perfectly
fine to replace filters).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200218103454.296704-7-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is unused now. (And it was ugly because it needed to explore all BDS
chains from the top.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200218103454.296704-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fixes: 132ada80c4
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200218094402.26625-4-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If we call the qmp 'query-block' while qemu is working on
'block-commit', it will cause memleaks, the memory leak stack is as
follow:
Indirect leak of 12360 byte(s) in 3 object(s) allocated from:
#0 0x7f80f0b6d970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
#1 0x7f80ee86049d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
#2 0x55ea95b5bb67 in qdict_new /mnt/sdb/qemu-4.2.0-rc0/qobject/qdict.c:29
#3 0x55ea956cd043 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6427
#4 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
#5 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
#6 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
#7 0x55ea958818ea in bdrv_block_device_info /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:56
#8 0x55ea958879de in bdrv_query_info /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:392
#9 0x55ea9588b58f in qmp_query_block /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:578
#10 0x55ea95567392 in qmp_marshal_query_block qapi/qapi-commands-block-core.c:95
Indirect leak of 4120 byte(s) in 1 object(s) allocated from:
#0 0x7f80f0b6d970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
#1 0x7f80ee86049d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
#2 0x55ea95b5bb67 in qdict_new /mnt/sdb/qemu-4.2.0-rc0/qobject/qdict.c:29
#3 0x55ea956cd043 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6427
#4 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
#5 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399
#6 0x55ea9569f301 in bdrv_backing_attach /mnt/sdb/qemu-4.2.0-rc0/block.c:1064
#7 0x55ea956a99dd in bdrv_replace_child_noperm /mnt/sdb/qemu-4.2.0-rc0/block.c:2283
#8 0x55ea956b9b53 in bdrv_replace_node /mnt/sdb/qemu-4.2.0-rc0/block.c:4196
#9 0x55ea956b9e49 in bdrv_append /mnt/sdb/qemu-4.2.0-rc0/block.c:4236
#10 0x55ea958c3472 in commit_start /mnt/sdb/qemu-4.2.0-rc0/block/commit.c:306
#11 0x55ea94b68ab0 in qmp_block_commit /mnt/sdb/qemu-4.2.0-rc0/blockdev.c:3459
#12 0x55ea9556a7a7 in qmp_marshal_block_commit qapi/qapi-commands-block-core.c:407
Fixes: bb808d5f5c
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-id: 20200116085600.24056-1-pannengyuan@huawei.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This is a bit more efficient than having to allocate and free memory
for each new permission.
The default size (30) is enough for "consistent read, write, resize".
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20200110171518.22168-1-berto@igalia.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We can save some LoC in xdbg_graph_add_edge() by using
bdrv_qapi_perm_to_blk_perm().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191108123455.39445-3-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We need some way to correlate QAPI BlockPermission values with
BLK_PERM_* flags. We could:
(1) have the same order in the QAPI definition as the the BLK_PERM_*
flags are in LSb-first order. However, then there is no guarantee
that they actually match (e.g. when someone modifies the QAPI schema
without thinking of the BLK_PERM_* definitions).
We could add static assertions, but these would break what’s good
about this solution, namely its simplicity.
(2) define the BLK_PERM_* flags based on the BlockPermission values.
But this way whenever someone were to modify the QAPI order
(perfectly sensible in theory), the BLK_PERM_* values would change.
Because these values are used for file locking, this might break
file locking between different qemu versions.
Therefore, go the slightly more cumbersome way: Add a function to
translate from the QAPI constants to the BLK_PERM_* flags.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191108123455.39445-2-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
bdrv_invalidate_cache_all() assumes that all nodes in a given subtree
are either active or inactive when it starts. Therefore, as soon as it
arrives at an already active node, it stops.
However, this assumption is wrong. For example, it's possible to take a
snapshot of an inactive node, which results in an active overlay over an
inactive backing file. The active overlay is probably also the root node
of an inactive BlockBackend (blk->disable_perm == true).
In this case, bdrv_invalidate_cache_all() does not need to do anything
to activate the overlay node, but it still needs to recurse into the
children and the parents to make sure that after returning success,
really everything is activated.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
If both the create options (qemu-img create -o ...) and the size
parameter were given, the size parameter was silently ignored. Instead,
make specifying two sizes an error.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
bs->options and bs->explicit_options shouldn't contain any options for
child nodes. bdrv_open_inherited() takes care to remove any options that
match a child name after opening the image and the same is done when
reopening.
However, we miss the case of 'backing': null, which is a child option,
but results in no child being created. This means that a 'backing': null
remains in bs->options and bs->explicit_options.
A typical use for 'backing': null is in live snapshots: blockdev-add for
the qcow2 overlay makes sure not to open the backing file (because it is
already opened and blockdev-snapshot will attach it). After doing a
blockdev-snapshot, bs->options and bs->explicit_options become
inconsistent with the actual state (bs has a backing file now, but the
options still say null). On the next occasion that the image is
reopened, e.g. switching it from read-write to read-only when another
snapshot is taken, the option will take effect again and the node
incorrectly loses its backing file.
Fix bdrv_open_inherited() to remove the 'backing' option from
bs->options and bs->explicit_options even for the case where it
specifies that no backing file is wanted.
Reported-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
There are three page size in qemu:
real host page size
host page size
target page size
All of them have dedicate variable to represent. For the last two, we
use the same form in the whole qemu project, while for the first one we
use two forms: qemu_real_host_page_size and getpagesize().
qemu_real_host_page_size is defined to be a replacement of
getpagesize(), so let it serve the role.
[Note] Not fully tested for some arch or device.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The only reason I can imagine for this strange code at the very-end of
bdrv_reopen_commit is the fact that bs->read_only updated after
calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same
time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong
check for being writable, when actually it only need writable file
child not self.
So, as it's fixed, let's move things to correct place.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
It's needed to fix reopening qcow2 with bitmaps to RW. Currently it
can't work, as qcow2 needs write access to file child, to mark bitmaps
in-image with IN_USE flag. But usually children goes after parents in
reopen queue and file child is still RO on qcow2 reopen commit. Reverse
reopen order to fix it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Message-id: 20190927122355.7344-3-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
We'll need reverse-foreach in the following commit, QTAILQ support it,
so move to QTAILQ.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190927122355.7344-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
bdrv_dirty_bitmap_next is always used in same pattern. So, split it
into _next and _first, instead of combining two functions into one and
add FOR_EACH_DIRTY_BITMAP macro.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190916141911.5255-5-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
block/dirty-bitmap.c seems to be more appropriate for it and
bdrv_remove_persistent_dirty_bitmap already in it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920082543.23444-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
A block driver can provide a callback to report driver-specific
statistics.
file-posix driver now reports discard statistics
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Teach bdrv_debug_breakpoint and bdrv_debug_remove_breakpoint skip
filters with backing. This is needed to implement and use in backup job
it's own backup_top filter driver (like mirror already has one), and
without this improvement, breakpoint removal will fail at least in 55
iotest.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-9-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
It's wrong to OR shared permissions. It may lead to crash on further
permission updates.
Also, no needs to consider previously calculated permissions, as at
this point we already bind all new parents and bdrv_get_cumulative_perm
result is enough. So fix the bug by just set permissions by
bdrv_get_cumulative_perm result.
Bug was introduced in long ago 234ac1a902, in 2.9.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190824100740.61635-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
No .bdrv_has_zero_init() implementation returns 1 if growing the file
would add non-zero areas (at least with PREALLOC_MODE_OFF), so using it
in lieu of this new function was always safe.
But on the other hand, it is possible that growing an image that is not
zero-initialized would still add a zero-initialized area, like when
using nonpreallocating truncation on a preallocated image. For callers
that care only about truncation, not about creation with potential
preallocation, this new function is useful.
Alternatively, we could have added a PreallocMode parameter to
bdrv_has_zero_init(). But the only user would have been qemu-img
convert, which does not have a plain PreallocMode value right now -- it
would have to parse the creation option to obtain it. Therefore, the
simpler solution is to let bdrv_has_zero_init() inquire the
preallocation status and add the new bdrv_has_zero_init_truncate() that
presupposes PREALLOC_MODE_OFF.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190724171239.8764-4-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
It is used to do transactional movement of the bitmap (which is
possible in conjunction with merge command). Transactional bitmap
movement is needed in scenarios with external snapshot, when we don't
want to leave copy of the bitmap in the base image.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190708220502.12977-3-jsnow@redhat.com
[Edited "since" version to 4.2 --js]
Signed-off-by: John Snow <jsnow@redhat.com>
- file-posix: Fix O_DIRECT alignment detection
- Fixes for concurrent block jobs
- block-backend: Queue requests while drained (fix IDE vs. job crashes)
- qemu-img convert: Deprecate using -n and -o together
- iotests: Migration tests with filter nodes
- iotests: More media change tests
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdVnduAAoJEH8JsnLIjy/W0IgQAKft/M3aDgt0sbTzQh8vdy6A
yAfTnnSL4Z56+8qAsqhEnplC3rZxvTkg9AGOoNYHOZKl3FgRH9r8g9/Enemh4fWu
MH52hiRf2ytlFVurIQal3aj9O+i0YTnzuvYbysvkH4ID5zbv2QnwdagtEcBxbbYL
NZTMZBynDzp4rKIZ7p6T/kkaklLHh4vZrjW+Mzm3LQx9JJr8TwVNqqetSfc4VKIJ
ByaNbbihDUVjQyIaJ24DXXJdzonGrrtSbSZycturc5FzXymzSRgrXZCeSKCs8X+i
fjwMXH5v4/UfK511ILsXiumeuxBfD2Ck4sAblFxVo06oMPRNmsAKdRLeDByE7IC1
lWep/pB3y/au9CW2/pkWJOiaz5s5iuv2fFYidKUJ0KQ1dD7G8M9rzkQlV3FUmTZO
jBKSxHEffXsYl0ojn0vGmZEd7FAPi3fsZibGGws1dVgxlWI93aUJsjCq0E+lHIRD
hEmQcjqZZa4taKpj0Y3Me05GkL7tH6RYA153jDNb8rPdzriGRCLZSObEISrOJf8H
Mh0gTLi8KJNh6bULd12Ake1tKn7ZeTXpHH+gadz9OU7eIModh1qYTSHPlhy5oAv0
Hm9BikNlS1Hzw+a+EbLcOW7TrsteNeGr7r8T6QKPMq1sfsYcp3svbC2c+zVlQ6Ll
mLoTssksXOkgBevVqSiS
=T7L5
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- file-posix: Fix O_DIRECT alignment detection
- Fixes for concurrent block jobs
- block-backend: Queue requests while drained (fix IDE vs. job crashes)
- qemu-img convert: Deprecate using -n and -o together
- iotests: Migration tests with filter nodes
- iotests: More media change tests
# gpg: Signature made Fri 16 Aug 2019 10:29:18 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
file-posix: Handle undetectable alignment
qemu-img convert: Deprecate using -n and -o together
block-backend: Queue requests while drained
mirror: Keep mirror_top_bs drained after dropping permissions
block: Remove blk_pread_unthrottled()
iotests: Add test for concurrent stream/commit
tests: Test mid-drain bdrv_replace_child_noperm()
tests: Test polling in bdrv_drop_intermediate()
block: Reduce (un)drains when replacing a child
block: Keep subtree drained in drop_intermediate
block: Simplify bdrv_filter_default_perms()
iotests: Test migration with all kinds of filter nodes
iotests: Move migration helpers to iotests.py
iotests/118: Add -blockdev based tests
iotests/118: Create test classes dynamically
iotests/118: Test media change for scsi-cd
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h). It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.
Include qemu/main-loop.h only where it's needed. Touching it now
recompiles only some 1700 objects. For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the
others, they shrink only slightly.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Currently, bdrv_replace_child_noperm() undrains the parent until it is
completely undrained, then re-drains it after attaching the new child
node.
This is a problem with bdrv_drop_intermediate(): We want to keep the
whole subtree drained, including parents, while the operation is
under way. bdrv_replace_child_noperm() breaks this by allowing every
parent to become unquiesced briefly, and then redraining it.
In fact, there is no reason why the parent should become unquiesced and
be allowed to submit requests to the new child node if that new node is
supposed to be kept drained. So if anything, we have to drain the
parent before detaching the old child node. Conversely, we have to
undrain it only after attaching the new child node.
Thus, change the whole drain algorithm here: Calculate the number of
times we have to drain/undrain the parent before replacing the child
node then drain it (if necessary), replace the child node, and then
undrain it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_drop_intermediate() calls BdrvChildRole.update_filename(). That
may poll, thus changing the graph, which potentially breaks the
QLIST_FOREACH_SAFE() loop.
Just keep the whole subtree drained. This is probably the right thing
to do anyway (dropping nodes while the subtree is not drained seems
wrong).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The same change as commit 2b23f28639 ('block/copy-on-read: Fix
permissions for inactive node') made for the copy-on-read driver can be
made for bdrv_filter_default_perms(): Retaining the old permissions from
the BdrvChild if it is given complicates things unnecessarily when in
the end this only means that the options set in the c == NULL case (i.e.
during child creation) are retained.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
bdrv_set_aio_context_ignore() can only work in the main loop:
bdrv_drained_begin() only works in the main loop and the node's (old)
AioContext; and bdrv_drained_end() really only works in the main loop
and the node's (new) AioContext (contrary to its current comment, which
is just wrong).
Consequentially, bdrv_set_aio_context_ignore() must be called from the
main loop. Luckily, assuming that we can make block graph changes only
from the main loop as well, all its callers do that already.
Note that changing a node's context in a sense is an operation that
changes the block graph, so it actually makes sense to require this
function to be called from the main loop.
Also, fix bdrv_drained_end()'s description. You can only use it from
the main loop or the node's AioContext, and in the latter case, the
whole subtree must be in the same context.
Fixes: e037c09c78
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190722133054.21781-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
We should never poll anywhere in bdrv_do_drained_end() (including its
recursive callees like bdrv_drain_invoke()), because it does not cope
well with graph changes. In fact, it has been written based on the
postulation that no graph changes will happen in it.
Instead, the callers that want to poll must poll, i.e. all currently
globally available wrappers: bdrv_drained_end(),
bdrv_subtree_drained_end(), bdrv_unapply_subtree_drain(), and
bdrv_drain_all_end(). Graph changes there do not matter.
They can poll simply by passing a pointer to a drained_end_counter and
wait until it reaches 0.
This patch also adds a non-polling global wrapper for
bdrv_do_drained_end() that takes a drained_end_counter pointer. We need
such a variant because now no function called anywhere from
bdrv_do_drained_end() must poll. This includes
BdrvChildRole.drained_end(), which already must not poll according to
its interface documentation, but bdrv_child_cb_drained_end() just
violates that by invoking bdrv_drained_end() (which does poll).
Therefore, BdrvChildRole.drained_end() must take a *drained_end_counter
parameter, which bdrv_child_cb_drained_end() can pass on to the new
bdrv_drained_end_no_poll() function.
Note that we now have a pattern of all drained_end-related functions
either polling or receiving a *drained_end_counter to let the caller
poll based on that.
A problem with a single poll loop is that when the drained section in
bdrv_set_aio_context_ignore() ends, some nodes in the subgraph may be in
the old contexts, while others are in the new context already. To let
the collective poll in bdrv_drained_end() work correctly, we must not
hold a lock to the old context, so that the old context can make
progress in case it is different from the current context.
(In the process, remove the comment saying that the current context is
always the old context, because it is wrong.)
In all other places, all nodes in a subtree must be in the same context,
so we can just poll that. The exception of course is
bdrv_drain_all_end(), but that always runs in the main context, so we
can just poll NULL (like bdrv_drain_all_begin() does).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 5cb2737e92 laid out why
bdrv_do_drained_end() must decrement the quiesce_counter after
bdrv_drain_invoke(). It did not give a very good reason why it has to
happen after bdrv_parent_drained_end(), instead only claiming symmetry
to bdrv_do_drained_begin().
It turns out that delaying it for so long is wrong.
Situation: We have an active commit job (i.e. a mirror job) from top to
base for the following graph:
filter
|
[file]
|
v
top --[backing]--> base
Now the VM is closed, which results in the job being cancelled and a
bdrv_drain_all() happening pretty much simultaneously.
Beginning the drain means the job is paused once whenever one of its
nodes is quiesced. This is reversed when the drain ends.
With how the code currently is, after base's drain ends (which means
that it will have unpaused the job once), its quiesce_counter remains at
1 while it goes to undrain its parents (bdrv_parent_drained_end()). For
some reason or another, undraining filter causes the job to be kicked
and enter mirror_exit_common(), where it proceeds to invoke
block_job_remove_all_bdrv().
Now base will be detached from the job. Because its quiesce_counter is
still 1, it will unpause the job once more. So in total, undraining
base will unpause the job twice. Eventually, this will lead to the
job's pause_count going negative -- well, it would, were there not an
assertion against this, which crashes qemu.
The general problem is that if in bdrv_parent_drained_end() we undrain
parent A, and then undrain parent B, which then leads to A detaching the
child, bdrv_replace_child_noperm() will undrain A as if we had not done
so yet; that is, one time too many.
It follows that we cannot decrement the quiesce_counter after invoking
bdrv_parent_drained_end().
Unfortunately, decrementing it before bdrv_parent_drained_end() would be
wrong, too. Imagine the above situation in reverse: Undraining A leads
to B detaching the child. If we had already decremented the
quiesce_counter by that point, bdrv_replace_child_noperm() would undrain
B one time too little; because it expects bdrv_parent_drained_end() to
issue this undrain. But bdrv_parent_drained_end() won't do that,
because B is no longer a parent.
Therefore, we have to do something else. This patch opts for
introducing a second quiesce_counter that counts how many times a
child's parent has been quiesced (though c->role->drained_*). With
that, bdrv_replace_child_noperm() just has to undrain the parent exactly
that many times when removing a child, and it will always be right.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BDS.inherits_from does not always point to an immediate parent node.
When launching a block job with a filter node, for example, the node
directly below the filter will not point to the filter, but keep its old
pointee (above the filter).
If that pointee goes away while the job is still running, the node's
inherits_from will not be updated and thus point to garbage. To fix
this, bdrv_unref_child() has to check not only the parent node's
immediate children for nodes whose inherits_from needs to be cleared,
but its whole subtree.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190703172813.6868-7-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
The commit and the mirror block job must be able to drop their filter
node at any point. However, this will not be possible if any of the
BdrvChild links to them is frozen. Therefore, we need to prevent them
from ever becoming frozen.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190703172813.6868-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
We generally assume that loosening permission restrictions can never
fail. We have seen in the past that this assumption is wrong. This has
led to crashes because we generally pass &error_abort when loosening
permissions.
However, a failure in such a case should actually be handled in quite
the opposite way: It is very much not fatal, so qemu may report it, but
still consider the operation successful. The only realistic problem is
that qemu may then retain permissions and thus locks on images it
actually does not require. But again, that is not fatal.
To implement this behavior, we make all functions that change
permissions and that pass &error_abort to the initiating function
(bdrv_check_perm() or bdrv_child_check_perm()) evaluate the
@loosen_restrictions value introduced in the previous patch. If it is
true and an error did occur, we abort the permission update, discard the
error, and instead report success to the caller.
bdrv_child_try_set_perm() itself does not pass &error_abort, but it is
the only public function to change permissions. As such, callers may
pass &error_abort to it, expecting dropping permission restrictions to
never fail.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch makes three functions report whether the necessary permission
change tightens restrictions or not. These functions are:
- bdrv_check_perm()
- bdrv_check_update_perm()
- bdrv_child_check_perm()
Callers can use this result to decide whether a failure is fatal or not
(see the next patch).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have to start by applying the permission restrictions to new_bs
before we can loosen them on old_bs. See the comment for the
explanation.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If a block node uses bdrv_child_try_set_perm() to change the permission
it takes on its child, the result may be very short-lived. If anything
makes the block layer recalculate the permissions internally, it will
invoke the node driver's .bdrv_child_perm() implementation. The
permission/shared permissions masks that returns will then override the
values previously passed to bdrv_child_try_set_perm().
If drivers want a child edge to have specific values for the
permissions/shared permissions mask, it must return them in
.bdrv_child_perm(). Consequentially, there is no need for them to pass
the same values to bdrv_child_try_set_perm() then: It is better to have
a function that invokes .bdrv_child_perm() and calls
bdrv_child_try_set_perm() with the result. This patch adds such a
function under the name of bdrv_child_refresh_perms().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Drop remaining users of bs->job:
1. assertions actually duplicated by assert(!bs->refcnt)
2. trace-point seems not enough reason to change stream_start to return
BlockJob pointer
3. Restricting creation of two jobs based on same bs is bad idea, as
3.1 Some jobs creates filters to be their main node, so, this check
don't actually prevent creating second job on same real node (which
will create another filter node) (but I hope it is restricted by
other mechanisms)
3.2 Even without bs->job we have two systems of permissions:
op-blockers and BLK_PERM
3.3 We may want to run several jobs on one node one day
And finally, drop bs->job pointer itself. Hurrah!
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All callers of bdrv_set_aio_context() are eliminated now, they have
moved to bdrv_try_set_aio_context() and related safe functions. Remove
bdrv_set_aio_context().
With this, we can now know that the .set_aio_ctx callback must be
present in bdrv_set_aio_context_ignore() because
bdrv_can_set_aio_context() would have returned false previously, so
instead of checking the condition, we can assert it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The mirror and commit block jobs use bdrv_set_aio_context() to move
their filter node into the right AioContext before hooking it up in the
graph. Similarly, bdrv_open_backing_file() explicitly moves the backing
file node into the right AioContext first.
This isn't necessary any more, they get automatically moved into the
right context now when attaching them.
However, in the case of bdrv_open_backing_file() with a node reference,
it's actually not only unnecessary, but even wrong: The unchecked
bdrv_set_aio_context() changes the AioContext of the child node even if
other parents require it to retain the old context. So this is not only
a simplification, but a bug fix, too.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1684342
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A node should only be in a non-default AioContext if a user is attached
to it that requires this. When the last parent of a node is gone, it can
move back to the main AioContext.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
So far, we only made sure that updating the AioContext of a node
affected the whole subtree. However, if a node is newly attached to a
new parent, we also need to make sure that both the subtree of the node
and the parent are in the same AioContext. This tries to move the new
child node to the parent AioContext and returns an error if this isn't
possible.
BlockBackends now actually apply their AioContext to their root node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds a new parameter to blk_new() which requires its callers to
declare from which AioContext this BlockBackend is going to be used (or
the locks of which AioContext need to be taken anyway).
The given context is only stored and kept up to date when changing
AioContexts. Actually applying the stored AioContext to the root node
is saved for another commit.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of just asserting that no requests are in flight in
bdrv_replace_node(), which is a requirement that most callers ignore, we
can just drain the source node right there. This fixes at least starting
a commit job while I/O is active on the backing chain, but probably
other callers, too.
Having requests in flight on the target node isn't a problem because the
target just gets new parents, but the call path of running requests
isn't modified. So we can just drop this assertion without a replacement.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1711643
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
A consequence of the previous patch is that bdrv_attach_child()
transfers the reference to child_bs from the caller to parent_bs,
which will drop it on bdrv_close() or when someone calls
bdrv_unref_child().
But this only happens when bdrv_attach_child() succeeds. If it fails
then the caller is responsible for dropping the reference to child_bs.
This patch makes bdrv_attach_child() take the reference also when
there is an error, freeing the caller for having to do it.
A similar situation happens with bdrv_root_attach_child(), so the
changes on this patch affect both functions.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20dfb3d9ccec559cdd1a9690146abad5d204a186.1557754872.git.berto@igalia.com
[mreitz: Removed now superfluous BdrvChild * variable in
bdrv_open_child()]
Signed-off-by: Max Reitz <mreitz@redhat.com>
bdrv_unref_child() does the following things:
- Updates the child->bs->inherits_from pointer.
- Calls bdrv_detach_child() to remove the BdrvChild from bs->children.
- Calls bdrv_unref() to unref the child BlockDriverState.
When bdrv_unref_child() was introduced in commit 33a604075c it was not
used in bdrv_close() because the drivers that had additional children
(like quorum or blkverify) had already called bdrv_unref() on their
children during their own close functions.
This was changed later (in 0bd6e91a7e for quorum, in 3e586be0b2 for
blkverify) so there's no reason not to use bdrv_unref_child() in
bdrv_close() anymore.
After this there's also no need to remove bs->backing and bs->file
separately from the rest of the children, so bdrv_close() can be
simplified.
Now bdrv_close() unrefs all children (before this patch it was only
bs->file and bs->backing). As a result, none of the callers of
brvd_attach_child() should remove their reference to child_bs (because
this function effectively steals that reference). This patch updates a
couple of tests that were doing their own bdrv_unref().
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 6d1d5feaa53aa1ab127adb73d605dc4503e3abd5.1557754872.git.berto@igalia.com
[mreitz: s/where/were/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
This message does not make any sense when it appears as the response to
making an R/W node read-only. We should detect that case and emit a
different message, then.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>