Commit Graph

1298 Commits

Author SHA1 Message Date
Max Reitz cdb1cec89e block: Use bdrv_qapi_perm_to_blk_perm()
We can save some LoC in xdbg_graph_add_edge() by using
bdrv_qapi_perm_to_blk_perm().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191108123455.39445-3-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-06 13:43:06 +01:00
Max Reitz 7b1d9c4df0 block: Add bdrv_qapi_perm_to_blk_perm()
We need some way to correlate QAPI BlockPermission values with
BLK_PERM_* flags.  We could:

(1) have the same order in the QAPI definition as the the BLK_PERM_*
    flags are in LSb-first order.  However, then there is no guarantee
    that they actually match (e.g. when someone modifies the QAPI schema
    without thinking of the BLK_PERM_* definitions).
    We could add static assertions, but these would break what’s good
    about this solution, namely its simplicity.

(2) define the BLK_PERM_* flags based on the BlockPermission values.
    But this way whenever someone were to modify the QAPI order
    (perfectly sensible in theory), the BLK_PERM_* values would change.
    Because these values are used for file locking, this might break
    file locking between different qemu versions.

Therefore, go the slightly more cumbersome way: Add a function to
translate from the QAPI constants to the BLK_PERM_* flags.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191108123455.39445-2-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-06 13:43:06 +01:00
Kevin Wolf 7bb4941ace block: Activate recursively even for already active nodes
bdrv_invalidate_cache_all() assumes that all nodes in a given subtree
are either active or inactive when it starts. Therefore, as soon as it
arrives at an already active node, it stops.

However, this assumption is wrong. For example, it's possible to take a
snapshot of an inactive node, which results in an active overlay over an
inactive backing file. The active overlay is probably also the root node
of an inactive BlockBackend (blk->disable_perm == true).

In this case, bdrv_invalidate_cache_all() does not need to do anything
to activate the overlay node, but it still needs to recurse into the
children and the parents to make sure that after returning success,
really everything is activated.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-12-19 18:04:25 +01:00
Kevin Wolf f6dc1c31d3 block: Error out on image creation with conflicting size options
If both the create options (qemu-img create -o ...) and the size
parameter were given, the size parameter was silently ignored. Instead,
make specifying two sizes an error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-12-18 11:20:57 +01:00
Kevin Wolf ae0f57f0aa block: Remove 'backing': null from bs->{explicit_,}options
bs->options and bs->explicit_options shouldn't contain any options for
child nodes. bdrv_open_inherited() takes care to remove any options that
match a child name after opening the image and the same is done when
reopening.

However, we miss the case of 'backing': null, which is a child option,
but results in no child being created. This means that a 'backing': null
remains in bs->options and bs->explicit_options.

A typical use for 'backing': null is in live snapshots: blockdev-add for
the qcow2 overlay makes sure not to open the backing file (because it is
already opened and blockdev-snapshot will attach it). After doing a
blockdev-snapshot, bs->options and bs->explicit_options become
inconsistent with the actual state (bs has a backing file now, but the
options still say null). On the next occasion that the image is
reopened, e.g. switching it from read-write to read-only when another
snapshot is taken, the option will take effect again and the node
incorrectly loses its backing file.

Fix bdrv_open_inherited() to remove the 'backing' option from
bs->options and bs->explicit_options even for the case where it
specifies that no backing file is wanted.

Reported-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
2019-11-18 17:50:08 +01:00
Wei Yang 038adc2f58 core: replace getpagesize() with qemu_real_host_page_size
There are three page size in qemu:

  real host page size
  host page size
  target page size

All of them have dedicate variable to represent. For the last two, we
use the same form in the whole qemu project, while for the first one we
use two forms: qemu_real_host_page_size and getpagesize().

qemu_real_host_page_size is defined to be a replacement of
getpagesize(), so let it serve the role.

[Note] Not fully tested for some arch or device.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-26 15:38:06 +02:00
Vladimir Sementsov-Ogievskiy 4dd09f6223 qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commit
The only reason I can imagine for this strange code at the very-end of
bdrv_reopen_commit is the fact that bs->read_only updated after
calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same
time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong
check for being writable, when actually it only need writable file
child not self.

So, as it's fixed, let's move things to correct place.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy fcd6a4f42d block: reverse order for reopen commits
It's needed to fix reopening qcow2 with bitmaps to RW. Currently it
can't work, as qcow2 needs write access to file child, to mark bitmaps
in-image with IN_USE flag. But usually children goes after parents in
reopen queue and file child is still RO on qcow2 reopen commit. Reverse
reopen order to fix it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Message-id: 20190927122355.7344-3-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy 859443b0fb block: switch reopen queue from QSIMPLEQ to QTAILQ
We'll need reverse-foreach in the following commit, QTAILQ support it,
so move to QTAILQ.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190927122355.7344-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy ef9041a7b8 block/dirty-bitmap: refactor bdrv_dirty_bitmap_next
bdrv_dirty_bitmap_next is always used in same pattern. So, split it
into _next and _first, instead of combining two functions into one and
add FOR_EACH_DIRTY_BITMAP macro.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190916141911.5255-5-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy 85cc8a4f6b block: move bdrv_can_store_new_dirty_bitmap to block/dirty-bitmap.c
block/dirty-bitmap.c seems to be more appropriate for it and
bdrv_remove_persistent_dirty_bitmap already in it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920082543.23444-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Anton Nefedov d924559953 qapi: query-blockstat: add driver specific file-posix stats
A block driver can provide a callback to report driver-specific
statistics.

file-posix driver now reports discard statistics

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy d10529a2b7 block: teach bdrv_debug_breakpoint skip filters with backing
Teach bdrv_debug_breakpoint and bdrv_debug_remove_breakpoint skip
filters with backing. This is needed to implement and use in backup job
it's own backup_top filter driver (like mirror already has one), and
without this improvement, breakpoint removal will fail at least in 55
iotest.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-9-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy b503de619e block: fix permission update in bdrv_replace_node
It's wrong to OR shared permissions. It may lead to crash on further
permission updates.
Also, no needs to consider previously calculated permissions, as at
this point we already bind all new parents and bdrv_get_cumulative_perm
result is enough. So fix the bug by just set permissions by
bdrv_get_cumulative_perm result.

Bug was introduced in long ago 234ac1a902, in 2.9.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190824100740.61635-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-03 14:55:35 +02:00
Max Reitz ceaca56fee block: Add bdrv_has_zero_init_truncate()
No .bdrv_has_zero_init() implementation returns 1 if growing the file
would add non-zero areas (at least with PREALLOC_MODE_OFF), so using it
in lieu of this new function was always safe.

But on the other hand, it is possible that growing an image that is not
zero-initialized would still add a zero-initialized area, like when
using nonpreallocating truncation on a preallocated image.  For callers
that care only about truncation, not about creation with potential
preallocation, this new function is useful.

Alternatively, we could have added a PreallocMode parameter to
bdrv_has_zero_init().  But the only user would have been qemu-img
convert, which does not have a plain PreallocMode value right now -- it
would have to parse the creation option to obtain it.  Therefore, the
simpler solution is to let bdrv_has_zero_init() inquire the
preallocation status and add the new bdrv_has_zero_init_truncate() that
presupposes PREALLOC_MODE_OFF.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190724171239.8764-4-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
John Snow c4e4b0fa59 qapi: implement block-dirty-bitmap-remove transaction action
It is used to do transactional movement of the bitmap (which is
possible in conjunction with merge command). Transactional bitmap
movement is needed in scenarios with external snapshot, when we don't
want to leave copy of the bitmap in the base image.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190708220502.12977-3-jsnow@redhat.com
[Edited "since" version to 4.2 --js]
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:03 -04:00
Peter Maydell e018ccb3fb Block layer patches:
- file-posix: Fix O_DIRECT alignment detection
 - Fixes for concurrent block jobs
 - block-backend: Queue requests while drained (fix IDE vs. job crashes)
 - qemu-img convert: Deprecate using -n and -o together
 - iotests: Migration tests with filter nodes
 - iotests: More media change tests
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdVnduAAoJEH8JsnLIjy/W0IgQAKft/M3aDgt0sbTzQh8vdy6A
 yAfTnnSL4Z56+8qAsqhEnplC3rZxvTkg9AGOoNYHOZKl3FgRH9r8g9/Enemh4fWu
 MH52hiRf2ytlFVurIQal3aj9O+i0YTnzuvYbysvkH4ID5zbv2QnwdagtEcBxbbYL
 NZTMZBynDzp4rKIZ7p6T/kkaklLHh4vZrjW+Mzm3LQx9JJr8TwVNqqetSfc4VKIJ
 ByaNbbihDUVjQyIaJ24DXXJdzonGrrtSbSZycturc5FzXymzSRgrXZCeSKCs8X+i
 fjwMXH5v4/UfK511ILsXiumeuxBfD2Ck4sAblFxVo06oMPRNmsAKdRLeDByE7IC1
 lWep/pB3y/au9CW2/pkWJOiaz5s5iuv2fFYidKUJ0KQ1dD7G8M9rzkQlV3FUmTZO
 jBKSxHEffXsYl0ojn0vGmZEd7FAPi3fsZibGGws1dVgxlWI93aUJsjCq0E+lHIRD
 hEmQcjqZZa4taKpj0Y3Me05GkL7tH6RYA153jDNb8rPdzriGRCLZSObEISrOJf8H
 Mh0gTLi8KJNh6bULd12Ake1tKn7ZeTXpHH+gadz9OU7eIModh1qYTSHPlhy5oAv0
 Hm9BikNlS1Hzw+a+EbLcOW7TrsteNeGr7r8T6QKPMq1sfsYcp3svbC2c+zVlQ6Ll
 mLoTssksXOkgBevVqSiS
 =T7L5
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- file-posix: Fix O_DIRECT alignment detection
- Fixes for concurrent block jobs
- block-backend: Queue requests while drained (fix IDE vs. job crashes)
- qemu-img convert: Deprecate using -n and -o together
- iotests: Migration tests with filter nodes
- iotests: More media change tests

# gpg: Signature made Fri 16 Aug 2019 10:29:18 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  file-posix: Handle undetectable alignment
  qemu-img convert: Deprecate using -n and -o together
  block-backend: Queue requests while drained
  mirror: Keep mirror_top_bs drained after dropping permissions
  block: Remove blk_pread_unthrottled()
  iotests: Add test for concurrent stream/commit
  tests: Test mid-drain bdrv_replace_child_noperm()
  tests: Test polling in bdrv_drop_intermediate()
  block: Reduce (un)drains when replacing a child
  block: Keep subtree drained in drop_intermediate
  block: Simplify bdrv_filter_default_perms()
  iotests: Test migration with all kinds of filter nodes
  iotests: Move migration helpers to iotests.py
  iotests/118: Add -blockdev based tests
  iotests/118: Create test classes dynamically
  iotests/118: Test media change for scsi-cd

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-16 16:43:46 +01:00
Markus Armbruster db72581598 Include qemu/main-loop.h less
In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).  It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.

Include qemu/main-loop.h only where it's needed.  Touching it now
recompiles only some 1700 objects.  For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800.  For the
others, they shrink only slightly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Max Reitz debc292767 block: Reduce (un)drains when replacing a child
Currently, bdrv_replace_child_noperm() undrains the parent until it is
completely undrained, then re-drains it after attaching the new child
node.

This is a problem with bdrv_drop_intermediate(): We want to keep the
whole subtree drained, including parents, while the operation is
under way.  bdrv_replace_child_noperm() breaks this by allowing every
parent to become unquiesced briefly, and then redraining it.

In fact, there is no reason why the parent should become unquiesced and
be allowed to submit requests to the new child node if that new node is
supposed to be kept drained.  So if anything, we have to drain the
parent before detaching the old child node.  Conversely, we have to
undrain it only after attaching the new child node.

Thus, change the whole drain algorithm here: Calculate the number of
times we have to drain/undrain the parent before replacing the child
node then drain it (if necessary), replace the child node, and then
undrain it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-08-16 10:25:16 +02:00
Max Reitz 637d54a5f3 block: Keep subtree drained in drop_intermediate
bdrv_drop_intermediate() calls BdrvChildRole.update_filename().  That
may poll, thus changing the graph, which potentially breaks the
QLIST_FOREACH_SAFE() loop.

Just keep the whole subtree drained.  This is probably the right thing
to do anyway (dropping nodes while the subtree is not drained seems
wrong).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-08-16 10:25:16 +02:00
Kevin Wolf e444fa8312 block: Simplify bdrv_filter_default_perms()
The same change as commit 2b23f28639 ('block/copy-on-read: Fix
permissions for inactive node') made for the copy-on-read driver can be
made for bdrv_filter_default_perms(): Retaining the old permissions from
the BdrvChild if it is given complicates things unnecessarily when in
the end this only means that the options set in the c == NULL case (i.e.
during child creation) are retained.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-08-16 10:25:16 +02:00
Max Reitz 43eaaaef0e block: Only the main loop can change AioContexts
bdrv_set_aio_context_ignore() can only work in the main loop:
bdrv_drained_begin() only works in the main loop and the node's (old)
AioContext; and bdrv_drained_end() really only works in the main loop
and the node's (new) AioContext (contrary to its current comment, which
is just wrong).

Consequentially, bdrv_set_aio_context_ignore() must be called from the
main loop.  Luckily, assuming that we can make block graph changes only
from the main loop as well, all its callers do that already.

Note that changing a node's context in a sense is an operation that
changes the block graph, so it actually makes sense to require this
function to be called from the main loop.

Also, fix bdrv_drained_end()'s description.  You can only use it from
the main loop or the node's AioContext, and in the latter case, the
whole subtree must be in the same context.

Fixes: e037c09c78
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190722133054.21781-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-22 18:41:43 +02:00
Max Reitz e037c09c78 block: Do not poll in bdrv_do_drained_end()
We should never poll anywhere in bdrv_do_drained_end() (including its
recursive callees like bdrv_drain_invoke()), because it does not cope
well with graph changes.  In fact, it has been written based on the
postulation that no graph changes will happen in it.

Instead, the callers that want to poll must poll, i.e. all currently
globally available wrappers: bdrv_drained_end(),
bdrv_subtree_drained_end(), bdrv_unapply_subtree_drain(), and
bdrv_drain_all_end().  Graph changes there do not matter.

They can poll simply by passing a pointer to a drained_end_counter and
wait until it reaches 0.

This patch also adds a non-polling global wrapper for
bdrv_do_drained_end() that takes a drained_end_counter pointer.  We need
such a variant because now no function called anywhere from
bdrv_do_drained_end() must poll.  This includes
BdrvChildRole.drained_end(), which already must not poll according to
its interface documentation, but bdrv_child_cb_drained_end() just
violates that by invoking bdrv_drained_end() (which does poll).
Therefore, BdrvChildRole.drained_end() must take a *drained_end_counter
parameter, which bdrv_child_cb_drained_end() can pass on to the new
bdrv_drained_end_no_poll() function.

Note that we now have a pattern of all drained_end-related functions
either polling or receiving a *drained_end_counter to let the caller
poll based on that.

A problem with a single poll loop is that when the drained section in
bdrv_set_aio_context_ignore() ends, some nodes in the subgraph may be in
the old contexts, while others are in the new context already.  To let
the collective poll in bdrv_drained_end() work correctly, we must not
hold a lock to the old context, so that the old context can make
progress in case it is different from the current context.

(In the process, remove the comment saying that the current context is
always the old context, because it is wrong.)

In all other places, all nodes in a subtree must be in the same context,
so we can just poll that.  The exception of course is
bdrv_drain_all_end(), but that always runs in the main context, so we
can just poll NULL (like bdrv_drain_all_begin() does).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-19 13:19:16 +02:00
Max Reitz 804db8ea00 block: Introduce BdrvChild.parent_quiesce_counter
Commit 5cb2737e92 laid out why
bdrv_do_drained_end() must decrement the quiesce_counter after
bdrv_drain_invoke().  It did not give a very good reason why it has to
happen after bdrv_parent_drained_end(), instead only claiming symmetry
to bdrv_do_drained_begin().

It turns out that delaying it for so long is wrong.

Situation: We have an active commit job (i.e. a mirror job) from top to
base for the following graph:

                  filter
                    |
                  [file]
                    |
                    v
top --[backing]--> base

Now the VM is closed, which results in the job being cancelled and a
bdrv_drain_all() happening pretty much simultaneously.

Beginning the drain means the job is paused once whenever one of its
nodes is quiesced.  This is reversed when the drain ends.

With how the code currently is, after base's drain ends (which means
that it will have unpaused the job once), its quiesce_counter remains at
1 while it goes to undrain its parents (bdrv_parent_drained_end()).  For
some reason or another, undraining filter causes the job to be kicked
and enter mirror_exit_common(), where it proceeds to invoke
block_job_remove_all_bdrv().

Now base will be detached from the job.  Because its quiesce_counter is
still 1, it will unpause the job once more.  So in total, undraining
base will unpause the job twice.  Eventually, this will lead to the
job's pause_count going negative -- well, it would, were there not an
assertion against this, which crashes qemu.

The general problem is that if in bdrv_parent_drained_end() we undrain
parent A, and then undrain parent B, which then leads to A detaching the
child, bdrv_replace_child_noperm() will undrain A as if we had not done
so yet; that is, one time too many.

It follows that we cannot decrement the quiesce_counter after invoking
bdrv_parent_drained_end().

Unfortunately, decrementing it before bdrv_parent_drained_end() would be
wrong, too.  Imagine the above situation in reverse: Undraining A leads
to B detaching the child.  If we had already decremented the
quiesce_counter by that point, bdrv_replace_child_noperm() would undrain
B one time too little; because it expects bdrv_parent_drained_end() to
issue this undrain.  But bdrv_parent_drained_end() won't do that,
because B is no longer a parent.

Therefore, we have to do something else.  This patch opts for
introducing a second quiesce_counter that counts how many times a
child's parent has been quiesced (though c->role->drained_*).  With
that, bdrv_replace_child_noperm() just has to undrain the parent exactly
that many times when removing a child, and it will always be right.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-19 13:19:16 +02:00
Max Reitz 3cf746b3f1 block: Deep-clear inherits_from
BDS.inherits_from does not always point to an immediate parent node.
When launching a block job with a filter node, for example, the node
directly below the filter will not point to the filter, but keep its old
pointee (above the filter).

If that pointee goes away while the job is still running, the node's
inherits_from will not be updated and thus point to garbage.  To fix
this, bdrv_unref_child() has to check not only the parent node's
immediate children for nodes whose inherits_from needs to be cleared,
but its whole subtree.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190703172813.6868-7-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-15 15:48:40 +02:00
Max Reitz e5182c1c57 block: Add BDS.never_freeze
The commit and the mirror block job must be able to drop their filter
node at any point.  However, this will not be possible if any of the
BdrvChild links to them is frozen.  Therefore, we need to prevent them
from ever becoming frozen.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190703172813.6868-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-15 15:48:40 +02:00
Max Reitz 1046779e64 block: Ignore loosening perm restrictions failures
We generally assume that loosening permission restrictions can never
fail.  We have seen in the past that this assumption is wrong.  This has
led to crashes because we generally pass &error_abort when loosening
permissions.

However, a failure in such a case should actually be handled in quite
the opposite way: It is very much not fatal, so qemu may report it, but
still consider the operation successful.  The only realistic problem is
that qemu may then retain permissions and thus locks on images it
actually does not require.  But again, that is not fatal.

To implement this behavior, we make all functions that change
permissions and that pass &error_abort to the initiating function
(bdrv_check_perm() or bdrv_child_check_perm()) evaluate the
@loosen_restrictions value introduced in the previous patch.  If it is
true and an error did occur, we abort the permission update, discard the
error, and instead report success to the caller.

bdrv_child_try_set_perm() itself does not pass &error_abort, but it is
the only public function to change permissions.  As such, callers may
pass &error_abort to it, expecting dropping permission restrictions to
never fail.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Max Reitz 9eab154415 block: Add *tighten_restrictions to *check*_perm()
This patch makes three functions report whether the necessary permission
change tightens restrictions or not.  These functions are:
- bdrv_check_perm()
- bdrv_check_update_perm()
- bdrv_child_check_perm()

Callers can use this result to decide whether a failure is fatal or not
(see the next patch).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Max Reitz 87ace5f8b6 block: Fix order in bdrv_replace_child()
We have to start by applying the permission restrictions to new_bs
before we can loosen them on old_bs.  See the comment for the
explanation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Max Reitz c1087f1206 block: Add bdrv_child_refresh_perms()
If a block node uses bdrv_child_try_set_perm() to change the permission
it takes on its child, the result may be very short-lived.  If anything
makes the block layer recalculate the permissions internally, it will
invoke the node driver's .bdrv_child_perm() implementation.  The
permission/shared permissions masks that returns will then override the
values previously passed to bdrv_child_try_set_perm().

If drivers want a child edge to have specific values for the
permissions/shared permissions mask, it must return them in
.bdrv_child_perm().  Consequentially, there is no need for them to pass
the same values to bdrv_child_try_set_perm() then: It is better to have
a function that invokes .bdrv_child_perm() and calls
bdrv_child_try_set_perm() with the result.  This patch adds such a
function under the name of bdrv_child_refresh_perms().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy b23c580c94 block: drop bs->job
Drop remaining users of bs->job:
1. assertions actually duplicated by assert(!bs->refcnt)
2. trace-point seems not enough reason to change stream_start to return
   BlockJob pointer
3. Restricting creation of two jobs based on same bs is bad idea, as
   3.1 Some jobs creates filters to be their main node, so, this check
   don't actually prevent creating second job on same real node (which
   will create another filter node) (but I hope it is restricted by
   other mechanisms)
   3.2 Even without bs->job we have two systems of permissions:
   op-blockers and BLK_PERM
   3.3 We may want to run several jobs on one node one day

And finally, drop bs->job pointer itself. Hurrah!

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Kevin Wolf 42a65f02f9 block: Remove bdrv_set_aio_context()
All callers of bdrv_set_aio_context() are eliminated now, they have
moved to bdrv_try_set_aio_context() and related safe functions. Remove
bdrv_set_aio_context().

With this, we can now know that the .set_aio_ctx callback must be
present in bdrv_set_aio_context_ignore() because
bdrv_can_set_aio_context() would have returned false previously, so
instead of checking the condition, we can assert it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 16:55:58 +02:00
Kevin Wolf d0ee0204f4 block: Remove wrong bdrv_set_aio_context() calls
The mirror and commit block jobs use bdrv_set_aio_context() to move
their filter node into the right AioContext before hooking it up in the
graph. Similarly, bdrv_open_backing_file() explicitly moves the backing
file node into the right AioContext first.

This isn't necessary any more, they get automatically moved into the
right context now when attaching them.

However, in the case of bdrv_open_backing_file() with a node reference,
it's actually not only unnecessary, but even wrong: The unchecked
bdrv_set_aio_context() changes the AioContext of the child node even if
other parents require it to retain the old context. So this is not only
a simplification, but a bug fix, too.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1684342
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Kevin Wolf ad943dcb22 block: Move node without parents to main AioContext
A node should only be in a non-default AioContext if a user is attached
to it that requires this. When the last parent of a node is gone, it can
move back to the main AioContext.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Kevin Wolf 132ada80c4 block: Adjust AioContexts when attaching nodes
So far, we only made sure that updating the AioContext of a node
affected the whole subtree. However, if a node is newly attached to a
new parent, we also need to make sure that both the subtree of the node
and the parent are in the same AioContext. This tries to move the new
child node to the parent AioContext and returns an error if this isn't
possible.

BlockBackends now actually apply their AioContext to their root node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Kevin Wolf d861ab3acf block: Add BlockBackend.ctx
This adds a new parameter to blk_new() which requires its callers to
declare from which AioContext this BlockBackend is going to be used (or
the locks of which AioContext need to be taken anyway).

The given context is only stored and kept up to date when changing
AioContexts. Actually applying the stored AioContext to the root node
is saved for another commit.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Kevin Wolf f871abd60f block: Drain source node in bdrv_replace_node()
Instead of just asserting that no requests are in flight in
bdrv_replace_node(), which is a requirement that most callers ignore, we
can just drain the source node right there. This fixes at least starting
a commit job while I/O is active on the backing chain, but probably
other callers, too.

Having requests in flight on the target node isn't a problem because the
target just gets new parents, but the call path of running requests
isn't modified. So we can just drop this assertion without a replacement.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1711643
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-06-04 15:20:41 +02:00
Alberto Garcia b441dc71c0 block: Make bdrv_root_attach_child() unref child_bs on failure
A consequence of the previous patch is that bdrv_attach_child()
transfers the reference to child_bs from the caller to parent_bs,
which will drop it on bdrv_close() or when someone calls
bdrv_unref_child().

But this only happens when bdrv_attach_child() succeeds. If it fails
then the caller is responsible for dropping the reference to child_bs.

This patch makes bdrv_attach_child() take the reference also when
there is an error, freeing the caller for having to do it.

A similar situation happens with bdrv_root_attach_child(), so the
changes on this patch affect both functions.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20dfb3d9ccec559cdd1a9690146abad5d204a186.1557754872.git.berto@igalia.com
[mreitz: Removed now superfluous BdrvChild * variable in
         bdrv_open_child()]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Alberto Garcia dd4118c792 block: Use bdrv_unref_child() for all children in bdrv_close()
bdrv_unref_child() does the following things:

  - Updates the child->bs->inherits_from pointer.
  - Calls bdrv_detach_child() to remove the BdrvChild from bs->children.
  - Calls bdrv_unref() to unref the child BlockDriverState.

When bdrv_unref_child() was introduced in commit 33a604075c it was not
used in bdrv_close() because the drivers that had additional children
(like quorum or blkverify) had already called bdrv_unref() on their
children during their own close functions.

This was changed later (in 0bd6e91a7e for quorum, in 3e586be0b2 for
blkverify) so there's no reason not to use bdrv_unref_child() in
bdrv_close() anymore.

After this there's also no need to remove bs->backing and bs->file
separately from the rest of the children, so bdrv_close() can be
simplified.

Now bdrv_close() unrefs all children (before this patch it was only
bs->file and bs->backing). As a result, none of the callers of
brvd_attach_child() should remove their reference to child_bs (because
this function effectively steals that reference). This patch updates a
couple of tests that were doing their own bdrv_unref().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 6d1d5feaa53aa1ab127adb73d605dc4503e3abd5.1557754872.git.berto@igalia.com
[mreitz: s/where/were/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Max Reitz 481e0eeef4 block: Improve "Block node is read-only" message
This message does not make any sense when it appears as the response to
making an R/W node read-only.  We should detect that case and emit a
different message, then.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:57 +02:00
Kevin Wolf 53a7d04185 block: Propagate AioContext change to parents
All block nodes and users in any connected component of the block graph
must be in the same AioContext, so changing the AioContext of one node
must not only change all of its children, but all of its parents (and
in turn their children etc.) as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Kevin Wolf 0d83708a1d block: Move recursion to bdrv_set_aio_context()
Instead of having two recursions, in bdrv_attach_aio_context() and in
bdrv_detach_aio_context(), just having one recursion is enough. Said
functions are only about a single node now.

It is important that the recursion doesn't happen between detaching and
attaching a context to the current node because the nested call will
drain the node, and draining with a NULL context would segfault.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Kevin Wolf a3a683c33d block: Make bdrv_attach/detach_aio_context() static
Since commit b97511c7bc, there is no reason for block drivers any more
to call these functions (see the function comment in block_int.h). They
are now just internal helper functions for bdrv_set_aio_context()
and can be made static.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Kevin Wolf 5d2318499f block: Add bdrv_try_set_aio_context()
Eventually, we want to make sure that all parents and all children of a
node are in the same AioContext as the node itself. This means that
changing the AioContext may fail because one of the other involved
parties (e.g. a guest device that was configured with an iothread)
cannot allow switching to a different AioContext.

Introduce a set of functions that allow to first check whether all
involved nodes can switch to a new context and only then do the actual
switch. The check recursively covers children and parents.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Anton Kuchin 30c321f97f block: remove bs from lists before closing
Close involves flush that can be performed asynchronously and bs
must be protected from being referenced before it is deleted.

Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-10 16:45:40 +02:00
Nikita Alekseev 66a5bdf309 block: Add coroutine_fn to bdrv_check_co_entry
bdrv_check_co_entry calls bdrv_co_check, which is a coroutine function.
Thus, it also needs to be marked as a coroutine.

Signed-off-by: Nikita Alekseev <n.alekseev2104@gmail.com>
Message-id: 20190401093051.16488-1-n.alekseev2104@gmail.com
Message-Id: <20190401093051.16488-1-n.alekseev2104@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-05-10 10:53:21 +01:00
Alberto Garcia 0b3ca76e52 block: Assert that drv->bdrv_child_perm is set in bdrv_child_perm()
There is no need to check for this because all block drivers that have
children implement bdrv_child_perm and all callers already ensure that
bs->drv is set.

Furthermore, if this check would fail then the callers would end up
with uninitialized values for nperm and nshared.

This patch replaces the check with an assertion.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190404112953.4058-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Kevin Wolf 1bffe1ae7a block: Fix AioContext switch for bs->drv == NULL
Even for block nodes with bs->drv == NULL, we can't just ignore a
bdrv_set_aio_context() call. Leaving the node in its old context can
mean that it's still in an iothread context in bdrv_close_all() during
shutdown, resulting in an attempted unlock of the AioContext lock which
we don't hold.

This is an example stack trace of a related crash:

 #0  0x00007ffff59da57f in raise () at /lib64/libc.so.6
 #1  0x00007ffff59c4895 in abort () at /lib64/libc.so.6
 #2  0x0000555555b97b1e in error_exit (err=<optimized out>, msg=msg@entry=0x555555d386d0 <__func__.19059> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
 #3  0x0000555555b97f7f in qemu_mutex_unlock_impl (mutex=mutex@entry=0x5555568002f0, file=file@entry=0x555555d378df "util/async.c", line=line@entry=507) at util/qemu-thread-posix.c:97
 #4  0x0000555555b92f55 in aio_context_release (ctx=ctx@entry=0x555556800290) at util/async.c:507
 #5  0x0000555555b05cf8 in bdrv_prwv_co (child=child@entry=0x7fffc80012f0, offset=offset@entry=131072, qiov=qiov@entry=0x7fffffffd4f0, is_write=is_write@entry=true, flags=flags@entry=0)
         at block/io.c:833
 #6  0x0000555555b060a9 in bdrv_pwritev (qiov=0x7fffffffd4f0, offset=131072, child=0x7fffc80012f0) at block/io.c:990
 #7  0x0000555555b060a9 in bdrv_pwrite (child=0x7fffc80012f0, offset=131072, buf=<optimized out>, bytes=<optimized out>) at block/io.c:990
 #8  0x0000555555ae172b in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568cc740, i=i@entry=0) at block/qcow2-cache.c:51
 #9  0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568cc740) at block/qcow2-cache.c:248
 #10 0x0000555555ae15de in qcow2_cache_flush (bs=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
 #11 0x0000555555ae16b1 in qcow2_cache_flush_dependency (c=0x5555568a1700, c=0x5555568a1700, bs=0x555556810680) at block/qcow2-cache.c:194
 #12 0x0000555555ae16b1 in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568a1700, i=i@entry=0) at block/qcow2-cache.c:194
 #13 0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568a1700) at block/qcow2-cache.c:248
 #14 0x0000555555ae15de in qcow2_cache_flush (bs=bs@entry=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
 #15 0x0000555555ad242c in qcow2_inactivate (bs=bs@entry=0x555556810680) at block/qcow2.c:2124
 #16 0x0000555555ad2590 in qcow2_close (bs=0x555556810680) at block/qcow2.c:2153
 #17 0x0000555555ab0c62 in bdrv_close (bs=0x555556810680) at block.c:3358
 #18 0x0000555555ab0c62 in bdrv_delete (bs=0x555556810680) at block.c:3542
 #19 0x0000555555ab0c62 in bdrv_unref (bs=0x555556810680) at block.c:4598
 #20 0x0000555555af4d72 in blk_remove_bs (blk=blk@entry=0x5555568103d0) at block/block-backend.c:785
 #21 0x0000555555af4dbb in blk_remove_all_bs () at block/block-backend.c:483
 #22 0x0000555555aae02f in bdrv_close_all () at block.c:3412
 #23 0x00005555557f9796 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4776

The reproducer I used is a qcow2 image on gluster volume, where the
virtual disk size (4 GB) is larger than the gluster volume size (64M),
so we can easily trigger an ENOSPC. This backend is assigned to a
virtio-blk device using an iothread, and then from the guest a
'dd if=/dev/zero of=/dev/vda bs=1G count=1' causes the VM to stop
because of an I/O error. qemu_gluster_co_flush_to_disk() sets
bs->drv = NULL on error, so when virtio-blk stops the dataplane, the
block nodes stay in the iothread AioContext. A 'quit' monitor command
issued from this paused state crashes the process.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1631227
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2019-04-30 15:29:00 +02:00
Kevin Wolf 3f48686fac block: Forward 'discard' to temporary overlay
When bdrv_temp_snapshot_options() is called for snapshot=on, the
'discard' option in the options QDict hasn't been parsed and merged into
the flags yet. So copy the dict entry to make sure that the temporary
overlay enables discard when it was requested for the drive.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2019-04-08 16:48:46 +02:00
Alberto Garcia 0f0998f621 block: continue until base is found in bdrv_freeze_backing_chain() et al
All three functions that handle the BdrvChild.frozen attribute walk
the backing chain from 'bs' to 'base' and stop either when 'base' is
found or at the end of the chain if 'base' is NULL.

However if 'base' is not found then the functions return without
errors as if it was NULL.

This is wrong: if the caller passed an incorrect parameter that means
that there is a bug in the code.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-02 12:04:44 +02:00