Commit Graph

385 Commits

Author SHA1 Message Date
Vladimir Sementsov-Ogievskiy
8ac5aab255 block: bdrv_mark_request_serialising: split non-waiting function
We'll need a separate function, which will only "mark" request
serialising with specified align but not wait for conflicting
requests. So, it will be like old bdrv_mark_request_serialising(),
before merging bdrv_wait_serialising_requests_locked() into it.

To reduce the possible mess, let's do the following:

Public function that does both marking and waiting will be called
bdrv_make_request_serialising, and private function which will only
"mark" will be called tracked_request_set_serialising().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201021145859.11201-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
8b1170012b block: introduce BDRV_MAX_LENGTH
We are going to modify block layer to work with 64bit requests. And
first step is moving to int64_t type for both offset and bytes
arguments in all block request related functions.

It's mostly safe (when widening signed or unsigned int to int64_t), but
switching from uint64_t is questionable.

So, let's first establish the set of requests we want to work with.
First signed int64_t should be enough, as off_t is signed anyway. Then,
obviously offset + bytes should not overflow.

And most interesting: (offset + bytes) being aligned up should not
overflow as well. Aligned to what alignment? First thing that comes in
mind is bs->bl.request_alignment, as we align up request to this
alignment. But there is another thing: look at
bdrv_mark_request_serialising(). It aligns request up to some given
alignment. And this parameter may be bdrv_get_cluster_size(), which is
often a lot greater than bs->bl.request_alignment.
Note also, that bdrv_mark_request_serialising() uses signed int64_t for
calculations. So, actually, we already depend on some restrictions.

Happily, bdrv_get_cluster_size() returns int and
bs->bl.request_alignment has 32bit unsigned type, but defined to be a
power of 2 less than INT_MAX. So, we may establish, that INT_MAX is
absolute maximum for any kind of alignment that may occur with the
request.

Note, that bdrv_get_cluster_size() is not documented to return power
of 2, still bdrv_mark_request_serialising() behaves like it is.
Also, backup uses bdi.cluster_size and is not prepared to it not being
power of 2.
So, let's establish that Qemu supports only power-of-2 clusters and
alignments.

So, alignment can't be greater than 2^30.

Finally to be safe with calculations, to not calculate different
maximums for different nodes (depending on cluster size and
request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30)
as absolute maximum bytes length for Qemu. Actually, it's not much less
than INT64_MAX.

OK, then, let's apply it to block/io.

Let's consider all block/io entry points of offset/bytes:

4 bytes/offset interface functions: bdrv_co_preadv_part(),
bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and
bdrv_co_pdiscard() and we check them all with bdrv_check_request().

We also have one entry point with only offset: bdrv_co_truncate().
Check the offset.

And one public structure: BdrvTrackedRequest. Happily, it has only
three external users:

 file-posix.c: adopted by this patch
 write-threshold.c: only read fields
 test-write-threshold.c: sets obviously small constant values

Better is to make the structure private and add corresponding
interfaces.. Still it's not obvious what kind of interface is needed
for file-posix.c. Let's keep it public but add corresponding
assertions.

After this patch we'll convert functions in block/io.c to int64_t bytes
and offset parameters. We can assume that offset/bytes pair always
satisfy new restrictions, and make
corresponding assertions where needed. If we reach some offset/bytes
point in block/io.c missing bdrv_check_request() it is considered a
bug. As well, if block/io.c modifies a offset/bytes request, expanding
it more then aligning up to request_alignment, it's a bug too.

For all io requests except for discard we keep for now old restriction
of 32bit request length.

iotest 206 output error message changed, as now test disk size is
larger than new limit. Add one more test case with new maximum disk
size to cover too-big-L1 case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Greg Kurz
009cde17a5 block: Move bdrv_drain_all_end_quiesce() to block_int.h
This function is really an internal helper for bdrv_close(). Update its
doc comment to make this clear and make the function private.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160387245480.131299.13430357162209598411.stgit@bahia>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 15:44:21 +01:00
Max Reitz
0b877d09df block: Leave BDS.backing_{file,format} constant
Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay), other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Considering bs->backing->bs->filename exists, let us make it mean the
former.

Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming, everything uses relative filenames).

Before this patch:

$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'

After this patch:

$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.

With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file.  If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.

Note that this changes the QAPI output for a node's backing_file.  We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image).  This
inconsistent output was effectively useless, so we have to decide one
way or the other.  Considering that bs->backing_file usually at runtime
contained the path to the image relative to qemu's CWD (or absolute),
this patch changes QAPI's backing_file to always report the
bs->backing->bs->filename from now on.  If you want to receive the image
header information, you have to refer to full-backing-filename.

This necessitates a change to iotest 228.  The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file.  Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.

Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well.  This way,
ImageInfo's backing-filename and backing-filename-format fields will
represent what the image header says and nothing else.

iotest 245 changes in behavior: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file even if it used to have a backing node at some
point.

273 also changes: The base image is opened without a format layer, so
ImageInfo.backing-filename-format used to report "file" for the base
image's overlay after blockdev-snapshot.  However, the image header
never says "file" anywhere, so it now reports $IMGFMT.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
549ec0d978 block: Inline bdrv_co_block_status_from_*()
With bdrv_filter_bs(), we can easily handle this default filter behavior
in bdrv_co_block_status().

blkdebug wants to have an additional assertion, so it keeps its own
implementation, except bdrv_co_block_status_from_file() needs to be
inlined there.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
f1a7f18f07 block: Drop backing_bs()
We want to make it explicit where bs->backing is used, and we have done
so.  The old role of backing_bs() is now effectively taken by
bdrv_cow_bs().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
d38d7eb8a5 block: Add chain helper functions
Add some helper functions for skipping filters in a chain of block
nodes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:30 +02:00
Max Reitz
9a6fc88799 block: Add child access functions
There are BDS children that the general block layer code can access,
namely bs->file and bs->backing.  Since the introduction of filters and
external data files, their meaning is not quite clear.  bs->backing can
be a COW source, or it can be a filtered child; bs->file can be a
filtered child, it can be data and metadata storage, or it can be just
metadata storage.

This overloading really is not helpful.  This patch adds functions that
retrieve the correct child for each exact purpose.  Later patches in
this series will make use of them.  Doing so will allow us to handle
filter nodes in a meaningful way.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-09-07 12:31:30 +02:00
Alberto Garcia
7be2025258 qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <6476caaa73216bd05b7bb2d504a20415e1665176.1594396418.git.berto@igalia.com>
[mreitz: %s/5\.1/5.2/; fixed 302's and 303's reference output]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 09:19:55 +02:00
Kevin Wolf
ffa244c84a file-posix: Mitigate file fragmentation with extent size hints
Especially when O_DIRECT is used with image files so that the page cache
indirection can't cause a merge of allocating requests, the file will
fragment on the file system layer, with a potentially very small
fragment size (this depends on the requests the guest sent).

On Linux, fragmentation can be reduced by setting an extent size hint
when creating the file (at least on XFS, it can't be set any more after
the first extent has been allocated), basically giving raw files a
"cluster size" for allocation.

This adds a create option to set the extent size hint, and changes the
default from not setting a hint to setting it to 1 MB. The main reason
why qcow2 defaults to smaller cluster sizes is that COW becomes more
expensive, which is not an issue with raw files, so we can choose a
larger size. The tradeoff here is only potentially wasted disk space.

For qcow2 (or other image formats) over file-posix, the advantage should
even be greater because they grow sequentially without leaving holes, so
there won't be wasted space. Setting even larger extent size hints for
such images may make sense. This can be done with the new option, but
let's keep the default conservative for now.

The effect is very visible with a test that intentionally creates a
badly fragmented file with qemu-img bench (the time difference while
creating the file is already remarkable) and then looks at the number of
extents and the time a simple "qemu-img map" takes.

Without an extent size hint:

    $ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G
    Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=0
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
    Run completed in 25.848 seconds.
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
    Run completed in 19.616 seconds.
    $ filefrag ~/tmp/test.raw
    /home/kwolf/tmp/test.raw: 2000000 extents found
    $ time ./qemu-img map ~/tmp/test.raw
    Offset          Length          Mapped to       File
    0               0x1e8480000     0               /home/kwolf/tmp/test.raw

    real    0m1,279s
    user    0m0,043s
    sys     0m1,226s

With the new default extent size hint of 1 MB:

    $ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G
    Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
    Run completed in 11.833 seconds.
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
    Run completed in 10.155 seconds.
    $ filefrag ~/tmp/test.raw
    /home/kwolf/tmp/test.raw: 178 extents found
    $ time ./qemu-img map ~/tmp/test.raw
    Offset          Length          Mapped to       File
    0               0x1e8480000     0               /home/kwolf/tmp/test.raw

    real    0m0,061s
    user    0m0,040s
    sys     0m0,014s

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200707142329.48303-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-14 15:18:59 +02:00
Vladimir Sementsov-Ogievskiy
a2adbbf603 block: drop unallocated_blocks_are_zero
Currently this field only set by qed and qcow2. But in fact, all
backing-supporting formats (parallels, qcow, qcow2, qed, vmdk) share
these semantics: on unallocated blocks, if there is no backing file they
just memset the buffer with zeroes.

So, document this behavior for .supports_backing and drop
.unallocated_blocks_are_zero

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-10-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 10:34:14 +02:00
Maxim Levitsky
ced914d0ab block/core: add generic infrastructure for x-blockdev-amend qmp command
blockdev-amend will be used similiar to blockdev-create
to allow on the fly changes of the structure of the format based block devices.

Current plan is to first support encryption keyslot management for luks
based formats (raw and embedded in qcow2)

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200608094030.670121-12-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky
df373fb0a3 block/amend: separate amend and create options for qemu-img
Some options are only useful for creation
(or hard to be amended, like cluster size for qcow2), while some other
options are only useful for amend, like upcoming keyslot management
options for luks

Since currently only qcow2 supports amend, move all its options
to a common macro and then include it in each action option list.

In future it might be useful to remove some options which are
not supported anyway from amend list, which currently
cause an error message if amended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-5-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky
a3579bfa0a block/amend: add 'force' option
'force' option will be used for some unsafe amend operations.

This includes things like erasing last keyslot in luks based formats
(which destroys the data, unless the master key is backed up
by external means), but that _might_ be desired result.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-4-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Eric Blake
c6996cf9a6 blockdev: Promote several bitmap functions to non-static
The next patch will split blockdev.c, which will require accessing
some previously-static functions from more than one .c file.  But part
of promoting a function to public is picking a naming scheme that does
not reek of exposing too many internals (two of the three functions
were named starting with 'do_').  To make future code motion easier,
perform the function rename and non-static promotion into its own
patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513011648.166876-5-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-05-19 10:32:14 -05:00
Eric Blake
ef893b5c84 block: Make it easier to learn which BDS support bitmaps
Upcoming patches will enhance bitmap support in qemu-img, but in doing
so, it turns out to be nice to suppress output when persistent bitmaps
make no sense (such as on a qcow2 v2 image).  Add a hook to make this
easier to query.

This patch adds a new callback .bdrv_supports_persistent_dirty_bitmap,
rather than trying to shoehorn the answer in via existing callbacks.
In particular, while it might have been possible to overload
.bdrv_co_can_store_new_dirty_bitmap to special-case a NULL input to
answer whether any persistent bitmaps are supported, that is at odds
with whether a particular bitmap can be stored (for example, even on
an image that supports persistent bitmaps but has currently filled up
the maximum number of bitmaps, attempts to store another one should
fail); and the new functionality doesn't require coroutine safety.
Similarly, we could have added one more piece of information to
.bdrv_get_info, but then again, most callers to that function tend to
already discard extraneous information, and making it a catch-all
rather than a series of dedicated scalar queries hasn't really
simplified life.

In the future, when we improve the ability to look up bitmaps through
a filter, we will probably also want to teach the block layer to
automatically let filters pass this request on through.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513011648.166876-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-05-19 10:32:14 -05:00
Max Reitz
e5d8a40685 block: Drop @child_class from bdrv_child_perm()
Implementations should decide the necessary permissions based on @role.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-35-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
f6de853fa3 block: Drop child_file
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-33-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
9aab945e9c block: Drop bdrv_format_default_perms()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-32-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
87278af1d9 block: Make bdrv_filter_default_perms() static
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-31-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
ff3541c4e2 block: Drop child_backing
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-25-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
f34ade1148 block: Drop child_format
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-23-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
2519f54919 block: Add bdrv_default_perms()
This callback can be used by BDSs that use child_of_bds with the
appropriate BdrvChildRole for their children.

Also, make bdrv_format_default_perms() use it for child_of_bds children
(just a temporary solution until we can drop bdrv_format_default_perms()
altogether).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-20-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
4348355032 block: Add child_of_bds
Any current user of child_file, child_format, and child_backing can and
should use this generic BdrvChildClass instead, as it can handle all of
these cases.  However, to be able to do so, the users must pass the
appropriate BdrvChildRole when the child is created/attached.  (The
following commits will take care of that.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-15-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
3cdc69d31b block: Pass parent_is_format to .inherit_options()
We plan to unify the generic .inherit_options() functions.  The
resulting common function will need to decide whether to force-enable
format probing, force-disable it, or leave it as-is.  To make this
decision, it will need to know whether the parent node is a format node
or not (because we never want format probing if the parent is a format
node already (except for the backing chain)).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-9-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
272c02eaef block: Pass BdrvChildRole to .inherit_options()
For now, all callers (effectively) pass 0 and no callee evaluates thie
value.  Later patches will change both.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-8-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
bf8e925eb5 block: Pass BdrvChildRole to bdrv_child_perm()
For now, all callers pass 0 and no callee evaluates this value.  Later
patches will change both.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-7-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
258b776515 block: Add BdrvChildRole to BdrvChild
For now, it is always set to 0.  Later patches in this series will
ensure that all callers pass an appropriate combination of flags.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-6-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
bd86fb990c block: Rename BdrvChildRole to BdrvChildClass
This structure nearly only contains parent callbacks for child state
changes.  It cannot really reflect a child's role, because different
roles may overlap (as we will see when real roles are introduced), and
because parents can have custom callbacks even when the child fulfills a
standard role.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200513110544.176672-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz
d67066d8bc block: Add BlockDriver.is_format
We want to unify child_format and child_file at some point.  One of the
important things that set format drivers apart from other drivers is
that they do not expect other format nodes under them (except in the
backing chain), i.e. we must not probe formats inside of formats.  That
means we need something on which to distinguish format drivers from
others, and hence this flag.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200513110544.176672-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Denis Plotnikov
572ad9783f qcow2: introduce compression type feature
The patch adds some preparation parts for incompatible compression type
feature to qcow2 allowing the use different compression methods for
image clusters (de)compressing.

It is implied that the compression type is set on the image creation and
can be changed only later by image conversion, thus compression type
defines the only compression algorithm used for the image, and thus,
for all image clusters.

The goal of the feature is to add support of other compression methods
to qcow2. For example, ZSTD which is more effective on compression than ZLIB.

The default compression is ZLIB. Images created with ZLIB compression type
are backward compatible with older qemu versions.

Adding of the compression type breaks a number of tests because now the
compression type is reported on image creation and there are some changes
in the qcow2 header in size and offsets.

The tests are fixed in the following ways:
    * filter out compression_type for many tests
    * fix header size, feature table size and backing file offset
      affected tests: 031, 036, 061, 080
      header_size +=8: 1 byte compression type
                       7 bytes padding
      feature_table += 48: incompatible feature compression type
      backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change)
    * add "compression type" for test output matching when it isn't filtered
      affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206,
                      242, 255, 274, 280

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
QAPI part:
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200507082521.29210-2-dplotnikov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-13 14:20:31 +02:00
Eric Blake
47e0b38a13 block: Drop unused .bdrv_has_zero_init_truncate
Now that there are no clients of bdrv_has_zero_init_truncate, none of
the drivers need to worry about providing it.

What's more, this eliminates a source of some confusion: a literal
reading of the documentation as written in ceaca56f and implemented in
commit 1dcaf527 claims that a driver which returns 0 for
bdrv_has_zero_init_truncate() must not return 1 for
bdrv_has_zero_init(); this condition was violated for parallels, qcow,
and sometimes for vdi, although in practice it did not matter since
those drivers also lacked .bdrv_co_truncate.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-10-eblake@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Kevin Wolf
92b92799dc block: Add flags to BlockDriver.bdrv_co_truncate()
This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate()
driver callbacks, and a supported_truncate_flags field in
BlockDriverState that allows drivers to advertise support for request
flags in the context of truncate.

For now, we always pass 0 and no drivers declare support for any flag.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Maxim Levitsky
5a5e7f8cd8 block: trickle down the fallback image creation function use to the block drivers
Instead of checking the .bdrv_co_create_opts to see if we need the
fallback, just implement the .bdrv_co_create_opts in the drivers that
need it.

This way we don't break various places that need to know if the
underlying protocol/format really supports image creation, and this way
we still allow some drivers to not support image creation.

Fixes: fd17146cd9
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1816007

Note that technically this driver reverts the image creation fallback
for the vxhs driver since I don't have a means to test it, and IMHO it
is better to leave it not supported as it was prior to generic image
creation patches.

Also drop iscsi_create_opts which was left accidentally.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-3-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
[mreitz: Fixed alignment, and moved bdrv_co_create_opts_simple() and
         bdrv_create_opts_simple from block.h into block_int.h]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Maxim Levitsky
b92902dfea block: pass BlockDriver reference to the .bdrv_co_create
This will allow the reuse of a single generic .bdrv_co_create
implementation for several drivers.
No functional changes.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Daniel Henrique Barboza
9bffae14df block: introducing 'bdrv_co_delete_file' interface
Adding to Block Drivers the capability of being able to clean up
its created files can be useful in certain situations. For the
LUKS driver, for instance, a failure in one of its authentication
steps can leave files in the host that weren't there before.

This patch adds the 'bdrv_co_delete_file' interface to block
drivers and add it to the 'file' driver in file-posix.c. The
implementation is given by 'raw_co_delete_file'.

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200130213907.2830642-2-danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-11 15:54:38 +01:00
Kevin Wolf
c7a0f2be8f block: Make bdrv_get_cumulative_perm() public
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200310113831.27293-2-kwolf@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-11 15:54:38 +01:00
Maxim Levitsky
89802d5ae7 monitor/hmp: Move hmp_drive_add_node to block-hmp-cmds.c
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-12-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-09 18:20:22 +00:00
Peter Krempa
17e1e2be5f block: Introduce 'bdrv_reopen_commit_post' step
Add another step in the reopen process where driver can execute code
after permission changes are comitted.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Message-Id: <adc02cf591c3cb34e98e33518eb1c540a0f27db1.1582893284.git.pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-06 17:15:37 +01:00
Max Reitz
6b4907cf42 block: Remove bdrv_recurse_is_first_non_filter()
It no longer has any users.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200218103454.296704-11-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 11:55:40 +01:00
Max Reitz
5d69b5ab85 block: Add bdrv_recurse_can_replace()
After a couple of follow-up patches, this function will replace
bdrv_recurse_is_first_non_filter() in check_to_replace_node().

bdrv_recurse_is_first_non_filter() is both not sufficiently specific for
check_to_replace_node() (it allows cases that should not be allowed,
like replacing child nodes of quorum with dissenting data that have more
parents than just quorum), and it is too restrictive (it is perfectly
fine to replace filters).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200218103454.296704-7-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 11:55:39 +01:00
Paolo Bonzini
18fbd0dec7 block/io: wait for serialising requests when a request becomes serialising
Marking without waiting would not result in actual serialising behavior.
Thus, make a call bdrv_mark_request_serialising sufficient for
serialisation to happen.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1578495356-46219-3-git-send-email-pbonzini@redhat.com
Message-Id: <1578495356-46219-3-git-send-email-pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30 20:59:41 +00:00
Max Reitz
c28107e9e5 block: Add bdrv_co_get_self_request()
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191101152510.11719-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-11-04 09:32:51 +01:00
Max Reitz
304d9d7f03 block: Make wait/mark serialising requests public
Make both bdrv_mark_request_serialising() and
bdrv_wait_serialising_requests() public so they can be used from block
drivers.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191101152510.11719-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-11-04 09:29:15 +01:00
Max Reitz
c80d8b06cf block: Add @exact parameter to bdrv_co_truncate()
We have two drivers (iscsi and file-posix) that (in some cases) return
success from their .bdrv_co_truncate() implementation if the block
device is larger than the requested offset, but cannot be shrunk.  Some
callers do not want that behavior, so this patch adds a new parameter
that they can use to turn off that behavior.

This patch just adds the parameter and lets the block/io.c and
block/block-backend.c functions pass it around.  All other callers
always pass false and none of the implementations evaluate it, so that
this patch does not change existing behavior.  Future patches take care
of that.

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-5-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 12:00:07 +01:00
Vladimir Sementsov-Ogievskiy
4dd09f6223 qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commit
The only reason I can imagine for this strange code at the very-end of
bdrv_reopen_commit is the fact that bs->read_only updated after
calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same
time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong
check for being writable, when actually it only need writable file
child not self.

So, as it's fixed, let's move things to correct place.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy
d2c3080e41 block/qcow2: proper locking on bitmap add/remove paths
qmp_block_dirty_bitmap_add and do_block_dirty_bitmap_remove do acquire
aio context since 0a6c86d024. But this is not enough: we also must
lock qcow2 mutex when access in-image metadata. Especially it concerns
freeing qcow2 clusters.

To achieve this, move qcow2_can_store_new_dirty_bitmap and
qcow2_remove_persistent_dirty_bitmap to coroutine context.

Since we work in coroutines in correct aio context, we don't need
context acquiring in blockdev.c anymore, drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920082543.23444-4-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy
b56a1e3175 block/dirty-bitmap: return int from bdrv_remove_persistent_dirty_bitmap
It's more comfortable to not deal with local_err.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920082543.23444-3-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy
00e30f05de block/backup: use backup-top instead of write notifiers
Drop write notifiers and use filter node instead.

= Changes =

1. Add filter-node-name argument for backup qmp api. We have to do it
in this commit, as 257 needs to be fixed.

2. There are no more write notifiers here, so is_write_notifier
parameter is dropped from block-copy paths.

3. To sync with in-flight requests at job finish we now have drained
removing of the filter, we don't need rw-lock.

4. Block-copy is now using BdrvChildren instead of BlockBackends

5. As backup-top owns these children, we also move block-copy state
into backup-top's ownership.

= Iotest changes =

56: op-blocker doesn't shoot now, as we set it on source, but then
check on filter, when trying to start second backup.
To keep the test we instead can catch another collision: both jobs will
get 'drive0' job-id, as job-id parameter is unspecified. To prevent
interleaving with file-posix locks (as they are dependent on config)
let's use another target for second backup.

Also, it's obvious now that we'd like to drop this op-blocker at all
and add a test-case for two backups from one node (to different
destinations) actually works. But not in these series.

141: Output changed: prepatch, "Node is in use" comes from bdrv_has_blk
check inside qmp_blockdev_del. But we've dropped block-copy blk
objects, so no more blk objects on source bs (job blk is on backup-top
filter bs). New message is from op-blocker, which is the next check in
qmp_blockdev_add.

257: The test wants to emulate guest write during backup. They should
go to filter node, not to original source node, of course. Therefore we
need to specify filter node name and use it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191001131409.14202-6-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Anton Nefedov
d924559953 qapi: query-blockstat: add driver specific file-posix stats
A block driver can provide a callback to report driver-specific
statistics.

file-posix driver now reports discard statistics

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00