For now, this is a mechanical addition; all callers pass false. But
the next patch will use it to improve 'qemu-img rebase -u' when
selecting a backing file with no format.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Message-Id: <20200706203954.341758-10-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Especially when O_DIRECT is used with image files so that the page cache
indirection can't cause a merge of allocating requests, the file will
fragment on the file system layer, with a potentially very small
fragment size (this depends on the requests the guest sent).
On Linux, fragmentation can be reduced by setting an extent size hint
when creating the file (at least on XFS, it can't be set any more after
the first extent has been allocated), basically giving raw files a
"cluster size" for allocation.
This adds a create option to set the extent size hint, and changes the
default from not setting a hint to setting it to 1 MB. The main reason
why qcow2 defaults to smaller cluster sizes is that COW becomes more
expensive, which is not an issue with raw files, so we can choose a
larger size. The tradeoff here is only potentially wasted disk space.
For qcow2 (or other image formats) over file-posix, the advantage should
even be greater because they grow sequentially without leaving holes, so
there won't be wasted space. Setting even larger extent size hints for
such images may make sense. This can be done with the new option, but
let's keep the default conservative for now.
The effect is very visible with a test that intentionally creates a
badly fragmented file with qemu-img bench (the time difference while
creating the file is already remarkable) and then looks at the number of
extents and the time a simple "qemu-img map" takes.
Without an extent size hint:
$ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G
Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=0
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
Run completed in 25.848 seconds.
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
Run completed in 19.616 seconds.
$ filefrag ~/tmp/test.raw
/home/kwolf/tmp/test.raw: 2000000 extents found
$ time ./qemu-img map ~/tmp/test.raw
Offset Length Mapped to File
0 0x1e8480000 0 /home/kwolf/tmp/test.raw
real 0m1,279s
user 0m0,043s
sys 0m1,226s
With the new default extent size hint of 1 MB:
$ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G
Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
Run completed in 11.833 seconds.
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
Run completed in 10.155 seconds.
$ filefrag ~/tmp/test.raw
/home/kwolf/tmp/test.raw: 178 extents found
$ time ./qemu-img map ~/tmp/test.raw
Offset Length Mapped to File
0 0x1e8480000 0 /home/kwolf/tmp/test.raw
real 0m0,061s
user 0m0,040s
sys 0m0,014s
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200707142329.48303-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If we want to check error after errp-function call, we need to
introduce local_err and then propagate it to errp. Instead, use
the ERRP_GUARD() macro, benefits are:
1. No need of explicit error_propagate call
2. No need of explicit local_err variable: use errp directly
3. ERRP_GUARD() leaves errp as is if it's not NULL or
&error_fatal, this means that we don't break error_abort
(we'll abort on error_set, not on error_propagate)
If we want to add some info to errp (by error_prepend() or
error_append_hint()), we must use the ERRP_GUARD() macro.
Otherwise, this info will not be added when errp == &error_fatal
(the program will exit prior to the error_append_hint() or
error_prepend() call). Fix several such cases, e.g. in nbd_read().
This commit is generated by command
sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \
MAINTAINERS | \
xargs git ls-files | grep '\.[hc]$' | \
xargs spatch \
--sp-file scripts/coccinelle/errp-guard.cocci \
--macro-file scripts/cocci-macro-file.h \
--in-place --no-show-diff --max-width 80
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200707165037.1026246-8-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and
auto-propagated-errp.cocci to errp-guard.cocci. Commit message
tweaked again.]
Currently this field only set by qed and qcow2. But in fact, all
backing-supporting formats (parallels, qcow, qcow2, qed, vmdk) share
these semantics: on unallocated blocks, if there is no backing file they
just memset the buffer with zeroes.
So, document this behavior for .supports_backing and drop
.unallocated_blocks_are_zero
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-10-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The function has only one user: bdrv_co_block_status(). Inline it to
simplify reviewing of the following patches, which will finally drop
unallocated_blocks_are_zero field too.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
blockdev-amend will be used similiar to blockdev-create
to allow on the fly changes of the structure of the format based block devices.
Current plan is to first support encryption keyslot management for luks
based formats (raw and embedded in qcow2)
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200608094030.670121-12-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Some options are only useful for creation
(or hard to be amended, like cluster size for qcow2), while some other
options are only useful for amend, like upcoming keyslot management
options for luks
Since currently only qcow2 supports amend, move all its options
to a common macro and then include it in each action option list.
In future it might be useful to remove some options which are
not supported anyway from amend list, which currently
cause an error message if amended.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-5-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
'force' option will be used for some unsafe amend operations.
This includes things like erasing last keyslot in luks based formats
(which destroys the data, unless the master key is backed up
by external means), but that _might_ be desired result.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-4-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
I'm not aware of any immediate bugs in qemu where a second runtime
evaluation of the arguments to MIN() or MAX() causes a problem, but
proactively preventing such abuse is easier than falling prey to an
unintended case down the road. At any rate, here's the conversation
that sparked the current patch:
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg05718.html
Update the MIN/MAX macros to only evaluate their argument once at
runtime; this uses typeof(1 ? (a) : (b)) to ensure that we are
promoting the temporaries to the same type as the final comparison (we
have to trigger type promotion, as typeof(bitfield) won't compile; and
we can't use typeof((a) + (b)) or even typeof((a) + 0), as some of our
uses of MAX are on void* pointers where such addition is undefined).
However, we are unable to work around gcc refusing to compile ({}) in
a constant context (such as the array length of a static variable),
even when only used in the dead branch of a __builtin_choose_expr(),
so we have to provide a second macro pair MIN_CONST and MAX_CONST for
use when both arguments are known to be compile-time constants and
where the result must also be usable as a constant; this second form
evaluates arguments multiple times but that doesn't matter for
constants. By using a void expression as the expansion if a
non-constant is presented to this second form, we can enlist the
compiler to ensure the double evaluation is not attempted on
non-constants.
Alas, as both macros now rely on compiler intrinsics, they are no
longer usable in preprocessor #if conditions; those will just have to
be open-coded or the logic rewritten into #define or runtime 'if'
conditions (but where the compiler dead-code-elimination will probably
still apply).
I tested that both gcc 10.1.1 and clang 10.0.0 produce errors for all
forms of macro mis-use. As the errors can sometimes be cryptic, I'm
demonstrating the gcc output:
Use of MIN when MIN_CONST is needed:
In file included from /home/eblake/qemu/qemu-img.c:25:
/home/eblake/qemu/include/qemu/osdep.h:249:5: error: braced-group within expression allowed only inside a function
249 | ({ \
| ^
/home/eblake/qemu/qemu-img.c:92:12: note: in expansion of macro ‘MIN’
92 | char array[MIN(1, 2)] = "";
| ^~~
Use of MIN_CONST when MIN is needed:
/home/eblake/qemu/qemu-img.c: In function ‘is_allocated_sectors’:
/home/eblake/qemu/qemu-img.c:1225:15: error: void value not ignored as it ought to be
1225 | i = MIN_CONST(i, n);
| ^
Use of MIN in the preprocessor:
In file included from /home/eblake/qemu/accel/tcg/translate-all.c:20:
/home/eblake/qemu/accel/tcg/translate-all.c: In function ‘page_check_range’:
/home/eblake/qemu/include/qemu/osdep.h:249:6: error: token "{" is not valid in preprocessor expressions
249 | ({ \
| ^
Fix the resulting callsites that used #if or computed a compile-time
constant min or max to use the new macros. cpu-defs.h is interesting,
as CPU_TLB_DYN_MAX_BITS is sometimes used as a constant and sometimes
dynamic.
It may be worth improving glib's MIN/MAX definitions to be saner, but
that is a task for another day.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200625162602.700741-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-6-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To be used for bitmap migration in further commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200521220648.3255-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The next patch will split blockdev.c, which will require accessing
some previously-static functions from more than one .c file. But part
of promoting a function to public is picking a naming scheme that does
not reek of exposing too many internals (two of the three functions
were named starting with 'do_'). To make future code motion easier,
perform the function rename and non-static promotion into its own
patch.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513011648.166876-5-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Upcoming patches will enhance bitmap support in qemu-img, but in doing
so, it turns out to be nice to suppress output when persistent bitmaps
make no sense (such as on a qcow2 v2 image). Add a hook to make this
easier to query.
This patch adds a new callback .bdrv_supports_persistent_dirty_bitmap,
rather than trying to shoehorn the answer in via existing callbacks.
In particular, while it might have been possible to overload
.bdrv_co_can_store_new_dirty_bitmap to special-case a NULL input to
answer whether any persistent bitmaps are supported, that is at odds
with whether a particular bitmap can be stored (for example, even on
an image that supports persistent bitmaps but has currently filled up
the maximum number of bitmaps, attempts to store another one should
fail); and the new functionality doesn't require coroutine safety.
Similarly, we could have added one more piece of information to
.bdrv_get_info, but then again, most callers to that function tend to
already discard extraneous information, and making it a catch-all
rather than a series of dedicated scalar queries hasn't really
simplified life.
In the future, when we improve the ability to look up bitmaps through
a filter, we will probably also want to teach the block layer to
automatically let filters pass this request on through.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513011648.166876-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
The glib event loop does not call fdmon_io_uring_wait() so fd handlers
waiting to be submitted build up in the list. There is no benefit is
using io_uring when the glib GSource is being used, so disable it
instead of implementing a more complex fix.
This fixes a memory leak where AioHandlers would build up and increasing
amounts of CPU time were spent iterating them in aio_pending(). The
symptom is that guests become slow when QEMU is built with io_uring
support.
Buglink: https://bugs.launchpad.net/qemu/+bug/1877716
Fixes: 73fd282e7b ("aio-posix: add io_uring fd monitoring implementation")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
Message-id: 20200511183630.279750-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Implementations should decide the necessary permissions based on @role.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-35-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-33-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-32-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-31-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-25-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-23-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This callback can be used by BDSs that use child_of_bds with the
appropriate BdrvChildRole for their children.
Also, make bdrv_format_default_perms() use it for child_of_bds children
(just a temporary solution until we can drop bdrv_format_default_perms()
altogether).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-20-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Any current user of child_file, child_format, and child_backing can and
should use this generic BdrvChildClass instead, as it can handle all of
these cases. However, to be able to do so, the users must pass the
appropriate BdrvChildRole when the child is created/attached. (The
following commits will take care of that.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-15-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We plan to unify the generic .inherit_options() functions. The
resulting common function will need to decide whether to force-enable
format probing, force-disable it, or leave it as-is. To make this
decision, it will need to know whether the parent node is a format node
or not (because we never want format probing if the parent is a format
node already (except for the backing chain)).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-9-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, all callers (effectively) pass 0 and no callee evaluates thie
value. Later patches will change both.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-8-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, all callers pass 0 and no callee evaluates this value. Later
patches will change both.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-7-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, it is always set to 0. Later patches in this series will
ensure that all callers pass an appropriate combination of flags.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-6-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This mask will supplement BdrvChildClass when it comes to what role (or
combination of roles) a child takes for its parent. It consists of
BdrvChildRoleBits values (which is an enum).
Because empty enums are not allowed, let us just start with it filled.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-5-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This structure nearly only contains parent callbacks for child state
changes. It cannot really reflect a child's role, because different
roles may overlap (as we will see when real roles are introduced), and
because parents can have custom callbacks even when the child fulfills a
standard role.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200513110544.176672-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We want to unify child_format and child_file at some point. One of the
important things that set format drivers apart from other drivers is
that they do not expect other format nodes under them (except in the
backing chain), i.e. we must not probe formats inside of formats. That
means we need something on which to distinguish format drivers from
others, and hence this flag.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200513110544.176672-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Right now, all users of bdrv_make_empty() call the BlockDriver method
directly. That is not only bad style, it is also wrong, unless the
caller has a BdrvChild with a WRITE or WRITE_UNCHANGED permission.
(WRITE_UNCHANGED suffices, because callers generally use this function
to clear a node with a backing file after a commit operation.)
Introduce bdrv_make_empty() that verifies that it does.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200429141126.85159-2-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The patch adds some preparation parts for incompatible compression type
feature to qcow2 allowing the use different compression methods for
image clusters (de)compressing.
It is implied that the compression type is set on the image creation and
can be changed only later by image conversion, thus compression type
defines the only compression algorithm used for the image, and thus,
for all image clusters.
The goal of the feature is to add support of other compression methods
to qcow2. For example, ZSTD which is more effective on compression than ZLIB.
The default compression is ZLIB. Images created with ZLIB compression type
are backward compatible with older qemu versions.
Adding of the compression type breaks a number of tests because now the
compression type is reported on image creation and there are some changes
in the qcow2 header in size and offsets.
The tests are fixed in the following ways:
* filter out compression_type for many tests
* fix header size, feature table size and backing file offset
affected tests: 031, 036, 061, 080
header_size +=8: 1 byte compression type
7 bytes padding
feature_table += 48: incompatible feature compression type
backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change)
* add "compression type" for test output matching when it isn't filtered
affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206,
242, 255, 274, 280
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
QAPI part:
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200507082521.29210-2-dplotnikov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Now that there are no clients of bdrv_has_zero_init_truncate, none of
the drivers need to worry about providing it.
What's more, this eliminates a source of some confusion: a literal
reading of the documentation as written in ceaca56f and implemented in
commit 1dcaf527 claims that a driver which returns 0 for
bdrv_has_zero_init_truncate() must not return 1 for
bdrv_has_zero_init(); this condition was violated for parallels, qcow,
and sometimes for vdi, although in practice it did not matter since
those drivers also lacked .bdrv_co_truncate.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-10-eblake@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch introduces support for PMR that has been defined as part of NVMe 1.4
spec. User can now specify a pmrdev option that should point to HostMemoryBackend.
pmrdev memory region will subsequently be exposed as PCI BAR 2 in emulated NVMe
device. Guest OS can perform mmio read and writes to the PMR region that will stay
persistent across system reboot.
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200330164656.9348-1-andrzej.jakowski@linux.intel.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that block drivers can support flags for .bdrv_co_truncate, expose
the parameter in the node level interfaces bdrv_co_truncate() and
bdrv_truncate().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate()
driver callbacks, and a supported_truncate_flags field in
BlockDriverState that allows drivers to advertise support for request
flags in the context of truncate.
For now, we always pass 0 and no drivers declare support for any flag.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Any thread that is not a iothread returns NULL for qemu_get_current_aio_context().
As a result, it would also return true for
in_aio_context_home_thread(qemu_get_aio_context()), causing
AIO_WAIT_WHILE to invoke aio_poll() directly. This is incorrect
if the BQL is not held, because aio_poll() does not expect to
run concurrently from multiple threads, and it can actually
happen when savevm writes to the vmstate file from the
migration thread.
Therefore, restrict in_aio_context_home_thread to return true
for the main AioContext only if the BQL is held.
The function is moved to aio-wait.h because it is mostly used
there and to avoid a circular reference between main-loop.h
and block/aio.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200407140746.8041-5-pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Instead of checking the .bdrv_co_create_opts to see if we need the
fallback, just implement the .bdrv_co_create_opts in the drivers that
need it.
This way we don't break various places that need to know if the
underlying protocol/format really supports image creation, and this way
we still allow some drivers to not support image creation.
Fixes: fd17146cd9
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1816007
Note that technically this driver reverts the image creation fallback
for the vxhs driver since I don't have a means to test it, and IMHO it
is better to leave it not supported as it was prior to generic image
creation patches.
Also drop iscsi_create_opts which was left accidentally.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-3-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
[mreitz: Fixed alignment, and moved bdrv_co_create_opts_simple() and
bdrv_create_opts_simple from block.h into block_int.h]
Signed-off-by: Max Reitz <mreitz@redhat.com>
This will allow the reuse of a single generic .bdrv_co_create
implementation for several drivers.
No functional changes.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Firstly, _next_dirty_area is for scenarios when we may contiguously
search for next dirty area inside some limited region, so it is more
comfortable to specify "end" which should not be recalculated on each
iteration.
Secondly, let's add a possibility to limit resulting area size, not
limiting searching area. This will be used in NBD code in further
commit. (Note that now bdrv_dirty_bitmap_next_dirty_area is unused)
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-8-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
We have bdrv_dirty_bitmap_next_zero, let's add corresponding
bdrv_dirty_bitmap_next_dirty, which is more comfortable to use than
bitmap iterators in some cases.
For test modify test_hbitmap_next_zero_check_range to check both
next_zero and next_dirty and add some new checks.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-7-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
We are going to introduce bdrv_dirty_bitmap_next_dirty so that same
variable may be used to store its return value and to be its parameter,
so it would int64_t.
Similarly, we are going to refactor hbitmap_next_dirty_area to use
hbitmap_next_dirty together with hbitmap_next_zero, therefore we want
hbitmap_next_zero parameter type to be int64_t too.
So, for convenience update all parameters of *_next_zero and
*_next_dirty_area to be int64_t.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-6-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
Using the new 'bdrv_co_delete_file' interface, a pure co_routine function
'bdrv_co_delete_file' inside block.c can can be used in a way similar of
the existing bdrv_create_file to to clean up a created file.
We're creating a pure co_routine because the only caller of
'bdrv_co_delete_file' will be already in co_routine context, thus there
is no need to add all the machinery to check for qemu_in_coroutine() and
create a separated co_routine to do the job.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200130213907.2830642-3-danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Adding to Block Drivers the capability of being able to clean up
its created files can be useful in certain situations. For the
LUKS driver, for instance, a failure in one of its authentication
steps can leave files in the host that weren't there before.
This patch adds the 'bdrv_co_delete_file' interface to block
drivers and add it to the 'file' driver in file-posix.c. The
implementation is given by 'raw_co_delete_file'.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200130213907.2830642-2-danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200310113831.27293-2-kwolf@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Hide structure definitions and add explicit API instead, to keep an
eye on the scope of the shared fields.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-10-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
offset/bytes pair is more usual naming in block layer, let's use it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>