Commit Graph

539 Commits

Author SHA1 Message Date
Kevin Wolf
4720cbeea1 block: Fix hangs in synchronous APIs with iothreads
In the block layer, synchronous APIs are often implemented by creating a
coroutine that calls the asynchronous coroutine-based implementation and
then waiting for completion with BDRV_POLL_WHILE().

For this to work with iothreads (more specifically, when the synchronous
API is called in a thread that is not the home thread of the block
device, so that the coroutine will run in a different thread), we must
make sure to call aio_wait_kick() at the end of the operation. Many
places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if
the condition has long become false.

Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This
corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is
generally not enough for most other operations because they haven't set
the return value in the coroutine entry stub yet. To avoid race
conditions there, we need to kick after setting the return value.

The race window is small enough that the problem doesn't usually surface
in the common path. However, it does surface and causes easily
reproducible hangs if the operation can return early before even calling
bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op
success paths).

The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is
slightly different: These functions even neglected to schedule the
coroutine in the home thread of the node. This avoids the hang, but is
obviously wrong, too. Fix those to schedule the coroutine in the right
AioContext in addition to adding aio_wait_kick() calls.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-01 13:46:44 +01:00
Vladimir Sementsov-Ogievskiy
e23c9d7a1c qcow2: do decompression in threads
Do decompression in threads, like it is already done for compression.
This improves asynchronous compressed reads performance.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Vladimir Sementsov-Ogievskiy
c3c10f7295 qcow2: aio support for compressed cluster read
Allocate buffers locally and release qcow2 lock. Than, reads inside
qcow2_co_preadv_compressed may be done in parallel, however all
decompression is still done synchronously. Let's improve it in the
following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Vladimir Sementsov-Ogievskiy
c068a1cd52 qcow2: use byte-based read in qcow2_decompress_cluster
We are gradually moving away from sector-based interfaces, towards
byte-based.  Get rid of it here too.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
341926ab83 qcow2: refactor decompress_buffer
- make it look more like a pair of qcow2_compress - rename the function
  and its parameters
- drop extra out_len variable, check filling of output buffer by strm
  structure itself
- fix code style
- add some documentation

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
f4b3e2a960 qcow2: move decompression from qcow2-cluster.c to qcow2.c
Compression is done in threads in qcow2.c. We want to do decompression
in the same way, so, firstly, move it to the same file.

The only change is braces around if-body in decompress_buffer, to
satisfy checkpatch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
6994fd78b9 qcow2: make more generic interface for qcow2_compress
Give explicit size both for source and destination buffers, to make it
similar with decompression path and than cleanly reuse parameter
structure for decompression threads.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
19a4448853 qcow2: use Z_OK instead of 0 for deflateInit2 return code check
Use appropriate macro, corresponding to deflateInit2 spec.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
c972fa123c crypto: support multiple threads accessing one QCryptoBlock
The two thing that should be handled are cipher and ivgen. For ivgen
the solution is just mutex, as iv calculations should not be long in
comparison with encryption/decryption. And for cipher let's just keep
per-thread ciphers.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2018-12-12 11:16:49 +00:00
Alberto Garcia
6f8f015c0c qcow2: Get the request alignment for encrypted images from QCryptoBlock
This doesn't have any practical effect at the moment because the
values of BDRV_SECTOR_SIZE, QCRYPTO_BLOCK_LUKS_SECTOR_SIZE and
QCRYPTO_BLOCK_QCOW_SECTOR_SIZE are all the same (512 bytes), but
future encryption methods could have different requirements.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Peter Maydell
3b698f52f9 block/qcow2: Don't take address of fields in packed structs
Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this. Avoid the bug by not using the
"modify in place" byte swapping functions.

There are a few places where the in-place swap function is
used on something other than a packed struct field; we convert
those anyway, for consistency.

This patch was produced with the following spatch script
(and hand-editing to fold a few resulting overlength lines):

@@
expression E;
@@
-be16_to_cpus(&E);
+E = be16_to_cpu(E);
@@
expression E;
@@
-be32_to_cpus(&E);
+E = be32_to_cpu(E);
@@
expression E;
@@
-be64_to_cpus(&E);
+E = be64_to_cpu(E);
@@
expression E;
@@
-cpu_to_be16s(&E);
+E = cpu_to_be16(E);
@@
expression E;
@@
-cpu_to_be32s(&E);
+E = cpu_to_be32(E);
@@
expression E;
@@
-cpu_to_be64s(&E);
+E = cpu_to_be64(E);

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Vladimir Sementsov-Ogievskiy
9c98f145df dirty-bitmaps: clean-up bitmaps loading and migration logic
This patch aims to bring the following behavior:

1. We don't load bitmaps, when started in inactive mode. It's the case
of incoming migration. In this case we wait for bitmaps migration
through migration channel (if 'dirty-bitmaps' capability is enabled) or
for invalidation (to load bitmaps from the image).

2. We don't remove persistent bitmaps on inactivation. Instead, we only
remove bitmaps after storing. This is the only way to restore bitmaps,
if we decided to resume source after [failed] migration with
'dirty-bitmaps' capability enabled (which means, that bitmaps were not
stored).

3. We load bitmaps on open and any invalidation, it's ok for all cases:
  - normal open
  - migration target invalidation with dirty-bitmaps capability
    (bitmaps are migrating through migration channel, the are not
     stored, so they should have IN_USE flag set and will be skipped
     when loading. However, it would fail if bitmaps are read-only[1])
  - migration target invalidation without dirty-bitmaps capability
    (normal load of the bitmaps, if migrated with shared storage)
  - source invalidation with dirty-bitmaps capability
    (skip because IN_USE)
  - source invalidation without dirty-bitmaps capability
    (bitmaps were dropped, reload them)

[1]: to accurately handle this, migration of read-only bitmaps is
     explicitly forbidden in this patch.

New mechanism for not storing bitmaps when migrate with dirty-bitmaps
capability is introduced: migration filed in BdrvDirtyBitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:17 -04:00
Vladimir Sementsov-Ogievskiy
2ea427efff bloc/qcow2: drop dirty_bitmaps_loaded state variable
This variable doesn't work as it should, because it is actually cleared
in qcow2_co_invalidate_cache() by memset(). Drop it, as the following
patch will introduce new behavior.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:15 -04:00
Vladimir Sementsov-Ogievskiy
132adb6820 block/qcow2: improve error message in qcow2_inactivate
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[Maintainer edit -- touched up error message. --js]
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:15 -04:00
Markus Armbruster
4b5766488f error: Fix use of error_prepend() with &error_fatal, &error_abort
From include/qapi/error.h:

  * Pass an existing error to the caller with the message modified:
  *     error_propagate(errp, err);
  *     error_prepend(errp, "Could not frobnicate '%s': ", name);

Fei Li pointed out that doing error_propagate() first doesn't work
well when @errp is &error_fatal or &error_abort: the error_prepend()
is never reached.

Since I doubt fixing the documentation will stop people from getting
it wrong, introduce error_propagate_prepend(), in the hope that it
lures people away from using its constituents in the wrong order.
Update the instructions in error.h accordingly.

Convert existing error_prepend() next to error_propagate to
error_propagate_prepend().  If any of these get reached with
&error_fatal or &error_abort, the error messages improve.  I didn't
check whether that's the case anywhere.

Cc: Fei Li <fli@suse.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-10-19 14:51:34 +02:00
Leonid Bloch
bd016b912c qcow2: Explicit number replaced by a constant
Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Leonid Bloch
e957b50b8d qcow2: Set the default cache-clean-interval to 10 minutes
The default cache-clean-interval is set to 10 minutes, in order to lower
the overhead of the qcow2 caches (before the default was 0, i.e.
disabled).

* For non-Linux platforms the default is kept at 0, because
  cache-clean-interval is not supported there yet.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Leonid Bloch
45b4949c7b qcow2: Resize the cache upon image resizing
The caches are now recalculated upon image resizing. This is done
because the new default behavior of assigning L2 cache relatively to
the image size, implies that the cache will be adapted accordingly
after an image resize.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Leonid Bloch
b749562d98 qcow2: Assign the L2 cache relatively to the image size
Sufficient L2 cache can noticeably improve the performance when using
large images with frequent I/O.

Previously, unless 'cache-size' was specified and was large enough, the
L2 cache was set to a certain size without taking the virtual image size
into account.

Now, the L2 cache assignment is aware of the virtual size of the image,
and will cover the entire image, unless the cache size needed for that is
larger than a certain maximum. This maximum is set to 1 MB by default
(enough to cover an 8 GB image with the default cluster size) but can
be increased or decreased using the 'l2-cache-size' option. This option
was previously documented as the *maximum* L2 cache size, and this patch
makes it behave as such, instead of as a constant size. Also, the
existing option 'cache-size' can limit the sum of both L2 and refcount
caches, as previously.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Leonid Bloch
657ada52ab qcow2: Avoid duplication in setting the refcount cache size
The refcount cache size does not need to be set to its minimum value in
read_cache_sizes(), as it is set to at least its minimum value in
qcow2_update_options_prepare().

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Leonid Bloch
b6a95c6d10 qcow2: Make sizes more humanly readable
Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Peter Xu
3ab72385b2 qapi: Drop qapi_event_send_FOO()'s Error ** argument
The generated qapi_event_send_FOO() take an Error ** argument.  They
can't actually fail, because all they do with the argument is passing it
to functions that can't fail: the QObject output visitor, and the
@qmp_emit callback, which is either monitor_qapi_event_queue() or
event_test_emit().

Drop the argument, and pass &error_abort to the QObject output visitor
and @qmp_emit instead.

Suggested-by: Eric Blake <eblake@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180815133747.25032-4-peterx@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message rewritten, update to qapi-code-gen.txt corrected]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-08-28 18:21:38 +02:00
Leonid Bloch
308999e9d4 qcow2: A grammar fix in conflicting cache sizing error message
Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-30 15:35:37 +02:00
Peter Maydell
7851f1a706 Block layer patches:
- Copy offloading fixes for when the copy increases the image size
 - Temporary revert of the removal of deprecated -drive options
 - Fix request serialisation in the image fleecing scenario
 - Fix copy-on-read crash with unaligned image size
 - Fix another drain crash
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbRNLQAAoJEH8JsnLIjy/WOaQQALlZk01JohETuwGG6HGl0LdI
 jEEm+N0J+BlGOVjoGU67OKGidUCl5WvBsQTlyYkmlaToGuk/njWxCa/GA6+iNRnt
 MDq7Ovr8uZI3D+0Fuc6xg/6NBiLkukgh0Q9gMWkzn3jaNWzO2WcTr8WXwepvP6sj
 YtPhEQOXTT3sXf/MFY8ig7qRrZ6f7LFOoKu7LMnrD+QWDo8TY3QLZaxP9OUFHH7S
 A6J0LIfuRZlq79a7SgrRkCR2ddtgYyBQ+zD7PD5kf1vLW4+dOhDOutQEsZCMCPgR
 ft99kNhrZcJGN6n2r8/oVcvRkw5c4I1JPgakm/GoW/NllfPMebuPospKaS4wiJnB
 zI4YOtmco4Mfxkw/wK+Ep/bPCpxEF43uDcpPiEjsNADrdLq0eKnPn5ctwSyWlGvn
 ayQWxDoKoYckn/ccjtLxJ2xPws8433cTXrBdIKnJadWxi3iRNzlIKHRuEfXf9zQt
 G+Nq7ruysT9TPf9ifuCHcZnTsi3SLYLsjCj7pAgBkazBYE2cCI3eKN8kxsDJi7qv
 cWzFCpwE28pHRJ6FwtdzBVkNcfTlC/XopR1M66OzYZlLqR/4hbNhyHL3hBV+yfrM
 fC7mPi81ttI6e+JAgC6K8t3Ey242MjSzUYa7pJUNws7RpqUhfhr6EXXbBceJKsVW
 F8qKZoiIEK7wDacUiEiE
 =FXOo
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- Copy offloading fixes for when the copy increases the image size
- Temporary revert of the removal of deprecated -drive options
- Fix request serialisation in the image fleecing scenario
- Fix copy-on-read crash with unaligned image size
- Fix another drain crash

# gpg: Signature made Tue 10 Jul 2018 16:37:52 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (24 commits)
  block: Use common write req handling in truncate
  block: Fix bdrv_co_truncate overlap check
  block: Use common req handling in copy offloading
  block: Use common req handling for discard
  block: Fix handling of image enlarging write
  block: Extract common write req handling
  block: Use uint64_t for BdrvTrackedRequest byte fields
  block: Use BdrvChild to discard
  block: Add copy offloading trace points
  block: Prefix file driver trace points with "file_"
  Revert "block: Remove deprecated -drive geometry options"
  Revert "block: Remove deprecated -drive option addr"
  Revert "block: Remove deprecated -drive option serial"
  Revert "block: Remove dead deprecation warning code"
  block/blklogwrites: Make sure the log sector size is not too small
  qapi/block-core.json: Add missing documentation for blklogwrites log-append option
  block/backup: fix fleecing scheme: use serialized writes
  block: add BDRV_REQ_SERIALISING flag
  block: split flags in copy_range
  block/io: fix copy_range
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-07-10 17:28:29 +01:00
Vladimir Sementsov-Ogievskiy
67b51fb998 block: split flags in copy_range
Pass read flags and write flags separately. This is needed to handle
coming BDRV_REQ_NO_SERIALISING clearly in following patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10 13:04:25 +02:00
Vladimir Sementsov-Ogievskiy
0e4e4318ea qcow2: add overlap check for bitmap directory
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180705151515.779173-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-07-09 19:43:24 +02:00
Fam Zheng
65f1899e9b qcow2: Drop unreachable break
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180702025836.20957-4-famz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-07-09 19:43:24 +02:00
Fam Zheng
148546c822 qcow2: Drop unused cluster_data
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180702025836.20957-2-famz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-07-09 19:43:24 +02:00
Vladimir Sementsov-Ogievskiy
ceb029cd6f qcow2: add compress threads
Do data compression in separate threads. This significantly improve
performance for qemu-img convert with -W (allow async writes) and -c
(compressed) options.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05 10:29:19 +02:00
Vladimir Sementsov-Ogievskiy
2714f13d69 qcow2: refactor data compression
Make a separate function for compression to be parallelized later.
 - use .avail_out field instead of .next_out to calculate size of
   compressed data. It looks more natural and it allows to keep dest to
   be void pointer
 - set avail_out to be at least one byte less than input, to be sure
   avoid inefficient compression earlier

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05 10:29:00 +02:00
Fam Zheng
e06f4639d8 qcow2: Fix src_offset in copy offloading
Not updating src_offset will result in wrong data being written to dst
image.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
8b24cd1415 qcow2: Free allocated clusters on write error
If we managed to allocate the clusters, but then failed to write the
data, there's a good chance that we'll still be able to free the
clusters again in order to avoid cluster leaks (the refcounts are
cached, so even if we can't write them out right now, we may be able to
do so when the VM is resumed after a werror=stop/enospc pause).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
2018-06-29 14:20:56 +02:00
Markus Armbruster
796d323945 block/crypto: Simplify block_crypto_{open,create}_opts_init()
block_crypto_open_opts_init() and block_crypto_create_opts_init()
contain a virtual visit of QCryptoBlockOptions and
QCryptoBlockCreateOptions less member "format", respectively.

Change their callers to put member "format" in the QDict, so they can
use the generated visitors for these types instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Fam Zheng
354d930dc6 qcow2: Remove dead check on !ret
In the beginning of the function, we initialize the local variable to 0,
and in the body of the function, we check the assigned values and exit
the loop immediately. So here it can never be non-zero.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
47e86b868d qcow2: Remove coroutine trampoline for preallocate_co()
All callers are coroutine_fns now, so we can just directly call
preallocate_co().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
061ca8a368 block: Convert .bdrv_truncate callback to coroutine_fn
bdrv_truncate() is an operation that can block (even for a quite long
time, depending on the PreallocMode) in I/O paths that shouldn't block.
Convert it to a coroutine_fn so that we have the infrastructure for
drivers to make their .bdrv_co_truncate implementation asynchronous.

This change could potentially introduce new race conditions because
bdrv_truncate() isn't necessarily executed atomically any more. Whether
this is a problem needs to be evaluated for each block driver that
supports truncate:

* file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The
  protocol drivers are trivially safe because they don't actually yield
  yet, so there is no change in behaviour.

* copy-on-read, crypto, raw-format: Essentially just filter drivers that
  pass the request to a child node, no problem.

* qcow2: The implementation modifies metadata, so it needs to hold
  s->lock to be safe with concurrent I/O requests. In order to avoid
  double locking, this requires pulling the locking out into
  preallocate_co() and using qcow2_write_caches() instead of
  bdrv_flush().

* qed: Does a single header update, this is fine without locking.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
ae5475e82f qcow2: Fix qcow2_truncate() error return value
If qcow2_alloc_clusters_at() returns an error, we do need to negate it
to get back the positive errno code for error_setg_errno(), but we still
need to return the negative error code.

Fixes: 772d1f973f
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Anton Nefedov
29cd0403f1 qapi: remove empty flat union branches and types
Flat unions may now have uncovered branches, so it is possible to get rid
of empty types defined for that purpose only.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1529311206-76847-3-git-send-email-anton.nefedov@virtuozzo.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-06-22 16:33:46 +02:00
Markus Armbruster
af91062ee1 block: Factor out qobject_input_visitor_new_flat_confused()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
92adf9dbcd block: Clean up a misuse of qobject_to() in .bdrv_co_create_opts()
The following pattern occurs in the .bdrv_co_create_opts() methods of
parallels, qcow, qcow2, qed, vhdx and vpc:

    qobj = qdict_crumple_for_keyval_qiv(qdict, errp);
    qobject_unref(qdict);
    qdict = qobject_to(QDict, qobj);
    if (qdict == NULL) {
         ret = -EINVAL;
         goto done;
    }

    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
    [...]
    ret = 0;
done:
    qobject_unref(qdict);
    [...]
    return ret;

If qobject_to() fails, we return failure without setting errp.  That's
wrong.  As far as I can tell, it cannot fail here.  Clean it up
anyway, by removing the useless conversion.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
e5af0da1dc block: Fix -blockdev for certain non-string scalars
Configuration flows through the block subsystem in a rather peculiar
way.  Configuration made with -drive enters it as QemuOpts.
Configuration made with -blockdev / blockdev-add enters it as QAPI
type BlockdevOptions.  The block subsystem uses QDict, QemuOpts and
QAPI types internally.  The precise flow is next to impossible to
explain (I tried for this commit message, but gave up after wasting
several hours).  What I can explain is a flaw in the BlockDriver
interface that leads to this bug:

    $ qemu-system-x86_64 -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234
    qemu-system-x86_64: -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234: Internal error: parameter user invalid

QMP blockdev-add is broken the same way.

Here's what happens.  The block layer passes configuration represented
as flat QDict (with dotted keys) to BlockDriver methods
.bdrv_file_open().  The QDict's members are typed according to the
QAPI schema.

nfs_file_open() converts it to QAPI type BlockdevOptionsNfs, with
qdict_crumple() and a qobject input visitor.

This visitor comes in two flavors.  The plain flavor requires scalars
to be typed according to the QAPI schema.  That's the case here.  The
keyval flavor requires string scalars.  That's not the case here.
nfs_file_open() uses the latter, and promptly falls apart for members
@user, @group, @tcp-syn-count, @readahead-size, @page-cache-size,
@debug.

Switching to the plain flavor would fix -blockdev, but break -drive,
because there the scalars arrive in nfs_file_open() as strings.

The proper fix would be to replace the QDict by QAPI type
BlockdevOptions in the BlockDriver interface.  Sadly, that's beyond my
reach right now.

Next best would be to fix the block layer to always pass correctly
typed QDicts to the BlockDriver methods.  Also beyond my reach.

What I can do is throw another hack onto the pile: have
nfs_file_open() convert all members to string, so use of the keyval
flavor actually works, by replacing qdict_crumple() by new function
qdict_crumple_for_keyval_qiv().

The pattern "pass result of qdict_crumple() to
qobject_input_visitor_new_keyval()" occurs several times more:

* qemu_rbd_open()

  Same issue as nfs_file_open(), but since BlockdevOptionsRbd has only
  string members, its only a latent bug.  Fix it anyway.

* parallels_co_create_opts(), qcow_co_create_opts(),
  qcow2_co_create_opts(), bdrv_qed_co_create_opts(),
  sd_co_create_opts(), vhdx_co_create_opts(), vpc_co_create_opts()

  These work, because they create the QDict with
  qemu_opts_to_qdict_filtered(), which creates only string scalars.
  The function sports a TODO comment asking for better typing; that's
  going to be fun.  Use qdict_crumple_for_keyval_qiv() to be safe.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Max Reitz
609f45ea95 block: Add block-specific QDict header
There are numerous QDict functions that have been introduced for and are
used only by the block layer.  Move their declarations into an own
header file to reflect that.

While qdict_extract_subqdict() is in fact used outside of the block
layer (in util/qemu-config.c), it is still a function related very
closely to how the block layer works with nested QDicts, namely by
sometimes flattening them.  Therefore, its declaration is put into this
header as well and util/qemu-config.c includes it with a comment stating
exactly which function it needs.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20180509165530.29561-7-mreitz@redhat.com>
[Copyright note tweaked, superfluous includes dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Max Reitz
ddf3b47ef4 qcow2: Do not mark inactive images corrupt
When signaling a corruption on a read-only image, qcow2 already makes
fatal events non-fatal (i.e., they will not result in the image being
closed, and the image header's corrupt flag will not be set).  This is
necessary because we cannot set the corrupt flag on read-only images,
and it is possible because further corruption of read-only images is
impossible.

Inactive images are effectively read-only, too, so we should do the same
for them.  bdrv_is_writable() can tell us whether an image can actually
be written to, so use its result instead of !bs->read_only.

(Otherwise, the assert(!(bs->open_flags & BDRV_O_INACTIVE)) in
bdrv_co_pwritev() will fail, crashing qemu.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180606193702.7113-3-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Max Reitz
d1402b5026 block: Add Error parameter to bdrv_amend_options
Looking at the qcow2 code that is riddled with error_report() calls,
this is really how it should have been from the start.

Along the way, turn the target_version/current_version comparisons at
the beginning of qcow2_downgrade() into assertions (the caller has to
make sure these conditions are met), and rephrase the error message on
using compat=1.1 to get refcount widths other than 16 bits.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180509210023.20283-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Peter Maydell
0d514fa234 Pull request
* Copy offloading for qemu-img convert (iSCSI, raw, and qcow2)
 
    If the underlying storage supports copy offloading, qemu-img convert will
    use it instead of performing reads and writes.  This avoids data transfers
    and thus frees up storage bandwidth for other purposes.  SCSI EXTENDED COPY
    and Linux copy_file_range(2) are used to implement this optimization.
 
  * Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbFSBoAAoJEJykq7OBq3PISpEIAIcMao4/rzinAWXzS+ncK9LO
 6FtRVpgutpHaWX2ayySaz5n2CdR3cNMrpCI7sjY2Kw0lrdkqxPgl5n0SWD+VCl4W
 7+JLz/uF0iUV8X+99e7WGAjZbm9LSlxgn5AQKfrrwyPf0ZfzoYQ5nBMcQ6xjEeQP
 48j2WqJqN9/u8RBD07o11yn0+CE5g56/f12xVjR5ASVodzsAmcZ2OQRMQbM01isU
 1mBekJQkDxJkt5l13Rql8+t+vWz8/9BEW2c/eIDKvoayMqYJpdfKv4DqLloIuHnc
 3RkquA0zUuKtl7xEnEkH/We7fi4QPGW/vyBN7ychS/zKzZFQrXmwqrAuFSw3dKU=
 =vZp+
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

 * Copy offloading for qemu-img convert (iSCSI, raw, and qcow2)

   If the underlying storage supports copy offloading, qemu-img convert will
   use it instead of performing reads and writes.  This avoids data transfers
   and thus frees up storage bandwidth for other purposes.  SCSI EXTENDED COPY
   and Linux copy_file_range(2) are used to implement this optimization.

 * Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning

# gpg: Signature made Mon 04 Jun 2018 12:20:08 BST
# gpg:                using RSA key 9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  main-loop: drop spin_counter
  qemu-img: Convert with copy offloading
  block-backend: Add blk_co_copy_range
  iscsi: Implement copy offloading
  iscsi: Create and use iscsi_co_wait_for_task
  iscsi: Query and save device designator when opening
  file-posix: Implement bdrv_co_copy_range
  qcow2: Implement copy offloading
  raw: Implement copy offloading
  raw: Check byte range uniformly
  block: Introduce API for copy offloading

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04 18:34:04 +01:00
Peter Maydell
f67c9b693a acpi, vhost, misc: fixes, features
vDPA support, fix to vhost blk RO bit handling, some include path
 cleanups, NFIT ACPI table.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbEXNvAAoJECgfDbjSjVRpc8gH/R8xrcFrV+k9wwbgYcOcGb6Y
 LWjseE31pqJcxRV80vLOdzYEuLStZQKQQY7xBDMlA5vdyvZxIA6FLO2IsiJSbFAk
 EK8pclwhpwQAahr8BfzenabohBv2UO7zu5+dqSvuJCiMWF3jGtPAIMxInfjXaOZY
 odc1zY2D2EgsC7wZZ1hfraRbISBOiRaez9BoGDKPOyBY9G1ASEgxJgleFgoBLfsK
 a1XU+fDM6hAVdxftfkTm0nibyf7PWPDyzqghLqjR9WXLvZP3Cqud4p8N29mY51pR
 KSTjA4FYk6Z9EVMltyBHfdJs6RQzglKjxcNGdlrvacDfyFi79fGdiosVllrjfJM=
 =3+V0
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

acpi, vhost, misc: fixes, features

vDPA support, fix to vhost blk RO bit handling, some include path
cleanups, NFIT ACPI table.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 01 Jun 2018 17:25:19 BST
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (31 commits)
  vhost-blk: turn on pre-defined RO feature bit
  ACPI testing: test NFIT platform capabilities
  nvdimm, acpi: support NFIT platform capabilities
  tests/.gitignore: add entry for generated file
  arch_init: sort architectures
  ui: use local path for local headers
  qga: use local path for local headers
  colo: use local path for local headers
  migration: use local path for local headers
  usb: use local path for local headers
  sd: fix up include
  vhost-scsi: drop an unused include
  ppc: use local path for local headers
  rocker: drop an unused include
  e1000e: use local path for local headers
  ioapic: fix up includes
  ide: use local path for local headers
  display: use local path for local headers
  trace: use local path for local headers
  migration: drop an unused include
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04 10:15:16 +01:00
Fam Zheng
fd9fcd37a8 qcow2: Implement copy offloading
The two callbacks are implemented quite similarly to the read/write
functions: bdrv_co_copy_range_from maps for read and calls into bs->file
or bs->backing depending on the allocation status; bdrv_co_copy_range_to
maps for write and calls into bs->file.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180601092648.24614-5-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Michael S. Tsirkin
0d8c41dae5 block: use local path for local headers
When pulling in headers that are in the same directory as the C file (as
opposed to one in include/), we should use its relative path, without a
directory.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2018-05-31 04:16:06 +03:00
Alberto Garcia
7af5eea9b3 qcow2: Fix Coverity warning when calculating the refcount cache size
MIN_REFCOUNT_CACHE_SIZE is 4 and the cluster size is guaranteed to be
at most 2MB, so the minimum refcount cache size (in bytes) is always
going to fit in a 32-bit integer.

Coverity doesn't know that, and since we're storing the result in a
uint64_t (*refcount_cache_size) it thinks that we need the 64 bits and
that we probably want to do a 64-bit multiplication to prevent the
result from being truncated.

This is a false positive in this case, but it's a fair warning.
We could do a 64-bit multiplication to get rid of it, but since we
know that a 32-bit variable is enough to store this value let's simply
reuse min_refcount_cache, make it a normal int and stop doing casts.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-29 20:09:17 +02:00
Alberto Garcia
52253998ec qcow2: Give the refcount cache the minimum possible size by default
The L2 and refcount caches have default sizes that can be overridden
using the l2-cache-size and refcount-cache-size (an additional
parameter named cache-size sets the combined size of both caches).

Unless forced by one of the aforementioned parameters, QEMU will set
the unspecified sizes so that the L2 cache is 4 times larger than the
refcount cache.

This is based on the premise that the refcount metadata needs to be
only a fourth of the L2 metadata to cover the same amount of disk
space. This is incorrect for two reasons:

 a) The amount of disk covered by an L2 table depends solely on the
    cluster size, but in the case of a refcount block it depends on
    the cluster size *and* the width of each refcount entry.
    The 4/1 ratio is only valid with 16-bit entries (the default).

 b) When we talk about disk space and L2 tables we are talking about
    guest space (L2 tables map guest clusters to host clusters),
    whereas refcount blocks are used for host clusters (including
    L1/L2 tables and the refcount blocks themselves). On a fully
    populated (and uncompressed) qcow2 file, image size > virtual size
    so there are more refcount entries than L2 entries.

Problem (a) could be fixed by adjusting the algorithm to take into
account the refcount entry width. Problem (b) could be fixed by
increasing a bit the refcount cache size to account for the clusters
used for qcow2 metadata.

However this patch takes a completely different approach and instead
of keeping a ratio between both cache sizes it assigns as much as
possible to the L2 cache and the remainder to the refcount cache.

The reason is that L2 tables are used for every single I/O request
from the guest and the effect of increasing the cache is significant
and clearly measurable. Refcount blocks are however only used for
cluster allocation and internal snapshots and in practice are accessed
sequentially in most cases, so the effect of increasing the cache is
negligible (even when doing random writes from the guest).

So, make the refcount cache as small as possible unless the user
explicitly asks for a larger one.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 9695182c2eb11b77cb319689a1ebaa4e7c9d6591.1523968389.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00