Commit Graph

3905 Commits

Author SHA1 Message Date
Kevin Wolf 4be6a6d118 block: Poll after drain on attaching a node
Commit dcf94a23b1 ('block: Don't poll in parent drain callbacks')
removed polling in bdrv_child_cb_drained_begin() on the grounds that the
original bdrv_drain() already will poll and BdrvChildRole.drained_begin
calls must not cause graph changes (and therefore must not call
aio_poll() or the recursion through the graph will break.

This reasoning is correct for calls through bdrv_do_drained_begin().
However, BdrvChildRole.drained_begin is also called when a node that is
already in a drained section (i.e. bdrv_do_drained_begin() has already
returned and therefore can't poll any more) is attached to a new parent.
In this case, we must explicitly poll to have all requests completed
before the drained new child can be attached to the parent.

In bdrv_replace_child_noperm(), we know that we're not inside the
recursion of bdrv_do_drained_begin() because graph changes are not
allowed there, and bdrv_replace_child_noperm() is a graph change. The
call of BdrvChildRole.drained_begin() must therefore be followed by a
BDRV_POLL_WHILE() that waits for the completion of requests.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10 10:36:15 +02:00
Peter Maydell 1daf14ec9e Block layer patches:
- qcow2: Use worker threads for compression to improve performance of
   'qemu-img convert -W' and compressed backup jobs
 - blklogwrites: New filter driver to log write requests to an image in
   the dm-log-writes format
 - file-posix: Fix image locking during image creation
 - crypto: Fix memory leak in error path
 - Error out instead of silently truncating node names
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbPfHhAAoJEH8JsnLIjy/Wxp8P/jeEJ/jJ0N2doix5KJSvtkdP
 dYx4cJxHg9tMSnmBfA49qc1ilgbzZqjbF/wH3Ko3zPsmFP0xsxx+9zE4c3iL7HA6
 6E4/r3szznO8DIxdgzAIeHjZc7YPpsY3alrthT55eR/FnDyc8zofi/iUMgvTKPYA
 8TnWqgi2fdaVwNy+lIRJFdhzzQNtPxOEO+0DFHaOZvQ9vlc5DGPC+Hj3Qc11GK8M
 u7orkeYqQPknxy0hJKPxtWHvzCQUJMcSSs6PbuslnzOerYiQLmx+RVIwMhXfVZjV
 lXe2SppAszBujtcIENhZlj1cECs7MrWTmFDcWvBA+Mh/JhFEWwykmlQHYrjCqdvw
 QyWhVLgP/6jQDaBls6LADSZDxX7i0sV27DzVGN4gYUX/KcyVPEDROm90MyE8nxN0
 Y5hhujTkzu94Zvd/OlKZRs4tPxmbRrd2SWy0id8Kj5/gO/+UWECOZCrPrlB5uzec
 bsxHoeXLVe2/56JkCIiOw3hLILMY6gPLJgeaQjz6hu1oZJqYuoof8grVFjUCSH0C
 BBlz8O4upHdPIKNRl4rmdgwasg5YIiJ7SFl4h7KMCD0elkKmo3SyTKosagvqrpVN
 JvW1YuosCedcfYIKNPnZdEGWDKuTaZtcnZnM9IRY6hr8ySMw1LloFsgXsJ3tqTwe
 wb1Wjm4Qlz5BaoB7o/Os
 =z5AC
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- qcow2: Use worker threads for compression to improve performance of
  'qemu-img convert -W' and compressed backup jobs
- blklogwrites: New filter driver to log write requests to an image in
  the dm-log-writes format
- file-posix: Fix image locking during image creation
- crypto: Fix memory leak in error path
- Error out instead of silently truncating node names

# gpg: Signature made Thu 05 Jul 2018 11:24:33 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  file-posix: Unlock FD after creation
  file-posix: Fix creation locking
  block/blklogwrites: Add an option for the update interval of the log superblock
  block/blklogwrites: Add an option for appending to an old log
  block/blklogwrites: Change log_sector_size from int64_t to uint64_t
  block/crypto: Fix memory leak in create error path
  block: Don't silently truncate node names
  block: Add blklogwrites
  block: Move two block permission constants to the relevant enum
  qcow2: add compress threads
  qcow2: refactor data compression
  qemu-img: allow compressed not-in-order writes

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-07-05 15:53:04 +01:00
Max Reitz 7c20c808a5 file-posix: Unlock FD after creation
Closing the FD does not necessarily mean that it is unlocked.  Fix this
by relinquishing all permission locks before qemu_close().

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05 11:07:58 +02:00
Max Reitz d815efcaf0 file-posix: Fix creation locking
raw_apply_lock_bytes() takes a bit mask of "permissions that are NOT
shared".

Also, make the "perm" and "shared" variables uint64_t, because I do not
particularly like using ~ on signed integers (and other permission masks
are usually uint64_t, too).

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05 11:07:58 +02:00
Ari Sundholm 1dce698ea8 block/blklogwrites: Add an option for the update interval of the log superblock
This is a way to ensure that the log superblock is periodically
updated. Before, this was only done on flush requests, which may
not be enough if the VM exits abnormally, omitting the final flush.

The default interval is 4096 write requests.

Signed-off-by: Ari Sundholm <ari@tuxera.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05 10:50:21 +02:00
Ari Sundholm 0878b3c113 block/blklogwrites: Add an option for appending to an old log
Suggested by Kevin Wolf. May be useful when testing multiple batches
of writes or doing long-term testing involving restarts of the VM.

Signed-off-by: Ari Sundholm <ari@tuxera.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05 10:50:20 +02:00
Ari Sundholm 2dacaf7c82 block/blklogwrites: Change log_sector_size from int64_t to uint64_t
This was a simple oversight when working on intermediate versions
of the original patch which introduced blklogwrites.

Signed-off-by: Ari Sundholm <ari@tuxera.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05 10:50:20 +02:00
Kevin Wolf 0b68589d17 block/crypto: Fix memory leak in create error path
Fixes: Coverity CID 1393782
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-07-05 10:29:19 +02:00
Aapo Vienamo bfcc224e3c block: Add blklogwrites
Implements a block device write logging system, similar to Linux kernel
device mapper dm-log-writes. The write operations that are performed
on a block device are logged to a file or another block device. The
write log format is identical to the dm-log-writes format. Currently,
log markers are not supported.

This functionality can be used for crash consistency and fs consistency
testing. By implementing it in qemu, tests utilizing write logs can be
be used to test non-Linux drivers and older kernels.

The driver accepts an optional parameter to set the sector size used
for logging. This makes the driver require all requests to be aligned
to this sector size and also makes offsets and sizes of writes in the
log metadata to be expressed in terms of this value (the log format has
a granularity of one sector for offsets and sizes). This allows
accurate logging of writes to guest block devices that have unusual
sector sizes.

The implementation is based on the blkverify and blkdebug block
drivers.

Signed-off-by: Aapo Vienamo <aapo@tuxera.com>
Signed-off-by: Ari Sundholm <ari@tuxera.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05 10:29:19 +02:00
Vladimir Sementsov-Ogievskiy ceb029cd6f qcow2: add compress threads
Do data compression in separate threads. This significantly improve
performance for qemu-img convert with -W (allow async writes) and -c
(compressed) options.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05 10:29:19 +02:00
Vladimir Sementsov-Ogievskiy 2714f13d69 qcow2: refactor data compression
Make a separate function for compression to be parallelized later.
 - use .avail_out field instead of .next_out to calculate size of
   compressed data. It looks more natural and it allows to keep dest to
   be void pointer
 - set avail_out to be at least one byte less than input, to be sure
   avoid inefficient compression earlier

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05 10:29:00 +02:00
Vladimir Sementsov-Ogievskiy 58f72b965e dirty-bitmap: fix double lock on bitmap enabling
Bitmap lock/unlock were added to bdrv_enable_dirty_bitmap in
8b1402ce80, but some places were not updated correspondingly, which
leads to trying to take this lock twice, which is dead-lock. Fix this.

Actually, iotest 199 (about dirty bitmap postcopy migration) is broken
now, and this fixes it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180625165745.25259-3-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-07-04 02:12:49 -04:00
Vladimir Sementsov-Ogievskiy 92bcea40d3 block/dirty-bitmap: add bdrv_enable_dirty_bitmap_locked
Add _locked version of bdrv_enable_dirty_bitmap, to fix dirty bitmap
migration in the following patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180625165745.25259-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-07-04 02:12:49 -04:00
Peter Maydell a395717cbd -----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbOvCTAAoJEL2+eyfA3jBX354P/RFVLsozhcb3DeFj5Ocq2kfS
 sRt/82Ke/f/w/8lNd4wNbOsdG/eg2M4RmLWF4ONWWoeO7Z0KIatUOTtw5HxjBxBX
 XfhQy7RZ65luuCHnLDU6c4IdnDvXBVG/kErydhDZjEyeY8qlxrurBB8331cTRFwu
 hisweIwogPOFDA+/Bty0W0EyVQWFAobL3ExYFlOYFuHwsqJfMPQbytw2zDzC4kjn
 8Ecppyt7rfLsEcyf/4OAoHfbbYOiQl7PkXE7/uXDFyL8zPdRpIlDFSZtmy1Zb213
 mcYhPmehUkFHV/BDF/LdnzjlraK8oMaNu0IDld5cX/1xUU4VtbW2YjAt6OdCn7Ll
 7YbNNKYU/mM1QUPshX4qJkbUaCu7JoTDKPiBbJei/MV7zMJBLpNVG/AuJE2gbweI
 2levV76QzS2+fQVKv/9LUliqOYEp5T0/aybb+35Vzhf5WNpSO7s1oaCDAvSgUhS+
 qU1MIAROQQPCmdM8PwqzG9b2TGp/tYcWOju5bqt488Twmo0BbTGjYCFl6StJHibC
 mN5fASP5nQiz1fc3FrBp0h/PCQlGtd2ZgeyeC+lkPVcFovclA2vo/ib1k2LK/0nU
 TzkKIFRJZH58yYjppuBrB6c/aFfVkutE4Hz25i+3nZ91ZEyQKbv1mDxCyRNXNWjt
 Gteul6gUo/AjzMOWFFvH
 =AKR7
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging

# gpg: Signature made Tue 03 Jul 2018 04:42:11 BST
# gpg:                using RSA key BDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98  D624 BDBE 7B27 C0DE 3057

* remotes/cody/tags/block-pull-request:
  backup: Use copy offloading
  block: Honour BDRV_REQ_NO_SERIALISING in copy range
  block: Fix parameter checking in bdrv_co_copy_range_internal

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-07-03 11:49:51 +01:00
Peter Maydell 9b75dcb15f nbd patches for 2018-07-02
Bug fixes and iotest exposure of fleecing via NBD (serving a
 read-only point-in-time view via blockdev-backup sync:none,
 as well as serving dirty bitmaps over NBD), including a new
 x-dirty-bitmap parameter when opening NBD clients as the
 counterpart to x-nbd-server-add-bitmap. Also a random fix
 for iscsi block_status spotted by Coverity that missed other
 miscellaneous trees.
 
 - Eric Blake: nbd/server: Fix dirty bitmap logic regression
 - Eric Blake: iscsi: Avoid potential for get_status overflow
 - John Snow/Vladimir Sementsov-Ogievskiy: 0/2 block: formalize and test fleecing
 - Eric Blake: 0/2 test NBD bitmap export
 -----BEGIN PGP SIGNATURE-----
 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
 
 iQEcBAABCAAGBQJbOtJPAAoJEKeha0olJ0NqEvwH/3FwWnlBdBvdYGgPjzGE1Atm
 ofCKcyxE/2VJtxeWlZQHzs1VqSq81s7am5SdzOrIWnQekvHFcLu6/71RABiauzMd
 neCvVOrXOVdktj1i1Z2Gg4BgjDmqbTDlo5ssVh/oXP0Zebi6OZFfQrB7y3cGBvui
 4XI7lW9qJxt6F1FlKloXnofWRDENyo5vgdz6QjQXfauthw2T5045RIPTfiz03FCp
 fbs+6K0+bKxfPdNLrqxxOZo/loYnEXbDYv6VBAIWBqztXVnMHxCqnh0YN05jwsfF
 TRW0/YT8lpWarOZ1soIC6a/OGXQZbxgRhZ+Zr+Wa2jw0YNHJanU9isxi37aUqQo=
 =Xivx
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-07-02' into staging

nbd patches for 2018-07-02

Bug fixes and iotest exposure of fleecing via NBD (serving a
read-only point-in-time view via blockdev-backup sync:none,
as well as serving dirty bitmaps over NBD), including a new
x-dirty-bitmap parameter when opening NBD clients as the
counterpart to x-nbd-server-add-bitmap. Also a random fix
for iscsi block_status spotted by Coverity that missed other
miscellaneous trees.

- Eric Blake: nbd/server: Fix dirty bitmap logic regression
- Eric Blake: iscsi: Avoid potential for get_status overflow
- John Snow/Vladimir Sementsov-Ogievskiy: 0/2 block: formalize and test fleecing
- Eric Blake: 0/2 test NBD bitmap export

# gpg: Signature made Tue 03 Jul 2018 02:33:03 BST
# gpg:                using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2018-07-02:
  iotests: New test 223 for exporting dirty bitmap over NBD
  nbd/client: Add x-dirty-bitmap to query bitmap from server
  iotests: add 222 to test basic fleecing
  blockdev: enable non-root nodes for backup source
  iscsi: Avoid potential for get_status overflow
  nbd/server: Fix dirty bitmap logic regression

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-07-03 10:47:02 +01:00
Fam Zheng 9ded4a0114 backup: Use copy offloading
The implementation is similar to the 'qemu-img convert'. In the
beginning of the job, offloaded copy is attempted. If it fails, further
I/O will go through the existing bounce buffer code path.

Then, as Kevin pointed out, both this and qemu-img convert can benefit
from a local check if one request fails because of, for example, the
offset is beyond EOF, but another may well be accepted by the protocol
layer. This will be implemented separately.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180703023758.14422-4-famz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-07-02 23:23:45 -04:00
Fam Zheng dee12de893 block: Honour BDRV_REQ_NO_SERIALISING in copy range
This semantics is needed by drive-backup so implement it before using
this API there.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180703023758.14422-3-famz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-07-02 23:23:45 -04:00
Fam Zheng d4d3e5a0d5 block: Fix parameter checking in bdrv_co_copy_range_internal
src may be NULL if BDRV_REQ_ZERO_WRITE flag is set, in this case only
check dst and dst->bs. This bug was introduced when moving in the
request tracking code from bdrv_co_copy_range, in 37aec7d75e.

This especially fixes the possible segfault when initializing src_bs
with a NULL src.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180703023758.14422-2-famz@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-07-02 23:23:45 -04:00
Eric Blake 216ee3657e nbd/client: Add x-dirty-bitmap to query bitmap from server
In order to test that the NBD server is properly advertising
dirty bitmaps, we need a bare minimum client that can request
and read the context.  Since feature freeze for 3.0 is imminent,
this is the smallest workable patch, which replaces the qemu
block status report with the results of the NBD server's dirty
bitmap (making it very easy to use 'qemu-img map --output=json'
to learn where the dirty portions are).  Note that the NBD
protocol defines a dirty section with the same bit but opposite
sense that normal "base:allocation" uses to report an allocated
section; so in qemu-img map output, "data":true corresponds to
clean, "data":false corresponds to dirty.

A more complete solution that allows dirty bitmaps to be queried
at the same time as normal block status will be required before
this addition can lose the x- prefix.  Until then, the fact that
this replaces normal status with dirty status means actions
like 'qemu-img convert' will likely misbehave due to treating
dirty regions of the file as if they are unallocated.

The next patch adds an iotest to exercise this new code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180702191458.28741-2-eblake@redhat.com>
2018-07-02 15:27:38 -05:00
Eric Blake 8ee1cef459 iscsi: Avoid potential for get_status overflow
Detected by Coverity: Multiplying two 32-bit int and assigning
the result to a 64-bit number is a risk of overflow.  Prior to
the conversion to byte-based interfaces, the block layer took
care of ensuring that a status request never exceeded 2G in
the driver; but after that conversion, the block layer expects
drivers to deal with any size request (the driver can always
truncate the request size back down, as long as it makes
progress).  So, in the off-chance that someone makes a large
request, we are at the mercy of whether iscsi_get_lba_status_task()
will cap things to at most INT_MAX / iscsilun->block_size when
it populates lbasd->num_blocks; since I could not easily audit
that, it's better to be safe than sorry by just forcing a 64-bit
multiply.

Fixes: 92809c36
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180508212718.1482663-1-eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2018-07-02 14:28:26 -05:00
Philippe Mathieu-Daudé f043568f54 vdi: Use definitions from "qemu/units.h"
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <20180625124238.25339-3-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-02 14:45:23 +02:00
Eric Blake 583c99d393 block: Remove unused sector-based vectored I/O
We are gradually moving away from sector-based interfaces, towards
byte-based.  Now that all callers of vectored I/O have been converted
to use our preferred byte-based bdrv_co_p{read,write}v(), we can
delete the unused bdrv_co_{read,write}v().

Furthermore, this gets rid of the signature difference between the
public bdrv_co_writev() and the callback .bdrv_co_writev (the
latter still exists, because some drivers still need more work
before they are fully byte-based).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake 3a7404b31e vhdx: Switch to byte-based calls
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the last few sector-based calls
into the block layer from the vhdx driver.

Ideally, the vhdx driver should switch to doing everything
byte-based, but that's a more invasive change that requires a
bit more auditing.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake 04a11d87d1 replication: Switch to byte-based calls
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the last few sector-based calls
into the block layer from the replication driver.

Ideally, the replication driver should switch to doing everything
byte-based, but that's a more invasive change that requires a
bit more auditing.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake 609841a3f8 qcow: Switch to a byte-based driver
We are gradually moving away from sector-based interfaces, towards
byte-based.  The qcow driver is now ready to fully utilize the
byte-based callback interface, as long as we override the default
alignment to still be 512 (needed at least for asserts present
because of encryption, but easier to do everywhere than to audit
which sub-sector requests are handled correctly, especially since
we no longer recommend qcow for new disk images).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake d1326a786d qcow: Switch qcow_co_writev to byte-based calls
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the internals of the qcow
driver write function, by iterating over offset/bytes instead of
sector_num/nb_sectors, and with a rename of index_in_cluster and
repurposing of n to track bytes instead of sectors.

A later patch will then switch the qcow driver as a whole over
to byte-based operation.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake a15312b017 qcow: Switch qcow_co_readv to byte-based calls
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the internals of the qcow
driver read function, by iterating over offset/bytes instead of
sector_num/nb_sectors, and with a rename of index_in_cluster and
repurposing of n to track bytes instead of sectors.

A later patch will then switch the qcow driver as a whole over
to byte-based operation.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake 787993a543 qcow: Switch get_cluster_offset to be byte-based
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the internal helper function
get_cluster_offset(), by changing n_start and n_end to be byte
offsets rather than sector indices within the cluster being
allocated.  However, assert that these values are still
sector-aligned (at least qcrypto_block_encrypt() still wants that).
For now we get that alignment for free because we still use
sector-based driver callbacks.

A later patch will then switch the qcow driver as a whole over
to byte-based operation; but will still leave things at sector
alignments as it is not worth auditing the qcow image format
to worry about sub-sector requests.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake d08c2a245f parallels: Switch to byte-based calls
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the last few sector-based calls
into the block layer from the parallels driver.

Ideally, the parallels driver should switch to doing everything
byte-based, but that's a more invasive change that requires a
bit more auditing.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Fam Zheng c436e3d014 file-posix: Fix EINTR handling
EINTR should be checked against errno, not ret. While fixing the bug,
collect the branches with a switch block.

Also, change the return value from -ENOSTUP to -ENOSPC when the actual
issue is request range passes EOF, which should be distinguishable from
the case of error == ENOSYS by the caller, so that it could still retry
with other byte ranges, whereas it shouldn't retry anymore upon ENOSYS.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Fam Zheng 1439b9c110 iscsi: Don't blindly use designator length in response for memcpy
Per SCSI definition the designator_length we receive from INQUIRY is 8,
12 or at most 16, but we should be careful because the remote iscsi
target may misbehave, otherwise we could have a buffer overflow.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Fam Zheng e06f4639d8 qcow2: Fix src_offset in copy offloading
Not updating src_offset will result in wrong data being written to dst
image.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf 33d70fb6fa file-posix: Implement co versions of discard/flush
This simplifies file-posix by implementing the coroutine variants of
the discard and flush BlockDriver callbacks. These were the last
remaining users of paio_submit(), which can be removed now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf 8b24cd1415 qcow2: Free allocated clusters on write error
If we managed to allocate the clusters, but then failed to write the
data, there's a good chance that we'll still be able to free the
clusters again in order to avoid cluster leaks (the refcounts are
cached, so even if we can't write them out right now, we may be able to
do so when the VM is resumed after a werror=stop/enospc pause).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
2018-06-29 14:20:56 +02:00
Markus Armbruster 796d323945 block/crypto: Simplify block_crypto_{open,create}_opts_init()
block_crypto_open_opts_init() and block_crypto_create_opts_init()
contain a virtual visit of QCryptoBlockOptions and
QCryptoBlockCreateOptions less member "format", respectively.

Change their callers to put member "format" in the QDict, so they can
use the generated visitors for these types instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Fam Zheng 37aec7d75e block: Move request tracking to children in copy offloading
in_flight and tracked requests need to be tracked in every layer during
recursion. For now the only user is qemu-img convert where overlapping
requests and IOThreads don't exist, therefore this change doesn't make
much difference form user point of view, but it is incorrect as part of
the API. Fix it.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Fam Zheng 354d930dc6 qcow2: Remove dead check on !ret
In the beginning of the function, we initialize the local variable to 0,
and in the body of the function, we check the assigned values and exit
the loop immediately. So here it can never be non-zero.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf 93f4e2ff4b file-posix: Make .bdrv_co_truncate asynchronous
This moves the code to resize an image file to the thread pool to avoid
blocking.

Creating large images with preallocation with blockdev-create is now
actually a background job instead of blocking the monitor (and most
other things) until the preallocation has completed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf 1bc5f09f2e block: Use tracked request for truncate
When growing an image, block drivers (especially protocol drivers) may
initialise the newly added area. I/O requests to the same area need to
wait for this initialisation to be completed so that data writes don't
get overwritten and reads don't read uninitialised data.

To avoid overhead in the fast I/O path by adding new locking in the
protocol drivers and to restrict the impact to requests that actually
touch the new area, reuse the existing tracked request infrastructure in
block/io.c and mark all discard requests as serialising.

With this change, it is safe for protocol drivers to make
.bdrv_co_truncate actually asynchronous.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf 3d9f2d2af6 block: Move bdrv_truncate() implementation to io.c
This moves the bdrv_truncate() implementation from block.c to block/io.c
so it can have access to the tracked requests infrastructure.

This involves making refresh_total_sectors() public (in block_int.h).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf 47e86b868d qcow2: Remove coroutine trampoline for preallocate_co()
All callers are coroutine_fns now, so we can just directly call
preallocate_co().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf 061ca8a368 block: Convert .bdrv_truncate callback to coroutine_fn
bdrv_truncate() is an operation that can block (even for a quite long
time, depending on the PreallocMode) in I/O paths that shouldn't block.
Convert it to a coroutine_fn so that we have the infrastructure for
drivers to make their .bdrv_co_truncate implementation asynchronous.

This change could potentially introduce new race conditions because
bdrv_truncate() isn't necessarily executed atomically any more. Whether
this is a problem needs to be evaluated for each block driver that
supports truncate:

* file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The
  protocol drivers are trivially safe because they don't actually yield
  yet, so there is no change in behaviour.

* copy-on-read, crypto, raw-format: Essentially just filter drivers that
  pass the request to a child node, no problem.

* qcow2: The implementation modifies metadata, so it needs to hold
  s->lock to be safe with concurrent I/O requests. In order to avoid
  double locking, this requires pulling the locking out into
  preallocate_co() and using qcow2_write_caches() instead of
  bdrv_flush().

* qed: Does a single header update, this is fine without locking.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf ae5475e82f qcow2: Fix qcow2_truncate() error return value
If qcow2_alloc_clusters_at() returns an error, we do need to negate it
to get back the positive errno code for error_setg_errno(), but we still
need to return the negative error code.

Fixes: 772d1f973f
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Markus Armbruster e6af90f3fb block/crypto: Pacify Coverity after commit f853465aac
Coverity can't see that qobject_input_visitor_new_flat_confused()
returns non-null when it doesn't set @local_err.  Check the return
value instead, like all the other callers do.

Fixes: CID 1393615
Fixes: CID 1393616
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Nishanth Aravamudan ed6e216171 linux-aio: properly bubble up errors from initialization
laio_init() can fail for a couple of reasons, which will lead to a NULL
pointer dereference in laio_attach_aio_context().

To solve this, add a aio_setup_linux_aio() function which is called
early in raw_open_common. If this fails, propagate the error up. The
signature of aio_get_linux_aio() was not modified, because it seems
preferable to return the actual errno from the possible failing
initialization calls.

Additionally, when the AioContext changes, we need to associate a
LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context
callback and call the new aio_setup_linux_aio(), which will allocate a
new AioContext if needed, and return errors on failures. If it fails for
any reason, fallback to threaded AIO with an error message, as the
device is already in-use by the guest.

Add an assert that aio_get_linux_aio() cannot return NULL.

Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com>
Message-id: 20180622193700.6523-1-naravamudan@digitalocean.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-27 13:06:34 +01:00
Anton Nefedov 29cd0403f1 qapi: remove empty flat union branches and types
Flat unions may now have uncovered branches, so it is possible to get rid
of empty types defined for that purpose only.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1529311206-76847-3-git-send-email-anton.nefedov@virtuozzo.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-06-22 16:33:46 +02:00
Max Reitz 481debaa32 block/mirror: Add copy mode QAPI interface
This patch allows the user to specify whether to use active or only
background mode for mirror block jobs.  Currently, this setting will
remain constant for the duration of the entire block job.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-14-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:05:16 +02:00
Max Reitz d06107ade0 block/mirror: Add active mirroring
This patch implements active synchronous mirroring.  In active mode, the
passive mechanism will still be in place and is used to copy all
initially dirty clusters off the source disk; but every write request
will write data both to the source and the target disk, so the source
cannot be dirtied faster than data is mirrored to the target.  Also,
once the block job has converged (BLOCK_JOB_READY sent), source and
target are guaranteed to stay in sync (unless an error occurs).

Active mode is completely optional and currently disabled at runtime.  A
later patch will add a way for users to enable it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-13-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:05:15 +02:00
Max Reitz 429076e88d block/mirror: Add MirrorBDSOpaque
This will allow us to access the block job data when the mirror block
driver becomes more complex.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-11-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:59 +02:00
Max Reitz 72d10a9421 block/dirty-bitmap: Add bdrv_dirty_iter_next_area
This new function allows to look for a consecutively dirty area in a
dirty bitmap.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180613181823.13618-10-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:57 +02:00