qemu-e2k

Author SHA1 Message Date

Author	SHA1	Message	Date
Eric Blake	8417e1378c	qemu-img: Make unallocated part of backing chain obvious in map The recently-added NBD context qemu:allocation-depth is able to distinguish between locally-present data (even when that data is sparse) [shown as depth 1 over NBD], and data that could not be found anywhere in the backing chain [shown as depth 0]; and the libnbd project was recently patched to give the human-readable name "absent" to an allocation-depth of 0. But qemu-img map --output=json predates that addition, and has the unfortunate behavior that all portions of the backing chain that resolve without finding a hit in any backing layer report the same depth as the final backing layer. This makes it harder to reconstruct a qcow2 backing chain using just 'qemu-img map' output, especially when using "backing":null to artificially limit a backing chain, because it is impossible to distinguish between a QCOW2_CLUSTER_UNALLOCATED (which defers to a [missing] backing file) and a QCOW2_CLUSTER_ZERO_PLAIN cluster (which would override any backing file), since both types of clusters otherwise show as "data":false,"zero":true" (but note that we can distinguish a QCOW2_CLUSTER_ZERO_ALLOCATED, which would also have an "offset": listing). The task of reconstructing a qcow2 chain was made harder in commit `0da9856851` (nbd: server: Report holes for raw images), because prior to that point, it was possible to abuse NBD's block status command to see which portions of a qcow2 file resulted in BDRV_BLOCK_ALLOCATED (showing up as NBD_STATE_ZERO in isolation) vs. missing from the chain (showing up as NBD_STATE_ZERO\|NBD_STATE_HOLE); but now qemu reports more accurate sparseness information over NBD. An obvious solution is to make 'qemu-img map --output=json' add an additional "present":false designation to any cluster lacking an allocation anywhere in the chain, without any change to the "depth" parameter to avoid breaking existing clients. The iotests have several examples where this distinction demonstrates the additional accuracy. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210701190655.2131223-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: fix more iotest fallout] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-07-12 11:10:53 -05:00
Nir Soffer	3a20013fbb	block: posix: Always allocate the first block When creating an image with preallocation "off" or "falloc", the first block of the image is typically not allocated. When using Gluster storage backed by XFS filesystem, reading this block using direct I/O succeeds regardless of request length, fooling alignment detection. In this case we fallback to a safe value (4096) instead of the optimal value (512), which may lead to unneeded data copying when aligning requests. Allocating the first block avoids the fallback. Since we allocate the first block even with preallocation=off, we no longer create images with zero disk size: $ ./qemu-img create -f raw test.raw 1g Formatting 'test.raw', fmt=raw size=1073741824 $ ls -lhs test.raw 4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw And converting the image requires additional cluster: $ ./qemu-img measure -f raw -O qcow2 test.raw required size: 458752 fully allocated size: 1074135040 When using format like vmdk with multiple files per image, we allocate one block per file: $ ./qemu-img create -f vmdk -o subformat=twoGbMaxExtentFlat test.vmdk 4g Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined subformat=twoGbMaxExtentFlat $ ls -lhs test*.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f001.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f002.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 353 Aug 27 03:23 test.vmdk I did quick performance test for copying disks with qemu-img convert to new raw target image to Gluster storage with sector size of 512 bytes: for i in $(seq 10); do rm -f dst.raw sleep 10 time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw done Here is a table comparing the total time spent: Type Before(s) After(s) Diff(%) --------------------------------------- real 530.028 469.123 -11.4 user 17.204 10.768 -37.4 sys 17.881 7.011 -60.7 We can see very clear improvement in CPU usage. Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20190827010528.8818-2-nsoffer@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Max Reitz	2fab30c80b	iotests: Test unaligned raw images with O_DIRECT We already have 221 for accesses through the page cache, but it is better to create a new file for O_DIRECT instead of integrating those test cases into 221. This way, we can make use of _supported_cache_modes (and _default_cache_mode) so the test is automatically skipped on filesystems that do not support O_DIRECT. As part of the split, add _supported_cache_modes to 221. With that, it no longer fails when run with -c none or -c directsync. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:57 +02:00

Eric Blake

8417e1378c

qemu-img: Make unallocated part of backing chain obvious in map

The recently-added NBD context qemu:allocation-depth is able to
distinguish between locally-present data (even when that data is
sparse) [shown as depth 1 over NBD], and data that could not be found
anywhere in the backing chain [shown as depth 0]; and the libnbd
project was recently patched to give the human-readable name "absent"
to an allocation-depth of 0.  But qemu-img map --output=json predates
that addition, and has the unfortunate behavior that all portions of
the backing chain that resolve without finding a hit in any backing
layer report the same depth as the final backing layer.  This makes it
harder to reconstruct a qcow2 backing chain using just 'qemu-img map'
output, especially when using "backing":null to artificially limit a
backing chain, because it is impossible to distinguish between a
QCOW2_CLUSTER_UNALLOCATED (which defers to a [missing] backing file)
and a QCOW2_CLUSTER_ZERO_PLAIN cluster (which would override any
backing file), since both types of clusters otherwise show as
"data":false,"zero":true" (but note that we can distinguish a
QCOW2_CLUSTER_ZERO_ALLOCATED, which would also have an "offset":
listing).

The task of reconstructing a qcow2 chain was made harder in commit
0da9856851 (nbd: server: Report holes for raw images), because prior
to that point, it was possible to abuse NBD's block status command to
see which portions of a qcow2 file resulted in BDRV_BLOCK_ALLOCATED
(showing up as NBD_STATE_ZERO in isolation) vs. missing from the chain
(showing up as NBD_STATE_ZERO|NBD_STATE_HOLE); but now qemu reports
more accurate sparseness information over NBD.

An obvious solution is to make 'qemu-img map --output=json' add an
additional "present":false designation to any cluster lacking an
allocation anywhere in the chain, without any change to the "depth"
parameter to avoid breaking existing clients.  The iotests have
several examples where this distinction demonstrates the additional
accuracy.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210701190655.2131223-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: fix more iotest fallout]
Signed-off-by: Eric Blake <eblake@redhat.com>

2021-07-12 11:10:53 -05:00

Nir Soffer

3a20013fbb

block: posix: Always allocate the first block

When creating an image with preallocation "off" or "falloc", the first
block of the image is typically not allocated. When using Gluster
storage backed by XFS filesystem, reading this block using direct I/O
succeeds regardless of request length, fooling alignment detection.

In this case we fallback to a safe value (4096) instead of the optimal
value (512), which may lead to unneeded data copying when aligning
requests.  Allocating the first block avoids the fallback.

Since we allocate the first block even with preallocation=off, we no
longer create images with zero disk size:

    $ ./qemu-img create -f raw test.raw 1g
    Formatting 'test.raw', fmt=raw size=1073741824

    $ ls -lhs test.raw
    4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw

And converting the image requires additional cluster:

    $ ./qemu-img measure -f raw -O qcow2 test.raw
    required size: 458752
    fully allocated size: 1074135040

When using format like vmdk with multiple files per image, we allocate
one block per file:

    $ ./qemu-img create -f vmdk -o subformat=twoGbMaxExtentFlat test.vmdk 4g
    Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined subformat=twoGbMaxExtentFlat

    $ ls -lhs test*.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f001.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f002.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer  353 Aug 27 03:23 test.vmdk

I did quick performance test for copying disks with qemu-img convert to
new raw target image to Gluster storage with sector size of 512 bytes:

    for i in $(seq 10); do
        rm -f dst.raw
        sleep 10
        time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
    done

Here is a table comparing the total time spent:

Type    Before(s)   After(s)    Diff(%)
---------------------------------------
real      530.028    469.123      -11.4
user       17.204     10.768      -37.4
sys        17.881      7.011      -60.7

We can see very clear improvement in CPU usage.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Message-id: 20190827010528.8818-2-nsoffer@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>

2019-09-03 14:55:35 +02:00

Max Reitz

2fab30c80b

iotests: Test unaligned raw images with O_DIRECT

We already have 221 for accesses through the page cache, but it is
better to create a new file for O_DIRECT instead of integrating those
test cases into 221.  This way, we can make use of
_supported_cache_modes (and _default_cache_mode) so the test is
automatically skipped on filesystems that do not support O_DIRECT.

As part of the split, add _supported_cache_modes to 221.  With that, it
no longer fails when run with -c none or -c directsync.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

2019-05-20 17:08:57 +02:00

3 Commits