Commit Graph

8 Commits

Author SHA1 Message Date
Kevin Wolf
ffa244c84a file-posix: Mitigate file fragmentation with extent size hints
Especially when O_DIRECT is used with image files so that the page cache
indirection can't cause a merge of allocating requests, the file will
fragment on the file system layer, with a potentially very small
fragment size (this depends on the requests the guest sent).

On Linux, fragmentation can be reduced by setting an extent size hint
when creating the file (at least on XFS, it can't be set any more after
the first extent has been allocated), basically giving raw files a
"cluster size" for allocation.

This adds a create option to set the extent size hint, and changes the
default from not setting a hint to setting it to 1 MB. The main reason
why qcow2 defaults to smaller cluster sizes is that COW becomes more
expensive, which is not an issue with raw files, so we can choose a
larger size. The tradeoff here is only potentially wasted disk space.

For qcow2 (or other image formats) over file-posix, the advantage should
even be greater because they grow sequentially without leaving holes, so
there won't be wasted space. Setting even larger extent size hints for
such images may make sense. This can be done with the new option, but
let's keep the default conservative for now.

The effect is very visible with a test that intentionally creates a
badly fragmented file with qemu-img bench (the time difference while
creating the file is already remarkable) and then looks at the number of
extents and the time a simple "qemu-img map" takes.

Without an extent size hint:

    $ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G
    Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=0
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
    Run completed in 25.848 seconds.
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
    Run completed in 19.616 seconds.
    $ filefrag ~/tmp/test.raw
    /home/kwolf/tmp/test.raw: 2000000 extents found
    $ time ./qemu-img map ~/tmp/test.raw
    Offset          Length          Mapped to       File
    0               0x1e8480000     0               /home/kwolf/tmp/test.raw

    real    0m1,279s
    user    0m0,043s
    sys     0m1,226s

With the new default extent size hint of 1 MB:

    $ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G
    Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
    Run completed in 11.833 seconds.
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
    Run completed in 10.155 seconds.
    $ filefrag ~/tmp/test.raw
    /home/kwolf/tmp/test.raw: 178 extents found
    $ time ./qemu-img map ~/tmp/test.raw
    Offset          Length          Mapped to       File
    0               0x1e8480000     0               /home/kwolf/tmp/test.raw

    real    0m0,061s
    user    0m0,040s
    sys     0m0,014s

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200707142329.48303-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-14 15:18:59 +02:00
Max Reitz
407fb56a8e iotests: Replace IMGOPTS= by -o
Tests should not overwrite all user-supplied image options, but only add
to it (which will effectively overwrite conflicting values).  Accomplish
this by passing options to _make_test_img via -o instead of $IMGOPTS.

For some tests, there is no functional change because they already only
appended options to IMGOPTS.  For these, this patch is just a
simplification.

For others, this is a change, so they now heed user-specified $IMGOPTS.
Some of those tests do not work with all image options, though, so we
need to disable them accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-id: 20191107163708.833192-12-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-06 13:43:07 +01:00
Nir Soffer
7e3dc2ba9a iotests: Test allocate_first_block() with O_DIRECT
Using block_resize we can test allocate_first_block() with file
descriptor opened with O_DIRECT, ensuring that it works for any size
larger than 4096 bytes.

Testing smaller sizes is tricky as the result depends on the filesystem
used for testing. For example on NFS any size will work since O_DIRECT
does not require any alignment.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190827010528.8818-3-nsoffer@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-03 14:55:35 +02:00
Nir Soffer
3a20013fbb block: posix: Always allocate the first block
When creating an image with preallocation "off" or "falloc", the first
block of the image is typically not allocated. When using Gluster
storage backed by XFS filesystem, reading this block using direct I/O
succeeds regardless of request length, fooling alignment detection.

In this case we fallback to a safe value (4096) instead of the optimal
value (512), which may lead to unneeded data copying when aligning
requests.  Allocating the first block avoids the fallback.

Since we allocate the first block even with preallocation=off, we no
longer create images with zero disk size:

    $ ./qemu-img create -f raw test.raw 1g
    Formatting 'test.raw', fmt=raw size=1073741824

    $ ls -lhs test.raw
    4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw

And converting the image requires additional cluster:

    $ ./qemu-img measure -f raw -O qcow2 test.raw
    required size: 458752
    fully allocated size: 1074135040

When using format like vmdk with multiple files per image, we allocate
one block per file:

    $ ./qemu-img create -f vmdk -o subformat=twoGbMaxExtentFlat test.vmdk 4g
    Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined subformat=twoGbMaxExtentFlat

    $ ls -lhs test*.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f001.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f002.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer  353 Aug 27 03:23 test.vmdk

I did quick performance test for copying disks with qemu-img convert to
new raw target image to Gluster storage with sector size of 512 bytes:

    for i in $(seq 10); do
        rm -f dst.raw
        sleep 10
        time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
    done

Here is a table comparing the total time spent:

Type    Before(s)   After(s)    Diff(%)
---------------------------------------
real      530.028    469.123      -11.4
user       17.204     10.768      -37.4
sys        17.881      7.011      -60.7

We can see very clear improvement in CPU usage.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Message-id: 20190827010528.8818-2-nsoffer@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-03 14:55:35 +02:00
Max Reitz
a3bd71b577 iotests: Filter 175's allocation information
It is possible for an empty file to take up blocks on a filesystem, for
example:

$ qemu-img create -f raw test.img 1G
Formatting 'test.img', fmt=raw size=1073741824
$ mkfs.ext4 -I 128 -q test.img
$ mkdir test-mount
$ sudo mount -o loop test.img test-mount
$ sudo touch test-mount/test-file
$ stat -c 'blocks=%b' test-mount/test-file
blocks=8

These extra blocks (one cluster) are apparently used for metadata,
because they are always there, on top of blocks used for data:

$ sudo dd if=/dev/zero of=test-mount/test-file bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00135339 s, 775 MB/s
$ stat -c 'blocks=%b' test-mount/test-file
blocks=2056

Make iotest 175 take this into account.

Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Nir Soffer <nsoffer@redhat.com>
Message-id: 20190516144319.12570-1-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-06-14 14:16:57 +02:00
Philippe Mathieu-Daudé
11a82d1429 qemu-iotests: Improve portability by searching bash in the $PATH
Bash is not always installed as /bin/bash. In particular on OpenBSD,
the package installs it in /usr/local/bin.
Use the 'env' shebang to search bash in the $PATH.

Patch created mechanically by running:

  $ git grep -lE '#! ?/bin/bash' -- tests/qemu-iotests \
    | while read f; do \
      sed -i 's|^#!.\?/bin/bash$|#!/usr/bin/env bash|' $f; \
    done

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Mao Zhongyi
bf22957309 qemu-iotests: remove unused variable 'here'
Running
git grep '\$here' tests/qemu-iotests

has 0 hits, which means we are setting a variable that has
no use.  It appears that commit e8f8624d removed the last
use.  So execute the following cmd to remove all of
the 'here=...' lines as dead code.

sed -i '/^here=/d' $(git grep -l '^here=' tests/qemu-iotests)

Cc: kwolf@redhat.com
Cc: mreitz@redhat.com
Cc: eblake@redhat.com
Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com>
Message-Id: <20181024094051.4470-3-maozhongyi@cmss.chinamobile.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: touch up commit message, reorder series, rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-11-19 10:08:19 -06:00
Nir Soffer
6f993f3fca qemu-img: Add tests for raw image preallocation
Add tests for creating raw image with and without the preallocation
option.

Signed-off-by: Nir Soffer <nirsof@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-02-24 16:09:22 +01:00