qemu-e2k/block
Kevin Wolf ffa244c84a file-posix: Mitigate file fragmentation with extent size hints
Especially when O_DIRECT is used with image files, which bypasses the page
cache so that allocating requests can't be merged there, the file will
fragment at the file system layer, potentially with a very small fragment
size (this depends on the requests the guest sent).

On Linux, fragmentation can be reduced by setting an extent size hint
when creating the file (at least on XFS, it can't be set any more after
the first extent has been allocated), basically giving raw files a
"cluster size" for allocation.
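On Linux, the generic way to request such a hint for a single file is the FS_IOC_FSGETXATTR / FS_IOC_FSSETXATTR ioctl pair from <linux/fs.h>: set FS_XFLAG_EXTSIZE in fsx_xflags and store the hint in bytes in fsx_extsize. The following is a minimal sketch under that assumption, not the code in file-posix.c. The helper name set_extent_size_hint is hypothetical, and it has to be called right after creating the file, before any data is written.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* struct fsxattr, FS_IOC_FSGETXATTR, FS_XFLAG_EXTSIZE */

/*
 * Hypothetical helper: ask the file system to allocate space for @fd in
 * units of @hint_bytes. Call it on a freshly created, still empty file;
 * XFS refuses the change once the first extent has been allocated.
 * Returns 0 on success (or when @hint_bytes is 0, meaning "no hint") and
 * a negative errno value on failure.
 */
int set_extent_size_hint(int fd, uint32_t hint_bytes)
{
    if (hint_bytes == 0) {
        return 0;               /* keep the file system's default behaviour */
    }
#ifdef FS_IOC_FSSETXATTR
    struct fsxattr attr;

    /* Read the current attributes first so unrelated flags are preserved. */
    if (ioctl(fd, FS_IOC_FSGETXATTR, &attr) < 0) {
        return -errno;
    }
    attr.fsx_xflags |= FS_XFLAG_EXTSIZE;
    attr.fsx_extsize = hint_bytes;
    if (ioctl(fd, FS_IOC_FSSETXATTR, &attr) < 0) {
        return -errno;
    }
    return 0;
#else
    (void)fd;
    return -ENOTSUP;            /* kernel headers too old for this interface */
#endif
}
```

Because the hint only influences the allocation layout, a caller can reasonably treat a negative return as a warning rather than a failed image creation, for example on file systems that don't support extent size hints.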

This adds a create option to set the extent size hint, and changes the
default from not setting a hint to setting it to 1 MB. The main reason
why qcow2 defaults to smaller cluster sizes is that COW becomes more
expensive, which is not an issue with raw files, so we can choose a
larger size. The tradeoff here is only potentially wasted disk space.
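To make the space tradeoff concrete, the sketch below uses a deliberately simplified model, which is an assumption and not a description of the real XFS allocator: with a hint, every write allocates each hint-aligned chunk it touches in full; without one, exactly the written bytes are allocated. It applies the model to a strided pattern like the qemu-img bench runs further down (count requests of req_len bytes, starting at start, step bytes apart). Both function names are hypothetical.

```c
#include <stdint.h>

/* Exclusive end offset of @count requests of @req_len bytes, @step apart. */
uint64_t strided_end(uint64_t start, uint64_t step, uint64_t req_len,
                     uint64_t count)
{
    return count == 0 ? start : start + (count - 1) * step + req_len;
}

/*
 * Bytes allocated under the simplified model described above. Assumes
 * step >= req_len, so requests don't overlap and offsets only increase.
 * @hint == 0 means "no extent size hint".
 */
uint64_t modeled_allocation(uint64_t start, uint64_t step, uint64_t req_len,
                            uint64_t count, uint64_t hint)
{
    if (req_len == 0 || count == 0) {
        return 0;
    }
    if (hint == 0) {
        return count * req_len;     /* only the written bytes */
    }

    uint64_t chunks = 0;
    uint64_t next_chunk = 0;        /* lowest chunk index not yet counted */

    for (uint64_t i = 0; i < count; i++) {
        uint64_t off = start + i * step;
        uint64_t first = off / hint;
        uint64_t last = (off + req_len - 1) / hint;

        /* Offsets increase, so chunks below next_chunk are already counted. */
        if (first < next_chunk) {
            first = next_chunk;
        }
        if (last >= first) {
            chunks += last - first + 1;
            next_chunk = last + 1;
        }
    }
    return chunks * hint;
}
```

For the benchmark pattern (1000000 requests of 4096 bytes, step 8192), strided_end() gives 8,192,000,000 bytes for the second pass, which is the 0x1e8480000 length that "qemu-img map" reports. Under this model, each pass touches the same 7813 one-MiB chunks, so after the first pass about 8.19 GB are allocated for 4.096 GB of data. The second pass fills the gaps, so nothing is wasted in the end. A purely sequential writer (step == req_len), such as a growing qcow2 image, wastes at most one partially used chunk at the end.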

For qcow2 (or other image formats) over file-posix, the advantage should
be even greater because they grow sequentially without leaving holes, so
there won't be wasted space. Setting even larger extent size hints for
such images may make sense. This can be done with the new option, but
let's keep the default conservative for now.

The effect is very visible with a test that intentionally creates a
badly fragmented file with qemu-img bench (the time difference while
creating the file is already remarkable) and then looks at the number of
extents and the time a simple "qemu-img map" takes.

Without an extent size hint:

    $ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G
    Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=0
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
    Run completed in 25.848 seconds.
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
    Run completed in 19.616 seconds.
    $ filefrag ~/tmp/test.raw
    /home/kwolf/tmp/test.raw: 2000000 extents found
    $ time ./qemu-img map ~/tmp/test.raw
    Offset          Length          Mapped to       File
    0               0x1e8480000     0               /home/kwolf/tmp/test.raw

    real    0m1,279s
    user    0m0,043s
    sys     0m1,226s

With the new default extent size hint of 1 MB:

    $ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G
    Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
    Run completed in 11.833 seconds.
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
    Run completed in 10.155 seconds.
    $ filefrag ~/tmp/test.raw
    /home/kwolf/tmp/test.raw: 178 extents found
    $ time ./qemu-img map ~/tmp/test.raw
    Offset          Length          Mapped to       File
    0               0x1e8480000     0               /home/kwolf/tmp/test.raw

    real    0m0,061s
    user    0m0,040s
    sys     0m0,014s

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200707142329.48303-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-14 15:18:59 +02:00
monitor blockdev: Split off basic bitmap operations for qemu-img 2020-05-19 10:32:14 -05:00
accounting.c
aio_task.c
amend.c block/core: add generic infrastructure for x-blockdev-amend qmp command 2020-07-06 08:49:28 +02:00
backup-top.c block: Drop @child_class from bdrv_child_perm() 2020-05-18 19:05:25 +02:00
backup-top.h
backup.c backup: Make sure that source and target size match 2020-05-08 13:26:35 +02:00
blkdebug.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
blklogwrites.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
blkreplay.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
blkverify.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
block-backend.c block: Pass BdrvChildRole in remaining cases 2020-05-18 19:05:25 +02:00
block-copy.c block/block-copy: block_copy_dirty_clusters: fix failure check 2020-07-06 08:33:06 +02:00
bochs.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
cloop.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
commit.c block: Drop @child_class from bdrv_child_perm() 2020-05-18 19:05:25 +02:00
copy-on-read.c block: Drop @child_class from bdrv_child_perm() 2020-05-18 19:05:25 +02:00
create.c
crypto.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
crypto.h block/crypto: implement the encryption key management 2020-07-06 08:49:28 +02:00
curl.c error: Eliminate error_propagate() with Coccinelle, part 1 2020-07-10 15:18:08 +02:00
dirty-bitmap.c block/dirty-bitmap: add bdrv_has_named_bitmaps helper 2020-05-28 13:15:22 -05:00
dmg-bz2.c
dmg-lzfse.c
dmg.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
dmg.h
file-posix.c file-posix: Mitigate file fragmentation with extent size hints 2020-07-14 15:18:59 +02:00
file-win32.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
filter-compress.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
gluster.c error: Reduce unnecessary error propagation 2020-07-10 15:18:08 +02:00
io_uring.c io_uring: use io_uring_cq_ready() to check for ready cqes 2020-06-05 09:54:48 +01:00
io.c block: drop unallocated_blocks_are_zero 2020-07-06 10:34:14 +02:00
iscsi-opts.c
iscsi.c iscsi: return -EIO when sense fields are meaningless 2020-07-10 18:02:23 -04:00
linux-aio.c misc: Replace zero-length arrays with flexible array member (automatic) 2020-03-16 22:07:42 +01:00
Makefile.objs block/core: add generic infrastructure for x-blockdev-amend qmp command 2020-07-06 08:49:28 +02:00
mirror.c block: Drop @child_class from bdrv_child_perm() 2020-05-18 19:05:25 +02:00
nbd.c nbd: Use ERRP_GUARD() 2020-07-10 15:18:09 +02:00
nfs.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
null.c
nvme.c block/nvme: support nested aio_poll() 2020-06-23 15:46:08 +01:00
parallels.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
parallels.h
qapi-sysemu.c block: Move system emulator QMP commands to block/qapi-sysemu.c 2020-03-06 17:15:38 +01:00
qapi.c block: Fix VM size field width in snapshot dump 2020-02-20 16:43:42 +01:00
qcow2-bitmap.c qcow2: Tweak comments on qcow2_get_persistent_dirty_bitmap_size 2020-06-17 14:53:39 +02:00
qcow2-cache.c
qcow2-cluster.c qcow2: Support BDRV_REQ_ZERO_WRITE for truncate 2020-04-30 17:51:07 +02:00
qcow2-refcount.c block: Comment cleanups 2020-05-05 13:17:36 +02:00
qcow2-snapshot.c qcow2: Allow resize of images with internal snapshots 2020-05-05 13:17:36 +02:00
qcow2-threads.c qcow2: add zstd cluster compression 2020-05-13 14:20:31 +02:00
qcow2.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
qcow2.h qcow2: Expose bitmaps' size during measure 2020-05-28 13:16:16 -05:00
qcow.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
qed-check.c
qed-cluster.c
qed-l2-cache.c
qed-table.c
qed.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
qed.h qed: Simplify backing reads 2020-07-06 10:34:14 +02:00
quorum.c error: Reduce unnecessary error propagation 2020-07-10 15:18:08 +02:00
raw-format.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
rbd.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
replication.c error: Reduce unnecessary error propagation 2020-07-10 15:18:08 +02:00
sheepdog.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
snapshot.c block/snapshot: rename Error ** parameter to more common errp 2019-12-18 08:43:19 +01:00
ssh.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
stream.c block/stream: Remove redundant statement in stream_run() 2020-03-09 15:59:31 +01:00
throttle-groups.c error: Eliminate error_propagate() with Coccinelle, part 1 2020-07-10 15:18:08 +02:00
throttle.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
trace-events block/nvme: support nested aio_poll() 2020-06-23 15:46:08 +01:00
vdi.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
vhdx-endian.c
vhdx-log.c block: Add flags to bdrv(_co)_truncate() 2020-04-30 17:51:07 +02:00
vhdx.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
vhdx.h
vmdk.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
vpc.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
vvfat.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
vxhs.c error: Reduce unnecessary error propagation 2020-07-10 15:18:08 +02:00
win32-aio.c
write-threshold.c