qemu-e2k/block
Denis V. Lunev 459b4e6612 block: align bounce buffers to page
The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
            write(fd, buf, 4096);
performs 5% better if buf is aligned to 4096 bytes.
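
As a rough illustration of what "aligned to 4096 bytes" means for the buffer passed to write(), the sketch below allocates a buffer with posix_memalign() and checks its alignment. The helper names alloc_aligned() and is_aligned() are hypothetical and are not taken from the QEMU tree; this is only a minimal sketch, assuming a POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a zeroed buffer of `size` bytes whose
 * address is a multiple of `align`, or NULL on failure.
 * `align` must be a power of two and a multiple of sizeof(void *). */
static void *alloc_aligned(size_t align, size_t size)
{
    void *buf = NULL;

    assert(align != 0 && (align & (align - 1)) == 0);
    if (posix_memalign(&buf, align, size) != 0) {
        return NULL;
    }
    memset(buf, 0, size);
    return buf;
}

/* Hypothetical helper: non-zero if `p` is a multiple of `align`
 * (`align` must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

A buffer obtained as alloc_aligned(4096, 4096) can be passed to the write() loop above and released with free(). The same buffer offset by 512 bytes is still 512-byte aligned but no longer page aligned, which is the case the benchmark compares against.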

The difference is quite reliable.

On the other hand, for now we do not want to force bounce buffering
when a guest request is aligned to 512 bytes.

The patch changes the default optimal alignment of bounce buffers to
MAX(page size, 4k). 4k is chosen as the largest known sector size on
real HDDs.
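
The MAX(page size, 4k) rule can be sketched as follows. The function name opt_bounce_alignment() is hypothetical, and querying the page size with sysconf(_SC_PAGESIZE) is an assumption made for this example; the actual patch may obtain the page size differently.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: optimal bounce buffer alignment,
 * i.e. the larger of the system page size and 4096 bytes. */
static size_t opt_bounce_alignment(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t align;

    if (page <= 0) {
        page = 4096;  /* assumed fallback if the page size is unknown */
    }
    align = MAX((size_t)page, (size_t)4096);

    /* Page sizes are powers of two, so the result should be one too. */
    assert((align & (align - 1)) == 0);
    return align;
}
```

On a system with 4 KiB pages this returns 4096; on a system with larger pages it returns the page size. Either way the result is a multiple of 512, so a buffer aligned this way also satisfies the minimum 512-byte alignment.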

The reason for the performance improvement is quite interesting.
From the kernel's point of view, each request to the disk was split
in two. This can be seen with blktrace:
  9,0   11  1     0.000000000 11151  Q  WS 312737792 + 1023 [qemu-img]
  9,0   11  2     0.000007938 11151  Q  WS 312738815 + 8 [qemu-img]
  9,0   11  3     0.000030735 11151  Q  WS 312738823 + 1016 [qemu-img]
  9,0   11  4     0.000032482 11151  Q  WS 312739839 + 8 [qemu-img]
  9,0   11  5     0.000041379 11151  Q  WS 312739847 + 1016 [qemu-img]
  9,0   11  6     0.000042818 11151  Q  WS 312740863 + 8 [qemu-img]
  9,0   11  7     0.000051236 11151  Q  WS 312740871 + 1017 [qemu-img]
  9,0    5  1     0.169071519 11151  Q  WS 312741888 + 1023 [qemu-img]
After the patch the pattern becomes normal:
  9,0    6  1     0.000000000 12422  Q  WS 314834944 + 1024 [qemu-img]
  9,0    6  2     0.000038527 12422  Q  WS 314835968 + 1024 [qemu-img]
  9,0    6  3     0.000072849 12422  Q  WS 314836992 + 1024 [qemu-img]
  9,0    6  4     0.000106276 12422  Q  WS 314838016 + 1024 [qemu-img]
and the number of requests sent to the disk (which can be calculated by
counting the lines in the blktrace output) is reduced by about half.
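
The counting can also be done programmatically. The sketch below parses text lines shaped like the ones quoted above and accumulates the number of queue (Q) events and the sectors they cover. The struct q_stats and the function account_line() are hypothetical names, and the sscanf() format is an assumption based only on the sample lines shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical accumulator for queue (Q) events. */
struct q_stats {
    unsigned long requests;      /* number of Q events seen */
    unsigned long long sectors;  /* sum of their lengths in sectors */
};

/* Parse one line of trace text. If it is a Q event, add it to `st`
 * and return 1; otherwise return 0 and leave `st` unchanged. */
static int account_line(const char *line, struct q_stats *st)
{
    int maj, min, cpu, seq, pid;
    double ts;
    char action[8], rwbs[8];
    unsigned long long sector;
    unsigned int nsect;

    assert(line != NULL && st != NULL);
    /* Fields: dev cpu seq time pid action rwbs sector + count */
    if (sscanf(line, "%d,%d %d %d %lf %d %7s %7s %llu + %u",
               &maj, &min, &cpu, &seq, &ts, &pid,
               action, rwbs, &sector, &nsect) != 10) {
        return 0;
    }
    if (strcmp(action, "Q") != 0) {
        return 0;
    }
    st->requests++;
    st->sectors += nsect;
    return 1;
}
```

Feeding every line of a trace to account_line() and comparing requests against sectors gives the average request size. For the samples above, the requests before the patch alternate between roughly 1016 and 8 sectors, while after the patch each request covers 1024 sectors.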

Both qemu-img and qemu-io are affected, while qemu-kvm is not. The guest
does its job well and real requests come properly aligned (to page).

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1431441056-26198-3-git-send-email-den@openvz.org
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-05-22 09:37:33 +01:00
Makefile.objs block: move I/O request processing to block/io.c 2015-04-28 15:36:17 +02:00
accounting.c block: add accounting for merged requests 2015-02-06 17:24:21 +01:00
archipelago.c block: remove superfluous '\n' around error_report/error_setg 2015-03-10 08:15:33 +03:00
backup.c block: Ensure consistent bitmap function prototypes 2015-04-28 15:36:10 +02:00
blkdebug.c blkdebug: Add bdrv_truncate() 2015-04-28 15:36:09 +02:00
blkverify.c block: Rename BlockDriverCompletionFunc to BlockCompletionFunc 2014-10-20 13:41:27 +02:00
block-backend.c block-backend: Expose bdrv_write_zeroes() 2015-04-28 15:36:08 +02:00
bochs.c block: Use g_new() & friends to avoid multiplying sizes 2014-08-20 11:51:28 +02:00
cloop.c cloop: Handle failure for potentially large allocations 2014-08-15 15:07:15 +02:00
commit.c block: let commit blockjob run in BDS AioContext 2014-11-03 11:41:49 +00:00
curl.c block/curl: Improve type safety of s->timeout. 2014-11-03 11:41:47 +00:00
dmg.c block/dmg: improve zeroes handling 2015-02-06 17:24:21 +01:00
gluster.c block: don't convert file size to sector size 2014-09-12 15:43:06 +02:00
io.c block: align bounce buffers to page 2015-05-22 09:37:33 +01:00
iscsi.c block/iscsi: use the allocationmap also if cache.direct=on 2015-04-28 15:36:10 +02:00
linux-aio.c linux-aio: simplify removal of completed iocbs from the list 2014-12-12 16:57:55 +00:00
mirror.c block/mirror: Always call block_job_sleep_ns() 2015-04-28 15:36:11 +02:00
nbd-client.c nbd: Set block size to BDRV_SECTOR_SIZE 2015-03-18 12:07:01 +01:00
nbd-client.h nbd: Set block size to BDRV_SECTOR_SIZE 2015-03-18 12:07:01 +01:00
nbd.c nbd: Fix nbd_establish_connection()'s return value 2015-03-18 12:05:38 +01:00
nfs.c block/nfs: Add create_opts 2014-12-10 10:31:19 +01:00
null.c block/null: Support reopen 2015-04-28 15:36:09 +02:00
parallels.c block/parallels: improve image writing performance further 2015-05-22 09:37:32 +01:00
qapi.c qobject: Clean up around qtype_code 2015-05-11 08:59:07 -04:00
qcow.c block: use bdrv_get_device_or_node_name() in error messages 2015-04-28 15:36:09 +02:00
qcow2-cache.c block: Give always priority to unused entries in the qcow2 L2 cache 2015-02-06 17:24:22 +01:00
qcow2-cluster.c qcow2: Use 64 bits for refcount values 2015-03-10 14:02:21 +01:00
qcow2-refcount.c Convert (ffs(val) - 1) to ctz32(val) 2015-04-28 15:36:08 +02:00
qcow2-snapshot.c savevm: create snapshot failed when id_str already exists 2015-04-28 15:36:08 +02:00
qcow2.c block: add 'node-name' field to BLOCK_IMAGE_CORRUPTED 2015-04-28 15:36:09 +02:00
qcow2.h qcow2: Fix header update with overridden backing file 2015-04-08 10:29:20 +01:00
qed-check.c block: Use g_new() & friends to avoid multiplying sizes 2014-08-20 11:51:28 +02:00
qed-cluster.c Use glib memory allocation and free functions 2011-08-20 23:01:08 -05:00
qed-gencb.c block: Rename BlockDriverCompletionFunc to BlockCompletionFunc 2014-10-20 13:41:27 +02:00
qed-l2-cache.c qed: do not evict in-use L2 table cache entries 2012-03-12 15:14:06 +01:00
qed-table.c block: Rename BlockDriverCompletionFunc to BlockCompletionFunc 2014-10-20 13:41:27 +02:00
qed.c block: use bdrv_get_device_or_node_name() in error messages 2015-04-28 15:36:09 +02:00
qed.h qed: Really remove unused field QEDAIOCB.finished 2015-02-06 17:24:21 +01:00
quorum.c block: add bdrv_get_device_or_node_name() 2015-04-28 15:36:09 +02:00
raw-aio.h linux-aio: drop return code from laio_io_unplug and ioq_submit 2014-12-12 16:57:55 +00:00
raw-posix.c block: align bounce buffers to page 2015-05-22 09:37:33 +01:00
raw-win32.c block: Remove "growable" from BDS 2015-02-16 15:07:19 +00:00
raw_bsd.c block: Add driver methods to probe blocksizes and geometry 2015-03-10 14:02:22 +01:00
rbd.c Convert (ffs(val) - 1) to ctz32(val) 2015-04-28 15:36:08 +02:00
sheepdog.c sheepdog: fix resource leak with sd_snapshot_create 2015-05-08 14:11:10 +03:00
snapshot.c block: use bdrv_get_device_or_node_name() in error messages 2015-04-28 15:36:09 +02:00
ssh.c ssh: Don't crash if either host or path is not specified. 2014-10-03 10:30:33 +01:00
stream.c block: let stream blockjob run in BDS AioContext 2014-11-03 11:41:49 +00:00
vdi.c block: use bdrv_get_device_or_node_name() in error messages 2015-04-28 15:36:09 +02:00
vhdx-endian.c block: VHDX endian fixes 2014-08-15 15:07:14 +02:00
vhdx-log.c block: Drop some superfluous casts from void * 2014-08-20 11:51:28 +02:00
vhdx.c block: use bdrv_get_device_or_node_name() in error messages 2015-04-28 15:36:09 +02:00
vhdx.h block: vhdx - update PAYLOAD_BLOCK_UNMAPPED value to match 1.00 spec 2014-12-12 15:42:22 +00:00
vmdk.c vmdk: Widen before shifting 32 bit header field 2015-04-28 15:36:11 +02:00
vpc.c block: use bdrv_get_device_or_node_name() in error messages 2015-04-28 15:36:09 +02:00
vvfat.c block: use bdrv_get_device_or_node_name() in error messages 2015-04-28 15:36:09 +02:00
win32-aio.c block: Rename BlockDriverCompletionFunc to BlockCompletionFunc 2014-10-20 13:41:27 +02:00
write-threshold.c block: Fix block-set-write-threshold not to use funky error class 2015-03-16 17:07:25 +01:00