qemu-e2k/block
Fam Zheng 9103f1ceb4 file-posix: Consider max_segments for BlockLimits.max_transfer
BlockLimits.max_transfer can be too high without this fix, guest will
encounter I/O error or even get paused with werror=stop or rerror=stop. The
cause is explained below.

Linux has a separate limit, /sys/block/.../queue/max_segments, which in
the worst case can be more restrictive than the BLKSECTGET which we
already consider (note that they are two different things). So, the
failure scenario before this patch is:

1) host device has max_sectors_kb = 4096 and max_segments = 64;
2) guest learns max_sectors_kb limit from QEMU, but doesn't know
   max_segments;
3) guest issues e.g. a 512KB request thinking it's okay, but actually
   it's not, because it will be passed through to host device as an
   SG_IO req that has niov > 64;
4) host kernel doesn't like the segmenting of the request, and returns
   -EINVAL;

This patch checks the max_segments sysfs entry for the host device and
calculates a "conservative" bytes limit using the page size, which is
then merged into the existing max_transfer limit. Guest will discover
this from the usual virtual block device interfaces. (In the case of
scsi-generic, it will be done in the INQUIRY reply interception in
device model.)

The other possibility is to actually propagate it as a separate limit,
but it's not better. On the one hand, there is a big complication: the
limit is per-LUN in QEMU PoV (because we can attach LUNs from different
host HBAs to the same virtio-scsi bus), but the channel to communicate
it in a per-LUN manner is missing down the stack; on the other hand,
two limits versus one doesn't change much about the valid size of I/O
(because guest has no control over host segmenting).

Also, the idea to fall back to bounce buffering in QEMU, upon -EINVAL,
was explored. Unfortunately there is no neat way to ensure the bounce
buffer is less segmented (in terms of DMA addr) than the guest buffer.

Practically, this bug is not very common. It is only reported on a
Emulex (lpfc), so it's okay to get it fixed in the easier way.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-03-13 12:49:33 +01:00
..
accounting.c
archipelago.c block: explicitly acquire aiocontext in aio callbacks that need it 2017-02-21 11:39:39 +00:00
backup.c backup: allow target without .bdrv_get_info 2017-03-13 12:49:33 +01:00
blkdebug.c block: Request child permissions in filter drivers 2017-02-28 20:40:36 +01:00
blkreplay.c block: Request child permissions in filter drivers 2017-02-28 20:40:36 +01:00
blkverify.c block: Request child permissions in filter drivers 2017-02-28 20:40:36 +01:00
block-backend.c block: Don't use error_abort in blk_new_open 2017-03-07 14:53:29 +01:00
bochs.c block: Request child permissions in format drivers 2017-02-28 20:40:36 +01:00
cloop.c block: Request child permissions in format drivers 2017-02-28 20:40:36 +01:00
commit.c commit: Don't use error_abort in commit_start 2017-03-07 14:53:29 +01:00
crypto.c block: Request child permissions in format drivers 2017-02-28 20:40:36 +01:00
curl.c curl: do not use aio_context_acquire/release 2017-02-27 13:33:24 +00:00
dirty-bitmap.c block: More operations for meta dirty bitmap 2016-10-24 17:56:07 +02:00
dmg-bz2.c dmg: Move libbz2 code to dmg-bz2.so 2016-10-07 14:14:06 +02:00
dmg.c block: Request child permissions in format drivers 2017-02-28 20:40:36 +01:00
dmg.h dmg: Move libbz2 code to dmg-bz2.so 2016-10-07 14:14:06 +02:00
file-posix.c file-posix: Consider max_segments for BlockLimits.max_transfer 2017-03-13 12:49:33 +01:00
file-win32.c block: Rename raw-{posix,win32} to file-*.c 2017-01-09 13:30:53 +01:00
gluster.c qapi-schema: Rename SocketAddressFlat's variant tcp to inet 2017-03-07 14:53:29 +01:00
io.c block: Assertions for resize permission 2017-02-28 20:47:50 +01:00
iscsi-opts.c block/iscsi: statically link qemu_iscsi_opts 2017-01-27 18:07:58 +01:00
iscsi.c iscsi: fix missing unlock 2017-03-03 16:41:20 +01:00
linux-aio.c block: explicitly acquire aiocontext in aio callbacks that need it 2017-02-21 11:39:39 +00:00
Makefile.objs block/iscsi: statically link qemu_iscsi_opts 2017-01-27 18:07:58 +01:00
mirror.c block: Fix error handling in bdrv_replace_in_backing_chain() 2017-03-07 14:53:28 +01:00
nbd-client.c coroutine-lock: add mutex argument to CoQueue APIs 2017-02-21 11:39:40 +00:00
nbd-client.h nbd: convert to use qio_channel_yield 2017-02-21 11:14:08 +00:00
nbd.c qapi: Drop unused non-strict qobject input visitor 2017-03-05 09:14:19 +01:00
nfs.c qapi: Drop unused non-strict qobject input visitor 2017-03-05 09:14:19 +01:00
null.c block: explicitly acquire aiocontext in aio callbacks that need it 2017-02-21 11:39:39 +00:00
parallels.c block: Add BDRV_O_RESIZE for blk_new_open() 2017-02-28 20:40:36 +01:00
qapi.c block: Don't bother asserting type of output visitor's output 2017-02-22 19:52:20 +01:00
qcow2-cache.c qcow2: Remove stale comment 2016-11-25 13:51:30 +01:00
qcow2-cluster.c coroutine-lock: add mutex argument to CoQueue APIs 2017-02-21 11:39:40 +00:00
qcow2-refcount.c block: Pass BdrvChild to bdrv_truncate() 2017-02-24 16:09:23 +01:00
qcow2-snapshot.c block: Convert bdrv_pwrite(v/_sync) to BdrvChild 2016-07-05 16:46:27 +02:00
qcow2.c block: Add BDRV_O_RESIZE for blk_new_open() 2017-02-28 20:40:36 +01:00
qcow2.h qcow2: Optimize the refcount-block overlap check 2017-02-12 00:47:43 +01:00
qcow.c block: Add BDRV_O_RESIZE for blk_new_open() 2017-02-28 20:40:36 +01:00
qed-check.c
qed-cluster.c block: explicitly acquire aiocontext in aio callbacks that need it 2017-02-21 11:39:39 +00:00
qed-gencb.c
qed-l2-cache.c
qed-table.c block: explicitly acquire aiocontext in aio callbacks that need it 2017-02-21 11:39:39 +00:00
qed.c block: Add BDRV_O_RESIZE for blk_new_open() 2017-02-28 20:40:36 +01:00
qed.h block: explicitly acquire aiocontext in timers that need it 2017-02-21 11:14:08 +00:00
quorum.c block: Request child permissions in filter drivers 2017-02-28 20:40:36 +01:00
raw-format.c block: Request child permissions in filter drivers 2017-02-28 20:40:36 +01:00
rbd.c block/rbd: add support for 'mon_host', 'auth_supported' via QAPI 2017-03-01 22:39:25 -05:00
replication.c commit: Add filter-node-name to block-commit 2017-02-28 20:47:50 +01:00
sheepdog.c sheepdog: Implement bdrv_parse_filename() 2017-03-07 14:53:28 +01:00
snapshot.c
ssh.c qapi: Drop unused non-strict qobject input visitor 2017-03-05 09:14:19 +01:00
stream.c block: Add Error parameter to bdrv_set_backing_hd() 2017-02-28 20:47:51 +01:00
throttle-groups.c coroutine-lock: add mutex argument to CoQueue APIs 2017-02-21 11:39:40 +00:00
trace-events trace: clean up trace-events files 2017-01-31 17:12:15 +00:00
vdi.c block: Add BDRV_O_RESIZE for blk_new_open() 2017-02-28 20:40:36 +01:00
vhdx-endian.c vhdx: Use QEMU UUID API 2016-09-23 11:42:52 +08:00
vhdx-log.c block: Pass BdrvChild to bdrv_truncate() 2017-02-24 16:09:23 +01:00
vhdx.c block: Add BDRV_O_RESIZE for blk_new_open() 2017-02-28 20:40:36 +01:00
vhdx.h
vmdk.c block: Add BDRV_O_RESIZE for blk_new_open() 2017-02-28 20:40:36 +01:00
vpc.c block: Add BDRV_O_RESIZE for blk_new_open() 2017-02-28 20:40:36 +01:00
vvfat.c block: Add Error parameter to bdrv_set_backing_hd() 2017-02-28 20:47:51 +01:00
win32-aio.c block: explicitly acquire aiocontext in aio callbacks that need it 2017-02-21 11:39:39 +00:00
write-threshold.c block: use bdrv_add_before_write_notifier 2016-10-07 13:34:07 +02:00