qemu-e2k

History

Fam Zheng ab27c3b5e7 mirror: Workaround for unexpected iohandler events during completion Commit `5a7e7a0ba` moved mirror_exit to a BH handler but didn't add any protection against new requests that could sneak in just before the BH is dispatched. For example (assuming a code base at that commit): main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch aio_dispatch ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop qemu_iohandler_poll virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 <snip> aio_dispatch aio_bh_poll (c) mirror_exit At (a) we know the BDS has no pending request. However, the same main_loop_wait call is going to dispatch iohandlers (EventNotifier events), which may lead to a new I/O from guest. So the invariant is already broken at (c). Data loss. Commit `f3926945c8` made iohandler to use aio API. The order of virtio_queue_host_notifier_read and block_job_defer_to_main_loop within a main_loop_wait becomes unpredictable, and even worse, if the host notifier event arrives at the next main_loop_wait call, the unpredictable order between mirror_exit and virtio_queue_host_notifier_read is also a trouble. As shown below, this commit made the bug easier to trigger: - Bug case 1: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 ... aio_dispatch aio_bh_poll (c) mirror_exit - Bug case 2: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop main_loop_wait # 2 ... aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite aio_dispatch aio_bh_poll (c) mirror_exit In both cases, (b) breaks the invariant wanted by (a) and (c). Until then, the request loss has been silent. Later, `3f09bfbc7b` added asserts at (c) to check the invariant (in bdrv_replace_in_backing_chain), and Max reported an assertion failure first visible there, by doing active committing while the guest is running bonnie++. 2.5 added bdrv_drained_begin at (a) to protect the dataplane case from similar problems, but we never realize the main loop bug until now. As a bandage, this patch disables iohandler's external events temporarily together with bs->ctx. Launchpad Bug: 1570134 Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>		2016-04-22 16:44:09 +02:00
..
accounting.c
archipelago.c
backup.c
blkdebug.c
blkreplay.c
blkverify.c
block-backend.c
bochs.c
cloop.c
commit.c
crypto.c
curl.c
dirty-bitmap.c
dmg.c
gluster.c
io.c
iscsi.c
linux-aio.c
Makefile.objs
mirror.c
nbd-client.c
nbd-client.h
nbd.c
nfs.c
null.c
parallels.c
qapi.c
qcow2-cache.c
qcow2-cluster.c
qcow2-refcount.c
qcow2-snapshot.c
qcow2.c
qcow2.h
qcow.c
qed-check.c
qed-cluster.c
qed-gencb.c
qed-l2-cache.c
qed-table.c
qed.c
qed.h
quorum.c
raw_bsd.c
raw-aio.h
raw-posix.c
raw-win32.c
rbd.c
sheepdog.c
snapshot.c
ssh.c
stream.c
throttle-groups.c
vdi.c
vhdx-endian.c
vhdx-log.c
vhdx.c
vhdx.h
vmdk.c
vpc.c
vvfat.c
win32-aio.c
write-threshold.c