qemu-e2k


Bin Wu	141cabe6f1	nbd: fix the co_queue multi-adding bug	2015-02-16 15:07:17 +00:00

When we tested VM migration between different hosts with NBD devices, we found that if we sent a cancel command right after drive_mirror had started, a coroutine re-enter error would occur. The stack was as follows:

(gdb) bt
00) 0x00007fdfc744d885 in raise () from /lib64/libc.so.6
01) 0x00007fdfc744ee61 in abort () from /lib64/libc.so.6
02) 0x00007fdfca467cc5 in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0) at qemu-coroutine.c:118
03) 0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedb400) at qemu-coroutine-lock.c:59
04) 0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8, to=0x7fdfcaedb400) at qemu-coroutine.c:96
05) 0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0) at qemu-coroutine.c:123
06) 0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedbdc0) at qemu-coroutine-lock.c:59
07) 0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8, to=0x7fdfcaedbdc0) at qemu-coroutine.c:96
08) 0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedbdc0, opaque=0x0) at qemu-coroutine.c:123
09) 0x00007fdfca4a1fa4 in nbd_recv_coroutines_enter_all (s=0x7fdfcaef7dd0) at block/nbd-client.c:41
10) 0x00007fdfca4a1ff9 in nbd_teardown_connection (client=0x7fdfcaef7dd0) at block/nbd-client.c:50
11) 0x00007fdfca4a20f0 in nbd_reply_ready (opaque=0x7fdfcaef7dd0) at block/nbd-client.c:92
12) 0x00007fdfca45ed80 in aio_dispatch (ctx=0x7fdfcae15e90) at aio-posix.c:144
13) 0x00007fdfca45ef1b in aio_poll (ctx=0x7fdfcae15e90, blocking=false) at aio-posix.c:222
14) 0x00007fdfca448c34 in aio_ctx_dispatch (source=0x7fdfcae15e90, callback=0x0, user_data=0x0) at async.c:212
15) 0x00007fdfc8f2f69a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
16) 0x00007fdfca45c391 in glib_pollfds_poll () at main-loop.c:190
17) 0x00007fdfca45c489 in os_host_main_loop_wait (timeout=1483677098) at main-loop.c:235
18) 0x00007fdfca45c57b in main_loop_wait (nonblocking=0) at main-loop.c:484
19) 0x00007fdfca25f403 in main_loop () at vl.c:2249
20) 0x00007fdfca266fc2 in main (argc=42, argv=0x7ffff517d638, envp=0x7ffff517d790) at vl.c:4814

We found that the nbd_recv_coroutines_enter_all function (triggered by a cancel command or a broken network connection) enters a coroutine that is waiting for the sending lock. If the lock is still held by another coroutine, the entered coroutine is added to the co_queue a second time. Later, when the lock is released, a coroutine re-enter error occurs.

This bug can be fixed simply by delaying the setting of recv_coroutine, as suggested by Paolo. After applying this patch, we ran the cancel operation during the mirror phase in a loop for more than 5 hours and everything was fine. Without this patch, a coroutine re-enter error occurs within 5 minutes.

Signed-off-by: Bin Wu <wu.wubin@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1423552846-3896-1-git-send-email-wu.wubin@huawei.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
accounting.c	block: add accounting for merged requests	2015-02-06 17:24:21 +01:00
archipelago.c	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc	2014-10-20 13:41:27 +02:00
backup.c	qmp: Add command 'blockdev-backup'	2015-01-13 11:47:56 +00:00
blkdebug.c	blkdebug: Simplify and improve filename generation	2014-12-10 10:31:11 +01:00
blkverify.c	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc	2014-10-20 13:41:27 +02:00
block-backend.c	block-backend: expose bs->bl.max_transfer_length	2015-02-06 17:24:21 +01:00
bochs.c	block: Use g_new() & friends to avoid multiplying sizes	2014-08-20 11:51:28 +02:00
cloop.c	cloop: Handle failure for potentially large allocations	2014-08-15 15:07:15 +02:00
commit.c	block: let commit blockjob run in BDS AioContext	2014-11-03 11:41:49 +00:00
curl.c	block/curl: Improve type safety of s->timeout.	2014-11-03 11:41:47 +00:00
dmg.c	block/dmg: improve zeroes handling	2015-02-06 17:24:21 +01:00
gluster.c	block: don't convert file size to sector size	2014-09-12 15:43:06 +02:00
iscsi.c	block/iscsi: fix uninitialized variable	2015-01-03 09:22:13 +01:00
linux-aio.c	linux-aio: simplify removal of completed iocbs from the list	2014-12-12 16:57:55 +00:00
Makefile.objs	block/dmg: support bzip2 block entry types	2015-02-06 17:24:21 +01:00
mirror.c	block: mirror - change string allocation to 2-bytes	2015-01-23 18:17:06 +01:00
nbd-client.c	nbd: fix the co_queue multi-adding bug	2015-02-16 15:07:17 +00:00
nbd-client.h	nbd: Drop BDS backpointer	2015-02-16 14:36:03 +00:00
nbd.c	nbd: Drop BDS backpointer	2015-02-16 14:36:03 +00:00
nfs.c	block/nfs: Add create_opts	2014-12-10 10:31:19 +01:00
null.c	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc	2014-10-20 13:41:27 +02:00
parallels.c	block/parallels: fix access to not initialized memory in catalog_bitmap	2014-11-03 09:48:41 +00:00
qapi.c	block: add event when disk usage exceeds threshold	2015-02-06 17:24:21 +01:00
qcow2-cache.c	block: Give always priority to unused entries in the qcow2 L2 cache	2015-02-06 17:24:22 +01:00
qcow2-cluster.c	qcow2: Add two more unalignment checks	2015-01-23 18:17:05 +01:00
qcow2-refcount.c	qcow2: Rewrite qcow2_alloc_bytes()	2015-02-06 17:24:22 +01:00
qcow2-snapshot.c	qcow2: Allow "full" discard	2014-11-03 11:41:47 +00:00
qcow2.c	block: fix off-by-one error in qcow and qcow2	2015-02-06 17:24:21 +01:00
qcow2.h	block/qcow2: Make get_refcount() global	2014-11-03 11:41:49 +00:00
qcow.c	block: fix off-by-one error in qcow and qcow2	2015-02-06 17:24:21 +01:00
qed-check.c	block: Use g_new() & friends to avoid multiplying sizes	2014-08-20 11:51:28 +02:00
qed-cluster.c
qed-gencb.c	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc	2014-10-20 13:41:27 +02:00
qed-l2-cache.c	qed: do not evict in-use L2 table cache entries	2012-03-12 15:14:06 +01:00
qed-table.c	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc	2014-10-20 13:41:27 +02:00
qed.c	qed: check for header size overflow	2015-02-06 17:24:21 +01:00
qed.h	qed: Really remove unused field QEDAIOCB.finished	2015-02-06 17:24:21 +01:00
quorum.c	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc	2014-10-20 13:41:27 +02:00
raw_bsd.c	block: Make essential BlockDriver objects public	2014-12-10 10:31:19 +01:00
raw-aio.h	linux-aio: drop return code from laio_io_unplug and ioq_submit	2014-12-12 16:57:55 +00:00
raw-posix.c	block/raw-posix.c: Fix raw_getlength() on Mac OS X block devices	2015-02-06 18:00:53 +01:00
raw-win32.c	block: Make essential BlockDriver objects public	2014-12-10 10:31:19 +01:00
rbd.c	block/rbd: fix memory leak	2014-12-12 13:16:56 +00:00
sheepdog.c	block: Rename BlockDriverAIOCB* to BlockAIOCB*	2014-10-20 13:41:27 +02:00
snapshot.c	snapshot: add bdrv_drain_all() to bdrv_snapshot_delete() to avoid concurrency problem	2014-11-03 09:48:42 +00:00
ssh.c	ssh: Don't crash if either host or path is not specified.	2014-10-03 10:30:33 +01:00
stream.c	block: let stream blockjob run in BDS AioContext	2014-11-03 11:41:49 +00:00
vdi.c	block: remove BLOCK_OPT_NOCOW from vdi_create_opts	2014-12-10 10:31:20 +01:00
vhdx-endian.c	block: VHDX endian fixes	2014-08-15 15:07:14 +02:00
vhdx-log.c	block: Drop some superfluous casts from void *	2014-08-20 11:51:28 +02:00
vhdx.c	block: vhdx - force FileOffsetMB field to '0' for certain block states	2015-01-23 12:41:32 -05:00
vhdx.h	block: vhdx - update PAYLOAD_BLOCK_UNMAPPED value to match 1.00 spec	2014-12-12 15:42:22 +00:00
vmdk.c	block: vmdk - move string allocations from stack to the heap	2015-01-23 18:17:05 +01:00
vpc.c	block: remove BLOCK_OPT_NOCOW from vpc_create_opts	2014-12-10 10:31:21 +01:00
vvfat.c	block: update string sizes for filename,backing_file,exact_filename	2015-01-23 18:17:06 +01:00
win32-aio.c	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc	2014-10-20 13:41:27 +02:00
write-threshold.c	block: add event when disk usage exceeds threshold	2015-02-06 17:24:21 +01:00