qemu-e2k/migration
Li Zhijian b390afd8c5 migration/rdma: Fix out of order wrid
destination:
../qemu/build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming rdma:192.168.22.23:8888
qemu-system-x86_64: -spice streaming-video=filter,port=5902,disable-ticketing: warning: short-form boolean option 'disable-ticketing' deprecated
Please use disable-ticketing=on instead
QEMU 6.0.50 monitor - type 'help' for more information
(qemu) trace-event qemu_rdma_block_for_wrid_miss on
(qemu) dest_init RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
qemu_rdma_block_for_wrid_miss A Wanted wrid CONTROL SEND (2000) but got CONTROL RECV (4000)

source:
../qemu/build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5901,disable-ticketing -S
qemu-system-x86_64: -spice streaming-video=filter,port=5901,disable-ticketing: warning: short-form boolean option 'disable-ticketing' deprecated
Please use disable-ticketing=on instead
QEMU 6.0.50 monitor - type 'help' for more information
(qemu)
(qemu) trace-event qemu_rdma_block_for_wrid_miss on
(qemu) migrate -d rdma:192.168.22.23:8888
source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
(qemu) qemu_rdma_block_for_wrid_miss A Wanted wrid WRITE RDMA (1) but got CONTROL RECV (4000)

NOTE: we use Soft-RoCE as the RDMA device.
[root@iaas-rpma images]# rdma link show rxe_eth0/1
link rxe_eth0/1 state ACTIVE physical_state LINK_UP netdev eth0

This migration cannot complete when an out-of-order (OOO) CQ event occurs.
The send queue and receive queue share the same completion queue, and
qemu_rdma_block_for_wrid() drops the CQ events it is not interested in. But
a CQ event dropped by qemu_rdma_block_for_wrid() could be one that it wants
later. In that case, qemu_rdma_block_for_wrid() blocks forever.

OOO cases can occur on both the source side and the destination side. Blocking
forever happens only when SEND and RECV are out of order. OOO between
'WRITE RDMA' and 'RECV' doesn't matter.

The OOO sequence is shown below:
       source                             destination
      rdma_write_one()                   qemu_rdma_registration_handle()
1.    S1: post_recv X                    D1: post_recv Y
2.    wait for recv CQ event X
3.                                       D2: post_send X     ---------------+
4.                                       wait for send CQ event X (D2)      |
5.    recv CQ event X reaches (D2)                                          |
6.  +-S2: post_send Y                                                       |
7.  | wait for send CQ event Y                                              |
8.  |                                    recv CQ event Y (S2) (drop it)     |
9.  +-send CQ event Y reaches (S2)                                          |
10.                                      send CQ event X reaches (D2)  -----+
11.                                      wait recv CQ event Y (dropped by (8))

Although hardware IB worked fine in my hundred or so runs, the IB specification
doesn't guarantee the CQ order in such a case.

Here we introduce an independent send completion queue to separate
ibv_post_send completions from the original mixed completion queue.
This lets us poll for the specific CQE we are really interested in.
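
The destination-side hang and the effect of a separate send completion
queue can be illustrated with a small toy model. This is only a sketch, not
the code in rdma.c: struct toy_cq, toy_block_for_wrid() and the WRID_* values
are hypothetical names invented for illustration, and a queue running dry
stands in for blocking forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for work request ID kinds; not QEMU's definitions. */
enum wrid_kind { WRID_SEND, WRID_RECV };

/* A toy completion queue: a fixed array of completions consumed in order. */
struct toy_cq {
    const enum wrid_kind *events;
    size_t len;
    size_t head;
};

/*
 * Consume completions until `wanted` shows up, discarding everything else.
 * Returns false when the queue runs dry, which models blocking forever.
 */
static bool toy_block_for_wrid(struct toy_cq *cq, enum wrid_kind wanted)
{
    assert(cq != NULL);
    while (cq->head < cq->len) {
        if (cq->events[cq->head++] == wanted) {
            return true;
        }
        /* Any other completion is dropped here and is never seen again. */
    }
    return false;
}

/* One shared CQ: the RECV completion arrives before the SEND completion. */
static bool toy_shared_cq_completes(void)
{
    static const enum wrid_kind order[] = { WRID_RECV, WRID_SEND };
    struct toy_cq cq = { order, 2, 0 };

    return toy_block_for_wrid(&cq, WRID_SEND) &&   /* drops the RECV */
           toy_block_for_wrid(&cq, WRID_RECV);     /* never satisfied */
}

/* Same arrival order, but send and receive completions use separate CQs. */
static bool toy_split_cq_completes(void)
{
    static const enum wrid_kind sends[] = { WRID_SEND };
    static const enum wrid_kind recvs[] = { WRID_RECV };
    struct toy_cq send_cq = { sends, 1, 0 };
    struct toy_cq recv_cq = { recvs, 1, 0 };

    return toy_block_for_wrid(&send_cq, WRID_SEND) &&
           toy_block_for_wrid(&recv_cq, WRID_RECV);
}
```

In this model toy_shared_cq_completes() returns false, because the RECV
completion is discarded while waiting for the SEND, whereas
toy_split_cq_completes() returns true, because each wait only ever consumes
completions from its own queue.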

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 12:49:29 +01:00
..
block-dirty-bitmap.c migration: block-dirty-bitmap: add missing qemu_mutex_lock_iothread 2021-10-05 13:10:29 +02:00
block.c migration: using trace_ to replace DPRINTF 2020-10-26 16:15:04 +00:00
block.h
channel.c migration: Introduce migration_ioc_[un]register_yank() 2021-07-26 12:44:54 +01:00
channel.h
colo-failover.c qemu/atomic.h: rename atomic_ to qatomic_ 2020-09-23 16:07:44 +01:00
colo.c Remove migrate_set_block_enabled in checkpoint 2021-06-11 10:30:13 +08:00
dirtyrate.c hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds 2021-06-08 20:18:26 +01:00
dirtyrate.h migration/dirtyrate: make sample page count configurable 2021-06-08 20:18:25 +01:00
exec.c
exec.h
fd.c monitor: Use getter/setter functions for cur_mon 2020-10-09 07:08:19 +02:00
fd.h
global_state.c migration: Silence compiler warning in global_state_store_running() 2020-10-02 12:28:48 +01:00
meson.build migration: Move populate_vfio_info() into a separate file 2021-05-14 12:31:51 +02:00
migration.c migration: allow enabling mutilfd for specific protocol only 2021-10-19 08:39:04 +02:00
migration.h migration: Make from_dst_file accesses thread-safe 2021-07-26 12:44:46 +01:00
multifd-zlib.c
multifd-zstd.c
multifd.c migration: allow enabling mutilfd for specific protocol only 2021-10-19 08:39:04 +02:00
multifd.h migration: allow multifd for socket protocol only 2021-10-19 08:39:04 +02:00
page_cache.c migration: Fix cache_init()'s "Failed to allocate" error messages 2021-02-08 11:19:51 +00:00
page_cache.h migration: Clean up signed vs. unsigned XBZRLE cache-size 2021-02-08 11:19:51 +00:00
postcopy-ram.c migration/ram: Handle RAM block resizes during postcopy 2021-05-13 18:21:14 +01:00
postcopy-ram.h migration/: fix some comment spelling errors 2020-09-17 20:36:32 +02:00
qemu-file-channel.c migration: Move the yank unregister of channel_close out 2021-07-26 12:45:03 +01:00
qemu-file-channel.h
qemu-file.c migration: Teach QEMUFile to be QIOChannel-aware 2021-07-26 12:44:59 +01:00
qemu-file.h migration: Teach QEMUFile to be QIOChannel-aware 2021-07-26 12:44:59 +01:00
ram.c migration/ram: Don't passs RAMState to migration_clear_memory_region_dirty_bitmap_*() 2021-10-19 08:39:04 +02:00
ram.h migration: Pre-fault memory before starting background snasphot 2021-04-07 18:37:28 +01:00
rdma.c migration/rdma: Fix out of order wrid 2021-11-01 12:49:29 +01:00
rdma.h
savevm.c migration: Move the yank unregister of channel_close out 2021-07-26 12:45:03 +01:00
savevm.h migration: Add blocker information 2021-02-08 11:19:51 +00:00
socket.c migration/socket: Close the listener at the end 2021-06-08 19:36:19 +01:00
socket.h migration: unify the framework of socket-type channel 2020-08-28 13:34:52 +01:00
target.c migration: Move populate_vfio_info() into a separate file 2021-05-14 12:31:51 +02:00
tls.c migration/tls: Use qcrypto_tls_creds_check_endpoint() 2021-06-29 18:30:20 +01:00
tls.h migration: Fix Lesser GPL version number 2020-11-15 16:43:28 +01:00
trace-events migration/rdma: advise prefetch write for ODP region 2021-10-19 08:39:04 +02:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vmstate-types.c migration: Replace migration's JSON writer by the general one 2020-12-19 10:39:16 +01:00
vmstate.c migration: Replace migration's JSON writer by the general one 2020-12-19 10:39:16 +01:00
xbzrle.c
xbzrle.h
yank_functions.c migration: Move the yank unregister of channel_close out 2021-07-26 12:45:03 +01:00
yank_functions.h migration: Move the yank unregister of channel_close out 2021-07-26 12:45:03 +01:00