qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Juan Quintela	fc6705229c	multifd: Use proper maximum compression values It happens that there are functions to calculate the worst possible compression size for a packet. Use them. Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-01-28 15:38:23 +01:00
Juan Quintela	47fe16ff66	migration: Move ram_release_pages() call to save_zero_page_to_file() We always need to call it when we find a zero page, so put it in a single place. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2022-01-28 15:38:23 +01:00
Juan Quintela	e7f2e190e5	migration: simplify do_compress_ram_page The goto is not needed at all. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-01-28 15:38:23 +01:00
Juan Quintela	20d549cb0b	migration: Remove masking for compression Remove the mask in the call to ram_release_pages(). Nothing else does it, and if the offset has that bits set, we have a lot of trouble. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-01-28 15:38:23 +01:00
Juan Quintela	0189c72291	migration: ram_release_pages() always receive 1 page as argument Remove the pages argument. And s/pages/page/ Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> --- - Use 1LL instead of casts (philmd) - Change the whole 1ULL for TARGET_PAGE_SIZE	2022-01-28 15:38:22 +01:00
Juan Quintela	05931ec561	migration: We only need last_stage in two places We only need last_stage in two places and we are passing it all around. Just add a field to RAMState that passes it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> --- Repeat subject (philmd suggestion)	2022-01-28 15:38:22 +01:00
Juan Quintela	04e1140494	migration: All this fields are unsigned So printing it as %d is wrong. Notice that for the channel id, that is an uint8_t, but I changed it anyways for consistency. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2022-01-28 15:38:22 +01:00
Stefan Hajnoczi	826cc32423	aio-posix: split poll check from ready handler Adaptive polling measures the execution time of the polling check plus handlers called when a polled event becomes ready. Handlers can take a significant amount of time, making it look like polling was running for a long time when in fact the event handler was running for a long time. For example, on Linux the io_submit(2) syscall invoked when a virtio-blk device's virtqueue becomes ready can take 10s of microseconds. This can exceed the default polling interval (32 microseconds) and cause adaptive polling to stop polling. By excluding the handler's execution time from the polling check we make the adaptive polling calculation more accurate. As a result, the event loop now stays in polling mode where previously it would have fallen back to file descriptor monitoring. The following data was collected with virtio-blk num-queues=2 event_idx=off using an IOThread. Before: 168k IOPS, IOThread syscalls: 9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0) = 16 9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8) = 8 9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8) = 8 9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3 9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0) = 32 174k IOPS (+3.6%), IOThread syscalls: 9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0) = 32 9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8) = 8 9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8) = 8 9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50) = 32 Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because the IOThread stays in polling mode instead of falling back to file descriptor monitoring. As usual, polling is not implemented on Windows so this patch ignores the new io_poll_read() callback in aio-win32.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-2-stefanha@redhat.com [Fixed up aio_set_event_notifier() calls in tests/unit/test-fdmon-epoll.c added after this series was queued. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-01-12 17:09:39 +00:00
Laurent Vivier	1b529d908d	failover: Silence warning messages during qtest virtio-net-failover test tries several device combinations that produces some expected warnings. These warning can be confusing, so we disable them during the qtest sequence. Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20211220145314.390697-1-lvivier@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> [thuth: Fix memory leak by using error_free()] Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-12-22 08:12:45 +01:00
Juan Quintela	a5ed229488	multifd: Make zlib compression method not use iovs Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:38:34 +01:00
Juan Quintela	f5ff548774	multifd: Make zstd compression method not use iovs Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:38:17 +01:00
Rao, Lei	9c5c8ff24e	COLO: Move some trace code behind qemu_mutex_unlock_iothread() There is no need to put some trace code in the critical section. So, moving it behind qemu_mutex_unlock_iothread() can reduce the lock time. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-12-15 10:31:42 +01:00
Li Zhang	077fbb5942	multifd: Shut down the QIO channels to avoid blocking the send threads when they are terminated. When doing live migration with multifd channels 8, 16 or larger number, the guest hangs in the presence of the network errors such as missing TCP ACKs. At sender's side: The main thread is blocked on qemu_thread_join, migration_fd_cleanup is called because one thread fails on qio_channel_write_all when the network problem happens and other send threads are blocked on sendmsg. They could not be terminated. So the main thread is blocked on qemu_thread_join to wait for the threads terminated. (gdb) bt 0 0x00007f30c8dcffc0 in __pthread_clockjoin_ex () at /lib64/libpthread.so.0 1 0x000055cbb716084b in qemu_thread_join (thread=0x55cbb881f418) at ../util/qemu-thread-posix.c:627 2 0x000055cbb6b54e40 in multifd_save_cleanup () at ../migration/multifd.c:542 3 0x000055cbb6b4de06 in migrate_fd_cleanup (s=0x55cbb8024000) at ../migration/migration.c:1808 4 0x000055cbb6b4dfb4 in migrate_fd_cleanup_bh (opaque=0x55cbb8024000) at ../migration/migration.c:1850 5 0x000055cbb7173ac1 in aio_bh_call (bh=0x55cbb7eb98e0) at ../util/async.c:141 6 0x000055cbb7173bcb in aio_bh_poll (ctx=0x55cbb7ebba80) at ../util/async.c:169 7 0x000055cbb715ba4b in aio_dispatch (ctx=0x55cbb7ebba80) at ../util/aio-posix.c:381 8 0x000055cbb7173ffe in aio_ctx_dispatch (source=0x55cbb7ebba80, callback=0x0, user_data=0x0) at ../util/async.c:311 9 0x00007f30c9c8cdf4 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0 10 0x000055cbb71851a2 in glib_pollfds_poll () at ../util/main-loop.c:232 11 0x000055cbb718521c in os_host_main_loop_wait (timeout=42251070366) at ../util/main-loop.c:255 12 0x000055cbb7185321 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531 13 0x000055cbb6e6ba27 in qemu_main_loop () at ../softmmu/runstate.c:726 14 0x000055cbb6ad6fd7 in main (argc=68, argv=0x7ffc0c578888, envp=0x7ffc0c578ab0) at ../softmmu/main.c:50 To make sure that the send threads could be terminated, IO channels should be shut down to avoid waiting IO. Signed-off-by: Li Zhang <lizhang@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	01102a2ef6	multifd: Fill offset and block for reception We were using the iov directly, but we will need this info on the following patch. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	40a4bfe9d3	multifd: remove used parameter from send_recv_pages() method It is already there as p->pages->num. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	02fb81043e	multifd: remove used parameter from send_prepare() method It is already there as p->pages->num. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	1943c11a62	multifd: The variable is only used inside the loop Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	18ede636bc	multifd: Add missing documention Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	90a3d2f9d5	multifd: Rename used field to num We will need to split it later in zero_num (number of zero pages) and normal_num (number of normal pages). This name is better. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	144fa06b34	migration: Never call twice qemu_target_page_size() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	47a1782461	multifd: Delete useless operation We are dividing by page_size to multiply again in the only use. Once there, improve the comments. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	bad452a77e	migration: Remove is_zero_range() It just calls buffer_is_zero(). Just change the callers. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2021-12-15 10:31:42 +01:00
Zhang Chen	751fe4c608	migration/colo: Optimize COLO primary node start code path Optimize COLO primary start path from: MIGRATION_STATUS_XXX --> MIGRATION_STATUS_ACTIVE --> MIGRATION_STATUS_COLO --> MIGRATION_STATUS_COMPLETED To: MIGRATION_STATUS_XXX --> MIGRATION_STATUS_COLO --> MIGRATION_STATUS_COMPLETED No need to start primary COLO through "MIGRATION_STATUS_ACTIVE". Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-12-15 10:31:42 +01:00
Rao, Lei	795969ab1f	Fixed a QEMU hang when guest poweroff in COLO mode When the PVM guest poweroff, the COLO thread may wait a semaphore in colo_process_checkpoint().So, we should wake up the COLO thread before migration shutdown. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-12-15 10:31:42 +01:00
Zhang Chen	0e0f0479e2	migration/colo: More accurate update checkpoint time Previous operation(like vm_start and replication_start_all) will consume extra time before update the timer, so reduce time in this patch. Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-12-15 10:31:42 +01:00
Rao, Lei	672159a97c	migration/ram.c: Remove the qemu_mutex_lock in colo_flush_ram_cache. The code to acquire bitmap_mutex is added in the commit of "63268c4970a5f126cc9af75f3ccb8057abef5ec0". There is no need to acquire bitmap_mutex in colo_flush_ram_cache(). This is because the colo_flush_ram_cache only be called on the COLO secondary VM, which is the destination side. On the COLO secondary VM, only the COLO thread will touch the bitmap of ram cache. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-12-15 10:31:42 +01:00
Rao, Lei	91fe9a8dbd	Reset the auto-converge counter at every checkpoint. if we don't reset the auto-converge counter, it will continue to run with COLO running, and eventually the system will hang due to the CPU throttle reaching DEFAULT_MIGRATE_MAX_CPU_THROTTLE. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-09 08:48:36 +01:00
Rao, Lei	a6a83cef9c	Reduce the PVM stop time during Checkpoint When flushing memory from ram cache to ram during every checkpoint on secondary VM, we can copy continuous chunks of memory instead of 4096 bytes per time to reduce the time of VM stop during checkpoint. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Tested-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-09 08:46:30 +01:00
Juan Quintela	565599807f	migration: Check that postcopy fd's are not NULL If postcopy has finished, it frees the array. But vhost-user unregister it at cleanup time. fixes: `c4f7538` Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-11-06 12:35:29 +01:00
Lukas Straub	e5fdf92096	colo: Don't dump colo cache if dump-guest-core=off One might set dump-guest-core=off to make coredumps smaller and still allow to debug many qemu bugs. Extend this option to the colo cache. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-03 09:39:31 +01:00
Rao, Lei	2b9f6bf36c	Changed the last-mode to none of first start COLO When we first stated the COLO, the last-mode is as follows: { "execute": "query-colo-status" } {"return": {"last-mode": "primary", "mode": "primary", "reason": "none"}} The last-mode is unreasonable. After the patch, will be changed to the following: { "execute": "query-colo-status" } {"return": {"last-mode": "none", "mode": "primary", "reason": "none"}} Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-03 09:38:53 +01:00
Rao, Lei	04dd89169b	Removed the qemu_fclose() in colo_process_incoming_thread After the live migration, the related fd will be cleanup in migration_incoming_state_destroy(). So, the qemu_close() in colo_process_incoming_thread is not necessary. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-03 09:38:53 +01:00
Rao, Lei	ac183dac96	colo: fixed 'Segmentation fault' when the simplex mode PVM poweroff The GDB statck is as follows: Program terminated with signal SIGSEGV, Segmentation fault. 0 object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832 if (type->class->interfaces && [Current thread is 1 (Thread 0x7f756e97eb00 (LWP 1811577))] (gdb) bt 0 object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832 1 0x000055c8f2c3dd14 in object_dynamic_cast (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:763 2 0x000055c8f2c3ddce in object_dynamic_cast_assert (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel", file=0x55c8f2f73780 "migration/qemu-file-channel.c", line=117, func=0x55c8f2f73800 <__func__.18724> "channel_shutdown") at qom/object.c:786 3 0x000055c8f2bbc6ac in channel_shutdown (opaque=0x55c8f543ac00, rd=true, wr=true, errp=0x0) at migration/qemu-file-channel.c:117 4 0x000055c8f2bba56e in qemu_file_shutdown (f=0x7f7558070f50) at migration/qemu-file.c:67 5 0x000055c8f2ba5373 in migrate_fd_cancel (s=0x55c8f4ccf3f0) at migration/migration.c:1699 6 0x000055c8f2ba1992 in migration_shutdown () at migration/migration.c:187 7 0x000055c8f29a5b77 in main (argc=69, argv=0x7fff3e9e8c08, envp=0x7fff3e9e8e38) at vl.c:4512 The root cause is that we still want to shutdown the from_dst_file in migrate_fd_cancel() after qemu_close in colo_process_checkpoint(). So, we should set the s->rp_state.from_dst_file = NULL after qemu_close(). Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-03 09:38:53 +01:00
Rao, Lei	684bfd1820	Fixed SVM hang when do failover before PVM crash This patch fixed as follows: Thread 1 (Thread 0x7f34ee738d80 (LWP 11212)): #0 __pthread_clockjoin_ex (threadid=139847152957184, thread_return=0x7f30b1febf30, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145 #1 0x0000563401998e36 in qemu_thread_join (thread=0x563402d66610) at util/qemu-thread-posix.c:587 #2 0x00005634017a79fa in process_incoming_migration_co (opaque=0x0) at migration/migration.c:502 #3 0x00005634019b59c9 in coroutine_trampoline (i0=63395504, i1=22068) at util/coroutine-ucontext.c:115 #4 0x00007f34ef860660 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /lib/x86_64-linux-gnu/libc.so.6 #5 0x00007f30b21ee730 in ?? () #6 0x0000000000000000 in ?? () Thread 13 (Thread 0x7f30b3dff700 (LWP 11747)): #0 __lll_lock_wait (futex=futex@entry=0x56340218ffa0 <qemu_global_mutex>, private=0) at lowlevellock.c:52 #1 0x00007f34efa000a3 in _GI__pthread_mutex_lock (mutex=0x56340218ffa0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80 #2 0x0000563401997f99 in qemu_mutex_lock_impl (mutex=0x56340218ffa0 <qemu_global_mutex>, file=0x563401b7a80e "migration/colo.c", line=806) at util/qemu-thread-posix.c:78 #3 0x0000563401407144 in qemu_mutex_lock_iothread_impl (file=0x563401b7a80e "migration/colo.c", line=806) at /home/workspace/colo-qemu/cpus.c:1899 #4 0x00005634017ba8e8 in colo_process_incoming_thread (opaque=0x563402d664c0) at migration/colo.c:806 #5 0x0000563401998b72 in qemu_thread_start (args=0x5634039f8370) at util/qemu-thread-posix.c:519 #6 0x00007f34ef9fd609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #7 0x00007f34ef924293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 The QEMU main thread is holding the lock: (gdb) p qemu_global_mutex $1 = {lock = {_data = {lock = 2, __count = 0, __owner = 11212, __nusers = 9, __kind = 0, __spins = 0, __elision = 0, __list = {_prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\314+\000\000\t", '\000' <repeats 26 times>, __align = 2}, file = 0x563401c07e4b "util/main-loop.c", line = 240, initialized = true} >From the call trace, we can see it is a deadlock bug. and the QEMU main thread holds the global mutex to wait until the COLO thread ends. and the colo thread wants to acquire the global mutex, which will cause a deadlock. So, we should release the qemu_global_mutex before waiting colo thread ends. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-03 09:38:53 +01:00
Rao, Lei	aa505f8e0e	Fixed qemu crash when guest power off in COLO mode This patch fixes the following: qemu-system-x86_64: invalid runstate transition: 'shutdown' -> 'running' Aborted (core dumped) The gdb bt as following: 0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 1 0x00007faa3d613859 in __GI_abort () at abort.c:79 2 0x000055c5a21268fd in runstate_set (new_state=RUN_STATE_RUNNING) at vl.c:723 3 0x000055c5a1f8cae4 in vm_prepare_start () at /home/workspace/colo-qemu/cpus.c:2206 4 0x000055c5a1f8cb1b in vm_start () at /home/workspace/colo-qemu/cpus.c:2213 5 0x000055c5a2332bba in migration_iteration_finish (s=0x55c5a4658810) at migration/migration.c:3376 6 0x000055c5a2332f3b in migration_thread (opaque=0x55c5a4658810) at migration/migration.c:3527 7 0x000055c5a251d68a in qemu_thread_start (args=0x55c5a5491a70) at util/qemu-thread-posix.c:519 8 0x00007faa3d7e9609 in start_thread (arg=<optimized out>) at pthread_create.c:477 9 0x00007faa3d710293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-03 09:38:53 +01:00
Rao, Lei	ae4c209935	Some minor optimizations for COLO Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-03 09:38:53 +01:00
Juan Quintela	02abee3d51	migration: Zero migration compression counters Based on previous patch from yuxiating <yuxiating@huawei.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-03 09:38:53 +01:00
yuxiating	fa0b31d585	migration: initialise compression_counters for a new migration If the compression migration fails or is canceled, the query for the value of compression_counters during the next compression migration is wrong. Signed-off-by: yuxiating <yuxiating@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-03 09:38:53 +01:00
Laurent Vivier	458fecca80	migration: provide an error message to migration_cancel() This avoids to call migrate_get_current() in the caller function whereas migration_cancel() already needs the pointer to the current migration state. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-03 09:38:53 +01:00
Hyman Huang(黄勇)	826b8bc80c	migration/dirtyrate: implement dirty-bitmap dirtyrate calculation introduce dirty-bitmap mode as the third method of calc-dirty-rate. implement dirty-bitmap dirtyrate calculation, which can be used to measuring dirtyrate in the absence of dirty-ring. introduce "dirty_bitmap:-b" option in hmp calc_dirty_rate to indicate dirty bitmap method should be used for calculation. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:44 +01:00
Hyman Huang(黄勇)	4998a37e4b	memory: introduce total_dirty_pages to stat dirty pages introduce global var total_dirty_pages to stat dirty pages along with memory_global_dirty_log_sync. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:44 +01:00
David Hildenbrand	6fee3a1fd9	migration/ram: Handle RAMBlocks with a RamDiscardManager on background snapshots We already don't ever migrate memory that corresponds to discarded ranges as managed by a RamDiscardManager responsible for the mapped memory region of the RAMBlock. virtio-mem uses this mechanism to logically unplug parts of a RAMBlock. Right now, we still populate zeropages for the whole usable part of the RAMBlock, which is undesired because: 1. Even populating the shared zeropage will result in memory getting consumed for page tables. 2. Memory backends without a shared zeropage (like hugetlbfs and shmem) will populate an actual, fresh page, resulting in an unintended memory consumption. Discarded ("logically unplugged") parts have to remain discarded. As these pages are never part of the migration stream, there is no need to track modifications via userfaultfd WP reliably for these parts. Further, any writes to these ranges by the VM are invalid and the behavior is undefined. Note that Linux only supports userfaultfd WP on private anonymous memory for now, which usually results in the shared zeropage getting populated. The issue will become more relevant once userfaultfd WP supports shmem and hugetlb. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:44 +01:00
David Hildenbrand	f7b9dcfbcf	migration/ram: Factor out populating pages readable in ram_block_populate_pages() Let's factor out prefaulting/populating to make further changes easier to review and add a comment what we are actually expecting to happen. While at it, use the actual page size of the ramblock, which defaults to qemu_real_host_page_size for anonymous memory. Further, rename ram_block_populate_pages() to ram_block_populate_read() as well, to make it clearer what we are doing. In the future, we might want to use MADV_POPULATE_READ to speed up population. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:44 +01:00
David Hildenbrand	7648297d40	migration: Simplify alignment and alignment checks Let's use QEMU_ALIGN_DOWN() and friends to make the code a bit easier to read. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:44 +01:00
David Hildenbrand	9470c5e082	migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination Currently, when someone (i.e., the VM) accesses discarded parts inside a RAMBlock with a RamDiscardManager managing the corresponding mapped memory region, postcopy will request migration of the corresponding page from the source. The source, however, will never answer, because it refuses to migrate such pages with undefined content ("logically unplugged"): the pages are never dirty, and get_queued_page() will consequently skip processing these postcopy requests. Especially reading discarded ("logically unplugged") ranges is supposed to work in some setups (for example with current virtio-mem), although it barely ever happens: still, not placing a page would currently stall the VM, as it cannot make forward progress. Let's check the state via the RamDiscardManager (the state e.g., of virtio-mem is migrated during precopy) and avoid sending a request that will never get answered. Place a fresh zero page instead to keep the VM working. This is the same behavior that would happen automatically without userfaultfd being active, when accessing virtual memory regions without populated pages -- "populate on demand". For now, there are valid cases (as documented in the virtio-mem spec) where a VM might read discarded memory; in the future, we will disallow that. Then, we might want to handle that case differently, e.g., warning the user that the VM seems to be mis-behaving. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:44 +01:00
David Hildenbrand	be39b4cd20	migration/ram: Handle RAMBlocks with a RamDiscardManager on the migration source We don't want to migrate memory that corresponds to discarded ranges as managed by a RamDiscardManager responsible for the mapped memory region of the RAMBlock. The content of these pages is essentially stale and without any guarantees for the VM ("logically unplugged"). Depending on the underlying memory type, even reading memory might populate memory on the source, resulting in an undesired memory consumption. Of course, on the destination, even writing a zeropage consumes memory, which we also want to avoid (similar to free page hinting). Currently, virtio-mem tries achieving that goal (not migrating "unplugged" memory that was discarded) by going via qemu_guest_free_page_hint() - but it's hackish and incomplete. For example, background snapshots still end up reading all memory, as they don't do bitmap syncs. Postcopy recovery code will re-add previously cleared bits to the dirty bitmap and migrate them. Let's consult the RamDiscardManager after setting up our dirty bitmap initially and when postcopy recovery code reinitializes it: clear corresponding bits in the dirty bitmaps (e.g., of the RAMBlock and inside KVM). It's important to fixup the dirty bitmap after our initial bitmap sync, such that the corresponding dirty bits in KVM are actually cleared. As colo is incompatible with discarding of RAM and inhibits it, we don't have to bother. Note: if a misbehaving guest would use discarded ranges after migration started we would still migrate that memory: however, then we already populated that memory on the migration source. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:44 +01:00
Peter Xu	60fd680193	migration: Add migrate_add_blocker_internal() An internal version that removes -only-migratable implications. It can be used for temporary migration blockers like dump-guest-memory. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:44 +01:00
Peter Xu	4c170330aa	migration: Make migration blocker work for snapshots too save_snapshot() checks migration blocker, which looks sane. At the meantime we should also teach the blocker add helper to fail if during a snapshot, just like for migrations. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:43 +01:00
Hyman Huang(é»„å‹‡)	0e21bf2460	migration/dirtyrate: implement dirty-ring dirtyrate calculation use dirty ring feature to implement dirtyrate calculation. introduce mode option in qmp calc_dirty_rate to specify what method should be used when calculating dirtyrate, either page-sampling or dirty-ring should be passed. introduce "dirty_ring:-r" option in hmp calc_dirty_rate to indicate dirty ring method should be used for calculation. Signed-off-by: Hyman Huang(é»„å‹‡) <huangy81@chinatelecom.cn> Message-Id: <7db445109bd18125ce8ec86816d14f6ab5de6a7d.1624040308.git.huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:43 +01:00
Hyman Huang(é»„å‹‡)	9865d0f68f	migration/dirtyrate: move init step of calculation to main thread since main thread may "query dirty rate" at any time, it's better to move init step into main thead so that synchronization overhead between "main" and "get_dirtyrate" can be reduced. Signed-off-by: Hyman Huang(é»„å‹‡) <huangy81@chinatelecom.cn> Message-Id: <109f8077518ed2f13068e3bfb10e625e964780f1.1624040308.git.huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:43 +01:00
Hyman Huang(é»„å‹‡)	15eb2d644c	migration/dirtyrate: adjust order of registering thread registering get_dirtyrate thread in advance so that both page-sampling and dirty-ring mode can be covered. Signed-off-by: Hyman Huang(é»„å‹‡) <huangy81@chinatelecom.cn> Message-Id: <d7727581a8e86d4a42fc3eacf7f310419b9ebf7e.1624040308.git.huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:43 +01:00
Hyman Huang(é»„å‹‡)	71864eadd9	migration/dirtyrate: introduce struct and adjust DirtyRateStat introduce "DirtyRateMeasureMode" to specify what method should be used to calculate dirty rate, introduce "DirtyRateVcpu" to store dirty rate for each vcpu. use union to store stat data of specific mode Signed-off-by: Hyman Huang(é»„å‹‡) <huangy81@chinatelecom.cn> Message-Id: <661c98c40f40e163aa58334337af8f3ddf41316a.1624040308.git.huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:43 +01:00
Hyman Huang(é»„å‹‡)	63b41db4bc	memory: make global_dirty_tracking a bitmask since dirty ring has been introduced, there are two methods to track dirty pages of vm. it seems that "logging" has a hint on the method, so rename the global_dirty_log to global_dirty_tracking would make description more accurate. dirty rate measurement may start or stop dirty tracking during calculation. this conflict with migration because stop dirty tracking make migration leave dirty pages out then that'll be a problem. make global_dirty_tracking a bitmask can let both migration and dirty rate measurement work fine. introduce GLOBAL_DIRTY_MIGRATION and GLOBAL_DIRTY_DIRTY_RATE to distinguish what current dirty tracking aims for, migration or dirty rate. Signed-off-by: Hyman Huang(é»„å‹‡) <huangy81@chinatelecom.cn> Message-Id: <9c9388657cfa0301bd2c1cfa36e7cf6da4aeca19.1624040308.git.huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:43 +01:00
Li Zhijian	b390afd8c5	migration/rdma: Fix out of order wrid destination: ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming rdma:192.168.22.23:8888 qemu-system-x86_64: -spice streaming-video=filter,port=5902,disable-ticketing: warning: short-form boolean option 'disable-ticketing' deprecated Please use disable-ticketing=on instead QEMU 6.0.50 monitor - type 'help' for more information (qemu) trace-event qemu_rdma_block_for_wrid_miss on (qemu) dest_init RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet qemu_rdma_block_for_wrid_miss A Wanted wrid CONTROL SEND (2000) but got CONTROL RECV (4000) source: ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5901,disable-ticketing -S qemu-system-x86_64: -spice streaming-video=filter,port=5901,disable-ticketing: warning: short-form boolean option 'disable-ticketing' deprecated Please use disable-ticketing=on instead QEMU 6.0.50 monitor - type 'help' for more information (qemu) (qemu) trace-event qemu_rdma_block_for_wrid_miss on (qemu) migrate -d rdma:192.168.22.23:8888 source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet (qemu) qemu_rdma_block_for_wrid_miss A Wanted wrid WRITE RDMA (1) but got CONTROL RECV (4000) NOTE: we use soft RoCE as the rdma device. [root@iaas-rpma images]# rdma link show rxe_eth0/1 link rxe_eth0/1 state ACTIVE physical_state LINK_UP netdev eth0 This migration could not be completed when out of order(OOO) CQ event occurs. The send queue and receive queue shared a same completion queue, and qemu_rdma_block_for_wrid() will drop the CQs it's not interested in. But the dropped CQs by qemu_rdma_block_for_wrid() could be later CQs it wants. So in this case, qemu_rdma_block_for_wrid() will block forever. OOO cases will occur in both source side and destination side. And a forever blocking happens on only SEND and RECV are out of order. OOO between 'WRITE RDMA' and 'RECV' doesn't matter. below the OOO sequence: source destination rdma_write_one() qemu_rdma_registration_handle() 1. S1: post_recv X D1: post_recv Y 2. wait for recv CQ event X 3. D2: post_send X ---------------+ 4. wait for send CQ send event X (D2) \| 5. recv CQ event X reaches (D2) \| 6. +-S2: post_send Y \| 7. \| wait for send CQ event Y \| 8. \| recv CQ event Y (S2) (drop it) \| 9. +-send CQ event Y reaches (S2) \| 10. send CQ event X reaches (D2) -----+ 11. wait recv CQ event Y (dropped by (8)) Although a hardware IB works fine in my a hundred of runs, the IB specification doesn't guaratee the CQ order in such case. Here we introduce a independent send completion queue to distinguish ibv_post_send completion queue from the original mixed completion queue. It helps us to poll the specific CQE we are really interested in. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 12:49:29 +01:00
Li Zhijian	911965ace9	migration/rdma: advise prefetch write for ODP region The responder mr registering with ODP will sent RNR NAK back to the requester in the face of the page fault. --------- ibv_poll_cq wc.status=13 RNR retry counter exceeded! ibv_poll_cq wrid=WRITE RDMA! --------- ibv_advise_mr(3) helps to make pages present before the actual IO is conducted so that the responder does page fault as little as possible. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Li Zhijian	e2daccb0d0	migration/rdma: Try to register On-Demand Paging memory region Previously, for the fsdax mem-backend-file, it will register failed with Operation not supported. In this case, we can try to register it with On-Demand Paging[1] like what rpma_mr_reg() does on rpma[2]. [1]: https://community.mellanox.com/s/article/understanding-on-demand-paging--odp-x [2]: http://pmem.io/rpma/manpages/v0.9.0/rpma_mr_reg.3 CC: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Li Zhijian	5ad15e8614	migration: allow enabling mutilfd for specific protocol only To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org> CC: Li Zhijian <lizhijian@cn.fujitsu.com> Date: Sat, 31 Jul 2021 22:05:52 +0800 (5 weeks, 4 days, 17 hours ago) And change the default to true so that in '-incoming defer' case, user is able to change multifd capability. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Li Zhijian	b7acd65707	migration: allow multifd for socket protocol only To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org> CC: Li Zhijian <lizhijian@cn.fujitsu.com> Date: Sat, 31 Jul 2021 22:05:51 +0800 (5 weeks, 4 days, 17 hours ago) multifd with unsupported protocol will cause a segment fault. (gdb) bt #0 0x0000563b4a93faf8 in socket_connect (addr=0x0, errp=0x7f7f02675410) at ../util/qemu-sockets.c:1190 #1 0x0000563b4a797a03 in qio_channel_socket_connect_sync (ioc=0x563b4d16e8c0, addr=0x0, errp=0x7f7f02675410) at ../io/channel-socket.c:145 #2 0x0000563b4a797abf in qio_channel_socket_connect_worker (task=0x563b4cd86c30, opaque=0x0) at ../io/channel-socket.c:168 #3 0x0000563b4a792631 in qio_task_thread_worker (opaque=0x563b4cd86c30) at ../io/task.c:124 #4 0x0000563b4a91da69 in qemu_thread_start (args=0x563b4c44bb80) at ../util/qemu-thread-posix.c:541 #5 0x00007f7fe9b5b3f9 in ?? () #6 0x0000000000000000 in ?? () It's enough to check migrate_multifd_is_allowed() in multifd cleanup() and multifd setup() though there are so many other places using migrate_use_multifd(). Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
David Hildenbrand	1230a25f6f	migration/ram: Don't passs RAMState to migration_clear_memory_region_dirty_bitmap_*() The parameter is unused, let's drop it. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Lukas Straub	e9ab82b858	multifd: Unconditionally unregister yank function To: qemu-devel <qemu-devel@nongnu.org> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares Passos <lsoaresp@redhat.com> Date: Wed, 4 Aug 2021 21:26:32 +0200 (5 weeks, 11 hours, 52 minutes ago) [[PGP Signed Part:No public key for 35AB0B289C5DB258 created at 2021-08-04T21:26:32+0200 using RSA]] Unconditionally unregister yank function in multifd_load_cleanup(). If it is not unregistered here, it will leak and cause a crash in yank_unregister_instance(). Now if the ioc is still in use afterwards, it will only lead to qemu not being able to recover from a hang related to that ioc. After checking the code, i am pretty sure that ref is always 1 when arriving here. So all this currently does is remove the unneeded check. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Lukas Straub	20171ea895	multifd: Implement yank for multifd send side To: qemu-devel <qemu-devel@nongnu.org> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares Passos <lsoaresp@redhat.com> Date: Wed, 1 Sep 2021 17:58:57 +0200 (1 week, 15 hours, 17 minutes ago) [[PGP Signed Part:No public key for 35AB0B289C5DB258 created at 2021-09-01T17:58:57+0200 using RSA]] When introducing yank functionality in the migration code I forgot to cover the multifd send side. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Emanuele Giuseppe Esposito	68b88468f6	migration: add missing qemu_mutex_lock_iothread in migration_completion qemu_savevm_state_complete_postcopy assumes the iothread lock (BQL) to be held, but instead it isn't. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20211005080751.3797161-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-05 13:10:29 +02:00
Emanuele Giuseppe Esposito	3c158eba1e	migration: block-dirty-bitmap: add missing qemu_mutex_lock_iothread init_dirty_bitmap_migration assumes the iothread lock (BQL) to be held, but instead it isn't. Instead of adding the lock to qemu_savevm_state_setup(), follow the same pattern as the other ->save_setup callbacks and lock+unlock inside dirty_bitmap_save_setup(). Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20211005080751.3797161-2-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-05 13:10:29 +02:00
Markus Armbruster	7d6f6933aa	migration: Handle migration_incoming_setup() errors consistently Commit `b673eab4e2` "multifd: Make multifd_load_setup() get an Error parameter" changed migration_incoming_setup() to take an Error ** argument, and adjusted the callers accordingly. It neglected to change adjust multifd_load_setup(): it still exit()s on error. Clean that up. The error now gets propagated up two call chains: via migration_fd_process_incoming() to rdma_accept_incoming_migration(), and via migration_ioc_process_incoming() to migration_channel_process_incoming(). Both chain ends report the error with error_report_err(), but otherwise ignore it. Behavioral change: we no longer exit() on this error. This is consistent with how we handle other errors here, e.g. from multifd_recv_new_channel() via migration_ioc_process_incoming() to migration_channel_process_incoming(). Whether it's consistently right or consistently wrong I can't tell. Also clean up the return value from the unusual 0 on success, 1 on error to the more common true on success, false on error. Cc: Juan Quintela <quintela@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210720125408.387910-11-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2021-08-26 17:15:28 +02:00
Markus Armbruster	f9734d5d40	error: Use error_fatal to simplify obvious fatal errors (again) We did this with scripts/coccinelle/use-error_fatal.cocci before, in commit `50beeb6809` and `007b06578a`. This commit cleans up rarer variations that don't seem worth matching with Coccinelle. Cc: Thomas Huth <thuth@redhat.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Peter Xu <peterx@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Marc-André Lureau <marcandre.lureau@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210720125408.387910-2-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2021-08-26 17:15:28 +02:00
Wei Wang	3143577d6a	migration: clear the memory region dirty bitmap when skipping free pages When skipping free pages to send, their corresponding dirty bits in the memory region dirty bitmap need to be cleared. Otherwise the skipped pages will be sent in the next round after the migration thread syncs dirty bits from the memory region dirty bitmap. Cc: David Hildenbrand <david@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Reported-by: David Hildenbrand <david@redhat.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Message-Id: <20210722083055.23352-1-wei.w.wang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:50:13 +01:00
Peter Xu	39675ffffb	migration: Move the yank unregister of channel_close out It's efficient, but hackish to call yank unregister calls in channel_close(), especially it'll be hard to debug when qemu crashed with some yank function leaked. Remove that hack, but instead explicitly unregister yank functions at the places where needed, they are: (on src) - migrate_fd_cleanup - postcopy_pause (on dst) - migration_incoming_state_destroy - postcopy_pause_incoming Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-6-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:45:03 +01:00
Peter Xu	c6ad5be7ae	migration: Teach QEMUFile to be QIOChannel-aware migration uses QIOChannel typed qemufiles. In follow up patches, we'll need the capability to identify this fact, so that we can get the backing QIOChannel from a QEMUFile. We can also define types for QEMUFile but so far since we only need to be able to identify QIOChannel, introduce a boolean which is simpler. Introduce another helper qemu_file_get_ioc() to return the ioc backend of a qemufile if has_ioc is set. No functional change. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-5-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:44:59 +01:00
Peter Xu	18711405b5	migration: Introduce migration_ioc_[un]register_yank() There're plenty of places in migration/* that checks against either socket or tls typed ioc for yank operations. Provide two helpers to hide all these information. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-4-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:44:54 +01:00
Peter Xu	43044ac0ee	migration: Make from_dst_file accesses thread-safe Accessing from_dst_file is potentially racy in current code base like below: if (s->from_dst_file) do_something(s->from_dst_file); Because from_dst_file can be reset right after the check in another thread (rp_thread). One example is migrate_fd_cancel(). Use the same qemu_file_lock to protect it too, just like to_dst_file. When it's safe to access without lock, comment it. There's one special reference in migration_thread() that can be replaced by the newly introduced rp_thread_created flag. Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Message-Id: <20210722175841.938739-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> with Peter's fixup	2021-07-26 12:44:46 +01:00
Peter Xu	53021ea165	migration: Fix missing join() of rp_thread It's possible that the migration thread skip the join() of the rp_thread in below race and crash on src right at finishing migration: migration_thread rp_thread ---------------- --------- migration_completion() (before rp_thread quits) from_dst_file=NULL [thread got scheduled out] s->rp_state.from_dst_file==NULL (skip join() of rp_thread) migrate_fd_cleanup() qemu_fclose(s->to_dst_file) yank_unregister_instance() assert(yank_find_entry()) <------- crash It could mostly happen with postcopy, but that shouldn't be required, e.g., I think it could also trigger with MIGRATION_CAPABILITY_RETURN_PATH set. It's suspected that above race could be the root cause of a recent (but rare) migration-test break reported by either Dave or PMM: https://lore.kernel.org/qemu-devel/YPamXAHwan%2FPPXLf@work-vm/ The issue is: from_dst_file is reset in the rp_thread, so if the thread reset it to NULL fast enough then the migration thread will assume there's no rp_thread at all. This could potentially cause more severe issue (e.g. crash) after the yank code. Fix it by using a boolean to keep "whether we've created rp_thread". Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-2-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:44:26 +01:00
Peter Xu	63268c4970	migration: Move bitmap_mutex out of migration_bitmap_clear_dirty() Taking the mutex every time for each dirty bit to clear is too slow, especially we'll take/release even if the dirty bit is cleared. So far it's only used to sync with special cases with qemu_guest_free_page_hint() against migration thread, nothing really that serious yet. Let's move the lock to be upper. There're two callers of migration_bitmap_clear_dirty(). For migration, move it into ram_save_iterate(). With the help of MAX_WAIT logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so taking the lock once there at the entry. It also means any call sites to qemu_guest_free_page_hint() can be delayed; but it should be very rare, only during migration, and I don't see a problem with it. For COLO, move it up to colo_flush_ram_cache(). I think COLO forgot to take that lock even when calling ramblock_sync_dirty_bitmap(), where another example is migration_bitmap_sync() who took it right. So let the mutex cover both the ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls. It's even possible to drop the lock so we use atomic operations upon rb->bmap and the variable migration_dirty_pages. I didn't do it just to still be safe, also not predictable whether the frequent atomic ops could bring overhead too e.g. on huge vms when it happens very often. When that really comes, we can keep a local counter and periodically call atomic ops. Keep it simple for now. Cc: Wei Wang <wei.w.wang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210630200805.280905-1-peterx@redhat.com> Reviewed-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Peter Xu	ca7bd0821b	migration: Clear error at entry of migrate_fd_connect() For each "migrate" command, remember to clear the s->error before going on. For one reason, when there's a new error it'll be still remembered; see migrate_set_error() who only sets the error if error==NULL. Meanwhile if a failed migration completes (e.g., postcopy recovered and finished), we shouldn't dump an error when calling migrate_fd_cleanup() at last. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-4-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Peter Xu	ca30f24d12	migration: Don't do migrate cleanup if during postcopy resume Below process could crash qemu with postcopy recovery: 1. (hmp) migrate -d .. 2. (hmp) migrate_start_postcopy 3. [network down, postcopy paused] 4. (hmp) migrate -r $WRONG_PORT when try the recover on an invalid $WRONG_PORT, cleanup_bh will be cleared 5. (hmp) migrate -r $RIGHT_PORT [qemu crash on assert(cleanup_bh)] The thing is we shouldn't cleanup if it's postcopy resume; the error is set mostly because the channel is wrong, so we return directly waiting for the user to retry. migrate_fd_cleanup() should only be called when migration is cancelled or completed. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Peter Xu	2e3e3da3c2	migration: Release return path early for paused postcopy When postcopy pause triggered, we rely on the migration thread to cleanup the to_dst_file handle, and the return path thread to cleanup the from_dst_file handle (which is stored in the local variable "rp"). Within the process, from_dst_file cleanup (qemu_fclose) is postponed until it's setup again due to a postcopy recovery. It used to work before yank was born; after yank is introduced we rely on the refcount of IOC to correctly unregister yank function in channel_close(). If without the early and on-time release of from_dst_file handle the yank function will be leftover during paused postcopy. Without this patch, below steps (quoted from Xiaohui) could trigger qemu src crash: 1.Boot vm on src host 2.Boot vm on dst host 3.Enable postcopy on src&dst host 4.Load stressapptest in vm and set postcopy speed to 50M 5.Start migration from src to dst host, change into postcopy mode when migration is active. 6.When postcopy is active, down the network card(do migration via this network) on dst host. 7.Wait untill postcopy is paused on src&dst host. 8.Before up network card, recover migration on dst host, will get error like following. 9.Ignore the error of step 8, go on recovering migration on src host: After step 9, qemu on src host will core dump after some seconds: qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. 1.sh: line 38: 44662 Aborted (core dumped) Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-2-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Laurent Vivier	a51dcef08b	migration: failover: emit a warning when the card is not fully unplugged When the migration fails or is canceled we wait the end of the unplug operation to be able to plug it back. But if the unplug operation is never finished we stop to wait and QEMU emits a warning to inform the user. Based-on: 20210629155007.629086-1-lvivier@redhat.com Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20210701131458.112036-1-lvivier@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Li Zhijian	224f364a49	migration/rdma: prevent from double free the same mr backtrace: '0x00007ffff5f44ec2 in __ibv_dereg_mr_1_1 (mr=0x7fff1007d390) at /home/lizhijian/rdma-core/libibverbs/verbs.c:478 478 void *addr = mr->addr; (gdb) bt #0 0x00007ffff5f44ec2 in __ibv_dereg_mr_1_1 (mr=0x7fff1007d390) at /home/lizhijian/rdma-core/libibverbs/verbs.c:478 #1 0x0000555555891fcc in rdma_delete_block (block=<optimized out>, rdma=0x7fff38176010) at ../migration/rdma.c:691 #2 qemu_rdma_cleanup (rdma=0x7fff38176010) at ../migration/rdma.c:2365 #3 0x00005555558925b0 in qio_channel_rdma_close_rcu (rcu=0x555556b8b6c0) at ../migration/rdma.c:3073 #4 0x0000555555d652a3 in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:281 #5 0x0000555555d5edf9 in qemu_thread_start (args=0x7fffe88bb4d0) at ../util/qemu-thread-posix.c:541 #6 0x00007ffff54c73f9 in start_thread () at /lib64/libpthread.so.0 #7 0x00007ffff53f3b03 in clone () at /lib64/libc.so.6 ' Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210708144521.1959614-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Olaf Hering	179a808045	migration: fix typo in mig_throttle_guest_down comment Fixes commit `3d0684b2ad` ("ram: Update all functions comments") Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210708162159.18045-1-olaf@aepfle.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-07-09 18:42:46 +02:00
Li Zhijian	eb1960aac1	misc: Remove redundant new line in perror() Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210706094433.1766952-1-lizhijian@cn.fujitsu.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-07-09 18:42:46 +02:00
Li Zhijian	e5f607913c	migration/rdma: Use error_report to suppress errno message Since the prior calls are successful, in this case a errno doesn't indicate a real error which would just make us confused. before: (qemu) migrate -d rdma:192.168.22.23:8888 source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect: No space left on device Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210628071959.23455-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Laurent Vivier	944bc52842	migration: failover: continue to wait card unplug on error If the user cancels the migration in the unplug-wait state, QEMU will try to plug back the card and this fails because the card is partially unplugged. To avoid the problem, continue to wait the card unplug, but to allow the migration to be canceled if the card never finishes to unplug use a timeout. Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210629155007.629086-3-lvivier@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Laurent Vivier	fde93d99d9	migration: move wait-unplug loop to its own function The loop is used in migration_thread() and bg_migration_thread(), so we can move it to its own function and call it from these both places. Moreover, in migration_thread() we have a wrong state transition from SETUP to ACTIVE while state could be WAIT_UNPLUG. This is correctly managed in bg_migration_thread() so use this code instead. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20210629155007.629086-2-lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Peter Xu	b7f9afd48e	migration: Allow reset of postcopy_recover_triggered when failed It's possible qemu_start_incoming_migration() failed at any point, when it happens we should reset postcopy_recover_triggered to false so that the user can still retry with a saner incoming port. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210629181356.217312-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Peter Xu	cc48c587d2	migration: Move yank outside qemu_start_incoming_migration() Starting from commit `b5eea99ec2`, qmp_migrate_recover() calls unregister before calling qemu_start_incoming_migration(). I believe it wanted to mitigate the next call to yank_register_instance(), but I think that's wrong. Firstly, if during recover, we should keep the yank instance there, not "quickly removing and adding it back". Meanwhile, calling qmp_migrate_recover() twice with `b5eea99ec2` will directly crash the dest qemu (right now it can't; but it'll start to work right after the next patch) because the 1st call of qmp_migrate_recover() will unregister permanently when the channel failed to establish, then the 2nd call of qmp_migrate_recover() crashes at yank_unregister_instance(). This patch fixes it by moving yank ops out of qemu_start_incoming_migration() into qmp_migrate_incoming. For qmp_migrate_recover(), drop the unregister of yank instance too since we keep it there during the recovery phase. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210629181356.217312-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Feng Lin	c00d434ac6	migration: fix the memory overwriting risk in add_to_iovec When testing migration, a Segmentation fault qemu core is generated. 0 error_free (err=0x1) 1 0x00007f8b862df647 in qemu_fclose (f=f@entry=0x55e06c247640) 2 0x00007f8b8516d59a in migrate_fd_cleanup (s=s@entry=0x55e06c0e1ef0) 3 0x00007f8b8516d66c in migrate_fd_cleanup_bh (opaque=0x55e06c0e1ef0) 4 0x00007f8b8626a47f in aio_bh_poll (ctx=ctx@entry=0x55e06b5a16d0) 5 0x00007f8b8626e71f in aio_dispatch (ctx=0x55e06b5a16d0) 6 0x00007f8b8626a33d in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) 7 0x00007f8b866bdba4 in g_main_context_dispatch () 8 0x00007f8b8626cde9 in glib_pollfds_poll () 9 0x00007f8b8626ce62 in os_host_main_loop_wait (timeout=<optimized out>) 10 0x00007f8b8626cffd in main_loop_wait (nonblocking=nonblocking@entry=0) 11 0x00007f8b862ef01f in main_loop () Using gdb print the struct QEMUFile f = { ..., iovcnt = 65, last_error = 21984, last_error_obj = 0x1, shutdown = true } Well iovcnt is overflow, because the max size of MAX_IOV_SIZE is 64. struct QEMUFile { ...; struct iovec iov[MAX_IOV_SIZE]; unsigned int iovcnt; int last_error; Error *last_error_obj; bool shutdown; }; iovcnt and last_error is overwrited by add_to_iovec(). Right now, add_to_iovec() increase iovcnt before check the limit. And it seems that add_to_iovec() assumes that iovcnt will set to zero in qemu_fflush(). But qemu_fflush() will directly return when f->shutdown is true. The situation may occur when libvirtd restart during migration, after f->shutdown is set, before calling qemu_file_set_error() in qemu_file_shutdown(). So the safiest way is checking the iovcnt before increasing it. Signed-off-by: Feng Lin <linfeng23@huawei.com> Message-Id: <20210625062138.1899-1-linfeng23@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fix typo in 'writeable' which is actually misnamed 'writable'	2021-07-05 10:51:26 +01:00
Philippe Mathieu-Daudé	5590f65fac	migration/tls: Use qcrypto_tls_creds_check_endpoint() Avoid accessing QCryptoTLSCreds internals by using the qcrypto_tls_creds_check_endpoint() helper. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-29 18:30:20 +01:00
David Hildenbrand	8dbe22c686	memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The new flag has the following semantics: " RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge pages if applicable) is skipped: will bail out if not supported. When not set, the OS will do the reservation, if supported for the memory type. " Allow passing it into: - memory_region_init_ram_nomigrate() - memory_region_init_resizeable_ram() - memory_region_init_ram_from_file() ... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag. Bail out if the flag is not supported, which is the case right now for both, POSIX and win32. We will add Linux support next and allow specifying RAM_NORESERVE via memory backends. The target use case is virtio-mem, which dynamically exposes memory inside a large, sparse memory area to the VM. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-9-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-15 20:27:38 +02:00
Daniel P. Berrangé	85cd1cc668	migration: use GDateTime for formatting timestamp in snapshot names The GDateTime APIs provided by GLib avoid portability pitfalls, such as some platforms where 'struct timeval.tv_sec' field is still 'long' instead of 'time_t'. When combined with automatic cleanup, GDateTime often results in simpler code too. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Daniel P. Berrangé	626ff6515d	migration: add trace point when vm_stop_force_state fails This is a critical failure scenario for migration that is hard to diagnose from existing probes. Most likely it is caused by an error from bdrv_flush(), but we're not logging the errno anywhere, hence this new probe. Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Rao, Lei	3ba024457f	Remove migrate_set_block_enabled in checkpoint We can detect disk migration in migrate_prepare, if disk migration is enabled in COLO mode, we can directly report an error.and there is no need to disable block migration at every checkpoint. Signed-off-by: Lei Rao <lei.rao@intel.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Jason Wang <jasowang@redhat.com>	2021-06-11 10:30:13 +08:00
Peter Xu	a4a571d978	hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds These two commands are missing when adding the QMP sister commands. Add them, so developers can play with them easier. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <4cc0039fc3ad6145136770cf3b0f056c09a2910b.1623027729.git.huangy81@chinatelecom.cn> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 20:18:26 +01:00
Hyman Huang(黄勇)	7afa08cd8f	migration/dirtyrate: make sample page count configurable introduce optional sample-pages argument in calc-dirty-rate, making sample page count per GB configurable so that more accurate dirtyrate can be calculated. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <3103453a3b2796f929269c99a6ad81a9a7f1f405.1623027729.git.huangy81@chinatelecom.cn> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Wrapped a couple of long lines	2021-06-08 20:18:25 +01:00
Dr. David Alan Gilbert	a59136f3b1	migration/socket: Close the listener at the end Delay closing the listener until the cleanup hook at the end; mptcp needs the listener to stay open while the other paths come in. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210421112834.107651-5-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 19:36:19 +01:00
Dr. David Alan Gilbert	1df6ddb43b	migration: Add cleanup hook for inwards migration Add a cleanup hook for incoming migration that gets called at the end as a way for a transport to allow cleanup. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210421112834.107651-4-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 19:36:18 +01:00
Li Zhijian	6b8c2eb5c6	migration/rdma: Fix cm event use after free Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210602023506.3821293-1-lizhijian@cn.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 18:55:43 +01:00
Leonardo Bras	7de2e85653	yank: Unregister function when using TLS migration After yank feature was introduced in migration, whenever migration is started using TLS, the following error happens in both source and destination hosts: (qemu) qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. This happens because of a missing yank_unregister_function() when using qio-channel-tls. Fix this by also allowing TYPE_QIO_CHANNEL_TLS object type to perform yank_unregister_function() in channel_close() and multifd_load_cleanup(). Also, inside migration_channel_connect() and migration_channel_process_incoming() move yank_register_function() so it only runs once on a TLS migration. Fixes: `b5eea99ec2` ("migration: Add yank feature", 2021-01-13) Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1964326 Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Peter Xu <peterx@redhat.com> -- Changes since v2: - Dropped all references to ioc->master - yank_register_function() and yank_unregister_function() now only run once in a TLS migration. Changes since v1: - Cast p->c to QIOChannelTLS into multifd_load_cleanup() Message-Id: <20210601054030.1153249-1-leobras.c@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 18:50:03 +01:00
Stefano Garzarella	d0fb9657a3	docs: fix references to docs/devel/tracing.rst Commit `e50caf4a5c` ("tracing: convert documentation to rST") converted docs/devel/tracing.txt to docs/devel/tracing.rst. We still have several references to the old file, so let's fix them with the following command: sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt) Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210517151702.109066-2-sgarzare@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-06-02 06:51:09 +02:00
Peter Maydell	c5847f5e4e	Virtiofs, migration and hmp pull 2021-05-26 Fixes for a loadvm regression from Kevin, some virtiofsd cleanups from Vivek and Mahmoud, and some RDMA migration fixups from Li. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmCuiMIACgkQBRYzHrxb /edPvxAAimpeeMuCBKPswmuxyAqAoes6FKWNohzCyTCWXEmhvZGFl6SEgMxW0YWh DRLfyGUp9CODqNS2oVbj3I7/R5rDvpaqWFJ8GiBH99pc5B+pxYtXjGhjR8JWZ691 ZvoQsV/nN9dC7woIG5rcus09aPg1X8n8lD2GcV007Gacj3FGGHY6+DOmpfYmrdoG QY2OTQ2lMW3l1ACOhtdpWzVKgldYwTDLZsGWuFy+z5b64ijXyImpWa39Hu0HoWdS vds5D9GGMfI8kgOlIgsCoC26kre40952VqhOS5ie3axSZt48kufLMUaspiHMiA/B mwmrqMrNXoqC7pAWnv8Ur4NjEBlTkPtL4+kIMX0zeqJsRG24Fiee0e42CvNjFixF 1F419HHKFQPYOvCb/SPLv18gmcyfHdTEbB+yjYIO48ccrPfu/LtTTKG85D1lFw7C cy3AWKz6EzYuzs0wtia+0K+5UwOhm2h9Z77kyX5MIfyrb+KWAfupFW6l9NnfmBtp +sJBSkIrph6oNmZADx2eSwev+pdlcy0uTOULOiPTF4Iu826LQxoaV9pFcvA+/HXZ EbKNrYxDNNYCugLybtULIiFJ8f1i90/UtQlrqAr21deC+gJYliDZScXnzi6lPuv7 afn4QgMmFLus4ehB0TKcA7QbWL3yyrivXO4TSQNzcZaeXAvHOw0= =dIM2 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20210526a' into staging Virtiofs, migration and hmp pull 2021-05-26 Fixes for a loadvm regression from Kevin, some virtiofsd cleanups from Vivek and Mahmoud, and some RDMA migration fixups from Li. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> # gpg: Signature made Wed 26 May 2021 18:43:30 BST # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert/tags/pull-migration-20210526a: migration/rdma: source: poll cm_event from return path migration/rdma: destination: create the return patch after the first accept migration/rdma: Fix rdma_addrinfo res leaks migration/rdma: cleanup rdma in rdma_start_incoming_migration error path migration/rdma: Fix cm_event used before being initialized tools/virtiofsd/fuse_opt.c: Replaced a malloc with GLib's g_try_malloc tools/virtiofsd/buffer.c: replaced a calloc call with GLib's g_try_new0 virtiofsd: Set req->reply_sent right after sending reply virtiofsd: Check EOF before short read virtiofsd: Simplify skip byte logic virtiofsd: get rid of in_sg_left variable virtiofsd: Use iov_discard_front() to skip bytes virtiofsd: Get rid of unreachable code in read virtiofsd: Check for EINTR in preadv() and retry hmp: Fix loadvm to resume the VM on success instead of failure Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-05-27 14:57:01 +01:00
Li Zhijian	e49e49dd73	migration/rdma: source: poll cm_event from return path source side always blocks if postcopy is only enabled at source side. users are not able to cancel this migration in this case. Let source side have chance to cancel this migration Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210525080552.28259-4-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Typo fix	2021-05-26 18:39:32 +01:00
Li Zhijian	44bcfd45e9	migration/rdma: destination: create the return patch after the first accept destination side: $ build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming rdma:192.168.1.10:8888 (qemu) migrate_set_capability postcopy-ram on (qemu) dest_init RDMA Device opened: kernel name rocep1s0f0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/rocep1s0f0, transport: (2) Ethernet Segmentation fault (core dumped) (gdb) bt #0 qemu_rdma_accept (rdma=0x0) at ../migration/rdma.c:3272 #1 rdma_accept_incoming_migration (opaque=0x0) at ../migration/rdma.c:3986 #2 0x0000563c9e51f02a in aio_dispatch_handler (ctx=ctx@entry=0x563ca0606010, node=0x563ca12b2150) at ../util/aio-posix.c:329 #3 0x0000563c9e51f752 in aio_dispatch_handlers (ctx=0x563ca0606010) at ../util/aio-posix.c:372 #4 aio_dispatch (ctx=0x563ca0606010) at ../util/aio-posix.c:382 #5 0x0000563c9e4f4d9e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306 #6 0x00007fe96ef3fa9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 #7 0x0000563c9e4ffeb8 in glib_pollfds_poll () at ../util/main-loop.c:231 #8 os_host_main_loop_wait (timeout=12188789) at ../util/main-loop.c:254 #9 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530 #10 0x0000563c9e3c7211 in qemu_main_loop () at ../softmmu/runstate.c:725 #11 0x0000563c9dfd46fe in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50 The rdma return path will not be created when qemu incoming is starting since migrate_copy() is false at that moment, then a NULL return path rdma was referenced if the user enabled postcopy later. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210525080552.28259-3-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Li Zhijian	f53b450ada	migration/rdma: Fix rdma_addrinfo res leaks rdma_freeaddrinfo() is the reverse operation of rdma_getaddrinfo() Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210525080552.28259-2-lizhijian@cn.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Li Zhijian	4e812d2338	migration/rdma: cleanup rdma in rdma_start_incoming_migration error path the error path after calling qemu_rdma_dest_init() should do rdma cleanup Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210520081148.17001-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Li Zhijian	efb208dc9c	migration/rdma: Fix cm_event used before being initialized A segmentation fault was triggered when i try to abort a postcopy + rdma migration. since rdma_ack_cm_event releases a uninitialized cm_event in these case. like below: 2496 ret = rdma_get_cm_event(rdma->channel, &cm_event); 2497 if (ret) { 2498 perror("rdma_get_cm_event after rdma_connect"); 2499 ERROR(errp, "connecting to destination!"); 2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault 2501 goto err_rdma_source_connect; 2502 } Refer to the rdma_get_cm_event() code, cm_event will be updated/changed only if rdma_get_cm_event() returns 0. So it's okey to remove the ack in error patch. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210519064740.10828-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Paolo Bonzini	b02629550d	replication: move include out of root directory The replication.h file is included from migration/colo.c and tests/unit/test-replication.c, so it should be in include/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:46 +02:00
Peter Maydell	9b1e81d1c2	* Replace YAML anchors by extends in the gitlab-CI yaml files * Many small qtest fixes (e.g. to fix issues discovered by Coverity) * Poison more config switches in common code * Fix the failing Travis-CI and Cirrus-CI tasks -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmCeXFMRHHRodXRoQHJl ZGhhdC5jb20ACgkQLtnXdP5wLbUV7g//VRxN74v2pO6zSsSNavYTkMa3jZE1Io3l w9lsOr3DM0HIMhyEMwSJsiygj0s8TNanUtFBHfDfo1k1gmZYd5FiM1sB7ZB/sB/S 9Q9wjmeV2NMG7pcr9zLmiJaEM2LGfGbso44/m0c+gpSyxzg2TeH7sF+38AUZI9/R J6/gOzIs4WWdSV4f8kp+YPPQCyOtxZsfxDuk1z1fKsBgXE5iEBUqYyyZUm4gYJkN 4ALmqc/NVeLszt08qkPHxwXQc488nCJP31psxx6MQ7toKfCAamT07Pp3oi70cakC y+HVfsIHc7SxaZyj8sbJCU3LJHwDd8N82ydJUrv216qDffb38fNb+m6y9mcYy2MH C25uGe9mucTuTP1WIttC6nCKg/MCgi4PWIzqEhkeKj3TxpTUJJP+BUIfSubV38Gc T+XUCNkWWW8sTeRiyE3m9pEZ+gz1MFubaIr/Owephch1SjRYn4zUwJFHHi3I4PY4 7XjEo8y2M9tKckW3pCfYi+aIDzC1DcRtzvXUGtdemX5xVjTAGlKkFLWl/brdCn2U y+JrwrL8cQ3Fy7bXzmK6m9mmhcIW6PcSOBE7RnSESyKzqM5VBSBgnjMqnHcP3FrV FYzihTCDcFKy3ELUe3Kbc1TgG/07stQs9xInGXN7ZFsIwadfHgoNVQ6JpOqq1oQa fvmLPBhq+a4= =QZuG -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-05-14' into staging * Replace YAML anchors by extends in the gitlab-CI yaml files * Many small qtest fixes (e.g. to fix issues discovered by Coverity) * Poison more config switches in common code * Fix the failing Travis-CI and Cirrus-CI tasks # gpg: Signature made Fri 14 May 2021 12:17:39 BST # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * remotes/thuth-gitlab/tags/pull-request-2021-05-14: cirrus.yml: Fix the MSYS2 task pc-bios/s390-ccw: Fix inline assembly for older versions of Clang tests/qtest/migration-test: Use g_autofree to avoid leaks on error paths configure: Poison all current target-specific #defines migration: Move populate_vfio_info() into a separate file include/sysemu: Poison all accelerator CONFIG switches in common code tests: Avoid side effects inside g_assert() arguments tests/qtest/rtc-test: Remove pointless NULL check tests/qtest/tpm-util.c: Free memory with correct free function tests/migration-test: Fix "true" vs true tests/qtest/npcm7xx_pwm-test.c: Avoid g_assert_true() for non-test assertions tests/qtest/ahci-test.c: Calculate iso_size with 64-bit arithmetic util/compatfd.c: Replaced a malloc call with g_malloc. libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm docs/devel/qgraph: add troubleshooting information libqos/qgraph: fix "UNAVAILBLE" typo gitlab-ci: Replace YAML anchors by extends (native_test_job) gitlab-ci: Replace YAML anchors by extends (native_build_job) gitlab-ci: Replace YAML anchors by extends (container_job) tests/docker/dockerfiles: Add ccache to containers where it was missing Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-05-14 19:33:23 +01:00
Thomas Huth	43bd0bf30f	migration: Move populate_vfio_info() into a separate file The CONFIG_VFIO switch only works in target specific code. Since migration/migration.c is common code, the #ifdef does not have the intended behavior here. Move the related code to a separate file now which gets compiled via specific_ss instead. Fixes: `3710586caa` ("qapi: Add VFIO devices migration stats in Migration stats") Message-Id: <20210414112004.943383-3-thuth@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-05-14 12:31:51 +02:00
David Hildenbrand	542147f4e5	migration/ram: Use offset_in_ramblock() in range checks We never read or write beyond the used_length of memory blocks when migrating. Make this clearer by using offset_in_ramblock() consistently. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-11-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:14 +01:00
David Hildenbrand	c1668bde5c	migration/multifd: Print used_length of memory block We actually want to print the used_length, against which we check. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-10-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:14 +01:00
David Hildenbrand	898ba906cc	migration/ram: Handle RAM block resizes during postcopy Resizing while migrating is dangerous and does not work as expected. The whole migration code works with the usable_length of a ram block and does not expect this value to change at random points in time. In the case of postcopy, relying on used_length is racy as soon as the guest is running. Also, when used_length changes we might leave the uffd handler registered for some memory regions, reject valid pages when migrating and fail when sending the recv bitmap to the source. Resizing can be trigger after (but not during) a reset in ACPI code by the guest - hw/arm/virt-acpi-build.c:acpi_ram_update() - hw/i386/acpi-build.c:acpi_ram_update() Let's remember the original used_length in a separate variable and use it in relevant postcopy code. Make sure to update it when we resize during precopy, when synchronizing the RAM block sizes with the source. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-9-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:14 +01:00
David Hildenbrand	6a23f6399a	migration/ram: Simplify host page handling in ram_load_postcopy() Add two new helper functions. This will come in come handy once we want to handle ram block resizes while postcopy is active. Note that ram_block_from_stream() will already print proper errors. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-8-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Added brackets in host_page_from_ram_block_offset to cause uintptr_t to cast the sum, to fix armhf-cross build	2021-05-13 18:21:13 +01:00
David Hildenbrand	cc61c703b6	migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init() In case we grow our RAM after ram_postcopy_incoming_init() (e.g., when synchronizing the RAM block state with the migration source), the resized part would not get discarded. Let's perform that when being notified about a resize while postcopy has been advised, but is not listening yet. With precopy, the process is as following: 1. VM created - RAM blocks are created 2. Incomming migration started - Postcopy is advised - All pages in RAM blocks are discarded 3. Precopy starts - RAM blocks are resized to match the size on the migration source. - RAM pages from precopy stream are loaded - Uffd handler is registered, postcopy starts listening 4. Guest started, postcopy running - Pagefaults get resolved, pages get placed Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-7-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
David Hildenbrand	c7c0e72408	migration/ram: Handle RAM block resizes during precopy Resizing while migrating is dangerous and does not work as expected. The whole migration code works on the usable_length of ram blocks and does not expect this to change at random points in time. In the case of precopy, the ram block size must not change on the source, after syncing the RAM block list in ram_save_setup(), so as long as the guest is still running on the source. Resizing can be trigger after (but not during) a reset in ACPI code by the guest - hw/arm/virt-acpi-build.c:acpi_ram_update() - hw/i386/acpi-build.c:acpi_ram_update() Use the ram block notifier to get notified about resizes. Let's simply cancel migration and indicate the reason. We'll continue running on the source. No harm done. Update the documentation. Postcopy will be handled separately. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-5-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Manual merge	2021-05-13 18:21:13 +01:00
Markus Armbruster	372043f389	migration: Drop redundant query-migrate result @blocked Result @blocked is redundant. Unfortunately, we realized this too close to the release to risk dropping it, so we deprecated it instead, in commit `e11ce6c06`. Since it was deprecated from the start, we can delete it without the customary grace period. Do so. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210429140424.2802929-1-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
Kunkun Jiang	ba1b7c812c	migration/ram: Optimize ram_save_host_page() Starting from pss->page, ram_save_host_page() will check every page and send the dirty pages up to the end of the current host page or the boundary of used_length of the block. If the host page size is a huge page, the step "check" will take a lot of time. It will improve performance to use migration_bitmap_find_dirty(). Tested on Kunpeng 920; VM parameters: 1U 4G (page size 1G) The time of ram_save_host_page() in the last round of ram saving: before optimize: 9250us after optimize: 34us Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20210316125716.1243-3-jiangkunkun@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
Kunkun Jiang	23feba906e	migration/ram: Reduce unnecessary rate limiting When the host page is a huge page and something is sent in the current iteration, migration_rate_limit() should be executed. If not, it can be omitted. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210316125716.1243-2-jiangkunkun@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
David Hildenbrand	1a37352277	migrate/ram: remove "ram_bulk_stage" and "fpo_enabled" The bulk stage is kind of weird: migration_bitmap_find_dirty() will indicate a dirty page, however, ram_save_host_page() will never save it, as migration_bitmap_clear_dirty() detects that it is not dirty. We already fill the bitmap in ram_list_init_bitmaps() with ones, marking everything dirty - it didn't used to be that way, which is why we needed an explicit first bulk stage. Let's simplify: make the bitmap the single source of thuth. Explicitly handle the "xbzrle_enabled after first round" case. Regarding XBZRLE (implicitly handled via "ram_bulk_stage = false" right now), there is now a slight change in behavior: - Colo: When starting, it will be disabled (was implicitly enabled) until the first round actually finishes. - Free page hinting: When starting, XBZRLE will be disabled (was implicitly enabled) until the first round actually finished. - Snapshots: When starting, XBZRLE will be disabled. We essentially only do a single run, so I guess it will never actually get disabled. Postcopy seems to indirectly disable it in ram_save_page(), so there shouldn't be really any change. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210216105039.40680-1-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
Thomas Huth	2068cabd3f	Do not include cpu.h if it's not really necessary Stop including cpu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-4-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:51 +02:00
Thomas Huth	4c386f8064	Do not include sysemu/sysemu.h if it's not really necessary Stop including sysemu/sysemu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-2-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:50 +02:00
Andrey Gruzdev	82ea3e3b99	migration: Rename 'bs' to 'block' in background snapshot code Rename 'bs' to commonly used 'block' in migration/ram.c background snapshot code. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reported-by: David Hildenbrand <david@redhat.com> Message-Id: <20210401092226.102804-5-andrey.gruzdev@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-04-07 18:37:56 +01:00
Andrey Gruzdev	eeccb99c9d	migration: Pre-fault memory before starting background snasphot This commit solves the issue with userfault_fd WP feature that background snapshot is based on. For any never poluated or discarded memory page, the UFFDIO_WRITEPROTECT ioctl() would skip updating PTE for that page, thereby loosing WP setting for it. So we need to pre-fault pages for each RAM block to be protected before making a userfault_fd wr-protect ioctl(). Fixes: `278e2f551a` (migration: support UFFD write fault processing in ram_save_iterate()) Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210401092226.102804-4-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Bodged ifdef __linux__ on ram_write_tracking_prepare, should really go in a stub	2021-04-07 18:37:28 +01:00
Andrey Gruzdev	1a8e44a89f	migration: Inhibit virtio-balloon for the duration of background snapshot The same thing as for incoming postcopy - we cannot deal with concurrent RAM discards when using background snapshot feature in outgoing migration. Fixes: `8518278a6a` (migration: implementation of background snapshot thread) Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210401092226.102804-3-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-04-06 18:56:01 +01:00
Andrey Gruzdev	ecb23efea0	migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Added missing qemu_fflush() on buffer file holding precopy device state. Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs. Typical configurations often require >200KB for device state and VMDESC. Fixes: `8518278a6a` (migration: implementation of background snapshot thread) Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Message-Id: <20210401092226.102804-2-andrey.gruzdev@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-04-06 18:56:01 +01:00
Peter Maydell	415fa2fe91	For 6.0 misc patches under my radar. V2: - "tests: Add tests for yank with the chardev-change case" updated - drop the readthedoc theme patch -----BEGIN PGP SIGNATURE----- iQJQBAABCAA6FiEEh6m9kz+HxgbSdvYt2ujhCXWWnOUFAmBltIwcHG1hcmNhbmRy ZS5sdXJlYXVAcmVkaGF0LmNvbQAKCRDa6OEJdZac5QzCEACOWNUlT5mTylY52sB4 RXxt+vFFby6aj/M5/tXv7T4EHShWkycV5kEnGjUKiWQXfHfHRfOur6PXkbTMG5zY UBEuVAMWW50O2VQZR51W+kohZxsxNnimK2gnCTNGjDWOiofTFAcDf7Ycfxbg1TYU fsO3m/dl9cy1fBgCsm64+61T60DC5W0JRsxoRCR1qr4vbJtXjoYe9i21GMWOr548 EVZo3XQDe5WYeTRyTpf1lHU0dLPrJqZuKmF6M3IQWXG7+ns7iMA0v/STmwsBwqSr W6vygj2vPKAi1b1X1z/t/IGXP7mOtTZMUZWxhdOcxqEgYyP4rZji02U33CCd0fCi wbD8VOmwvtqPeEHXu/b/dhpacgHis1w8jyJspAcW0MIpFJ+1mn+xtWnmMUlA2cOS Vmgirinycsim9TKA+jS3vTwT+/wwzqtWUY267m09tVhJwxvGOXQH1i+mlRRLoNcs 2vf5iWanRbZgFJme8UYtqYB96pWIJjMa1FkMexJgK3VXgMA+Rjkr4MqIyuPoquyp /3PgoUU1LUmGh8F+mi8m88tpdgad6iM+UWXeRALsP7UFvP1Psjz8f6Fhh8uBeE7E wsdBsdTwwZ3zgLD4DxjpcZdLM+G7PT0nbeodnPWRuwebsYt3FymoCdmkS1CEn9ZT kbQxdeJhTa7QoacZUmQSAoXO6g== =UwXe -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/marcandre/tags/for-6.0-pull-request' into staging For 6.0 misc patches under my radar. V2: - "tests: Add tests for yank with the chardev-change case" updated - drop the readthedoc theme patch # gpg: Signature made Thu 01 Apr 2021 12:54:52 BST # gpg: using RSA key 87A9BD933F87C606D276F62DDAE8E10975969CE5 # gpg: issuer "marcandre.lureau@redhat.com" # gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" [full] # gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" [full] # Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5 * remotes/marcandre/tags/for-6.0-pull-request: tests: Add tests for yank with the chardev-change case chardev: Fix yank with the chardev-change case chardev/char.c: Always pass id to chardev_new chardev/char.c: Move object_property_try_add_child out of chardev_new yank: Always link full yank code yank: Remove dependency on qiochannel docs: simplify each section title dbus-vmstate: Increase the size of input stream buffer used during load util: fix use-after-free in module_load_one Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-04-01 17:08:48 +01:00
Lukas Straub	1a92d6d500	yank: Remove dependency on qiochannel Remove dependency on qiochannel by removing yank_generic_iochannel and letting migration and chardev use their own yank function for iochannel. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20ff143fc2db23e27cd41d38043e481376c9cec1.1616521341.git.lukasstraub2@web.de>	2021-04-01 15:27:44 +04:00
Jessica Clarke	76f67bac79	meson: Propagate gnutls dependency to migration Commit `3eacf70bb5` neglected to fix this for softmmu configs, which pull in migration's use of gnutls. This fixes the following compilation failure on Arm-based Macs: In file included from migration/multifd.c:23: In file included from migration/tls.h:25: In file included from include/io/channel-tls.h:26: In file included from include/crypto/tlssession.h:24: include/crypto/tlscreds.h:28:10: fatal error: 'gnutls/gnutls.h' file not found #include <gnutls/gnutls.h> ^~~~~~~~~~~~~~~~~ 1 error generated. (as well as for channel.c and tls.c) Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20210320171221.37437-1-jrtc27@jrtc27.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-01 09:40:45 +02:00
Vladimir Sementsov-Ogievskiy	4290b4834c	migration/block-dirty-bitmap: make incoming disabled bitmaps busy Incoming enabled bitmaps are busy, because we do bdrv_dirty_bitmap_create_successor() for them. But disabled bitmaps being migrated are not marked busy, and user can remove them during the incoming migration. Then we may crash in cancel_incoming_locked() when try to remove the bitmap that was already removed by user, like this: #0 qemu_mutex_lock_impl (mutex=0x5593d88c50d1, file=0x559680554b20 "../block/dirty-bitmap.c", line=64) at ../util/qemu-thread-posix.c:77 #1 bdrv_dirty_bitmaps_lock (bs=0x5593d88c0ee9) at ../block/dirty-bitmap.c:64 #2 bdrv_release_dirty_bitmap (bitmap=0x5596810e9570) at ../block/dirty-bitmap.c:362 #3 cancel_incoming_locked (s=0x559680be8208 <dbm_state+40>) at ../migration/block-dirty-bitmap.c:918 #4 dirty_bitmap_load (f=0x559681d02b10, opaque=0x559680be81e0 <dbm_state>, version_id=1) at ../migration/block-dirty-bitmap.c:1194 #5 vmstate_load (f=0x559681d02b10, se=0x559680fb5810) at ../migration/savevm.c:908 #6 qemu_loadvm_section_part_end (f=0x559681d02b10, mis=0x559680fb4a30) at ../migration/savevm.c:2473 #7 qemu_loadvm_state_main (f=0x559681d02b10, mis=0x559680fb4a30) at ../migration/savevm.c:2626 #8 postcopy_ram_listen_thread (opaque=0x0) at ../migration/savevm.c:1871 #9 qemu_thread_start (args=0x5596817ccd10) at ../util/qemu-thread-posix.c:521 #10 start_thread () at /lib64/libpthread.so.0 #11 clone () at /lib64/libc.so.6 Note bs pointer taken from bitmap: it's definitely bad aligned. That's because we are in use after free, bitmap is already freed. So, let's make disabled bitmaps (being migrated) busy during incoming migration. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210322094906.5079-2-vsementsov@virtuozzo.com>	2021-03-24 13:41:19 +00:00
Daniel P. Berrangé	cbde7be900	migrate: remove QMP/HMP commands for speed, downtime and cache size The generic 'migrate_set_parameters' command handle all types of param. Only the QMP commands were documented in the deprecations page, but the rationale for deprecating applies equally to HMP, and the replacements exist. Furthermore the HMP commands are just shims to the QMP commands, so removing the latter breaks the former unless they get re-implemented. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-18 09:22:55 +00:00
Mahmoud Mandour	373969507a	migration: Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD Replaced various qemu_mutex_lock calls and their respective qemu_mutex_unlock calls with QEMU_LOCK_GUARD macro. This simplifies the code by eliminating the respective qemu_mutex_unlock calls. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Message-Id: <20210311031538.5325-7-ma.mandourr@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-03-15 20:01:55 +00:00
Hao Wang	fca676429c	migration/tls: add error handling in multifd_tls_handshake_thread If any error happens during multifd send thread creating (e.g. channel broke because new domain is destroyed by the dst), multifd_tls_handshake_thread may exit silently, leaving main migration thread hanging (ram_save_setup -> multifd_send_sync_main -> qemu_sem_wait(&p->sem_sync)). Fix that by adding error handling in multifd_tls_handshake_thread. Signed-off-by: Hao Wang <wanghao232@huawei.com> Message-Id: <20210209104237.2250941-3-wanghao232@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-03-15 20:01:55 +00:00
Hao Wang	a339149afa	migration/tls: fix inverted semantics in multifd_channel_connect Function multifd_channel_connect() return "true" to indicate failure, which is rather confusing. Fix that. Signed-off-by: Hao Wang <wanghao232@huawei.com> Message-Id: <20210209104237.2250941-2-wanghao232@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-03-15 20:01:55 +00:00
Peter Krempa	6e9f21a2aa	migration: dirty-bitmap: Allow control of bitmap persistence Bitmap's source persistence is transported over the migration stream and the destination mirrors it. In some cases the destination might want to persist bitmaps which are not persistent on the source (e.g. the result of merging bitmaps from a number of layers on the source when migrating into a squashed image) but currently it would need to create another set of persistent bitmaps and merge them. This patch adds a 'transform' property to the alias map which allows overriding the persistence of migrated bitmaps both on the source and destination sides. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <b20afb675917b86f6359ac3591166ac6d4233573.1613150869.git.pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks, drop dead conditional] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 15:24:36 -06:00
Peter Krempa	0d1e450c7b	migration: dirty-bitmap: Use struct for alias map inner members Currently the alias mapping hash stores just strings of the target objects internally. In further patches we'll be adding another member which will need to be stored in the map so pass a copy of the whole BitmapMigrationBitmapAlias QAPI struct into the map. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <fc5f27e1fe16cb75e08a248c2d938de3997b9bfb.1613150869.git.pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: adjust long lines] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 14:50:55 -06:00
Stefan Reiter	e846b74650	migration: only check page size match if RAM postcopy is enabled Postcopy may also be advised for dirty-bitmap migration only, in which case the remote page size will not be available and we'll instead read bogus data, blocking migration with a mismatch error if the VM uses hugepages. Fixes: `58110f0acb` ("migration: split common postcopy out of ram postcopy") Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Message-Id: <20210204163522.13291-1-s.reiter@proxmox.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:52 +00:00
Daniel P. Berrangé	0f0d83a456	migration: introduce snapshot-{save, load, delete} QMP commands savevm, loadvm and delvm are some of the few HMP commands that have never been converted to use QMP. The reasons for the lack of conversion are that they blocked execution of the event thread, and the semantics around choice of disks were ill-defined. Despite this downside, however, libvirt and applications using libvirt have used these commands for as long as QMP has existed, via the "human-monitor-command" passthrough command. IOW, while it is clearly desirable to be able to fix the problems, they are not a blocker to all real world usage. Meanwhile there is a need for other features which involve adding new parameters to the commands. This is possible with HMP passthrough, but it provides no reliable way for apps to introspect features, so using QAPI modelling is highly desirable. This patch thus introduces new snapshot-{load,save,delete} commands to QMP that are intended to replace the old HMP counterparts. The new commands are given different names, because they will be using the new QEMU job framework and thus will have diverging behaviour from the HMP originals. It would thus be misleading to keep the same name. While this design uses the generic job framework, the current impl is still blocking. The intention that the blocking problem is fixed later. None the less applications using these new commands should assume that they are asynchronous and thus wait for the job status change event to indicate completion. In addition to using the job framework, the new commands require the caller to be explicit about all the block device nodes used in the snapshot operations, with no built-in default heuristics in use. Note that the existing "query-named-block-nodes" can be used to query what snapshots currently exist for block nodes. Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-13-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: removed tests for now, the output ordering isn't deterministic	2021-02-08 11:19:52 +00:00
Daniel P. Berrangé	bef7e9e2c7	migration: introduce a delete_snapshot wrapper Make snapshot deletion consistent with the snapshot save and load commands by using a wrapper around the blockdev layer. The main difference is that we get upfront validation of the passed in device list (if any). Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-10-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	f1a9fcdd01	migration: wire up support for snapshot device selection Modify load_snapshot/save_snapshot to accept the device list and vmstate node name parameters previously added to the block layer. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-9-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	f781f84189	migration: control whether snapshots are ovewritten The traditional HMP "savevm" command will overwrite an existing snapshot if it already exists with the requested name. This new flag allows this to be controlled allowing for safer behaviour with a future QMP command. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-8-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	3d3e9b1f66	block: rename and alter bdrv_all_find_snapshot semantics Currently bdrv_all_find_snapshot() will return 0 if it finds a snapshot, -1 if an error occurs, or if it fails to find a snapshot. New callers to be added want to distinguish between the error scenario and failing to find a snapshot. Rename it to bdrv_all_has_snapshot and make it return -1 on error, 0 if no snapshot is found and 1 if snapshot is found. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-7-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	c22d644ca7	block: allow specifying name of block device for vmstate storage Currently the vmstate will be stored in the first block device that supports snapshots. Historically this would have usually been the root device, but with UEFI it might be the variable store. There needs to be a way to override the choice of block device to store the state in. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-6-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	cf3a74c94f	block: add ability to specify list of blockdevs during snapshot When running snapshot operations, there are various rules for which blockdevs are included/excluded. While this provides reasonable default behaviour, there are scenarios that are not well handled by the default logic. Some of the conditions do not have a single correct answer. Thus there needs to be a way for the mgmt app to provide an explicit list of blockdevs to perform snapshots across. This can be achieved by passing a list of node names that should be used. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-5-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	f61fe11aa6	migration: stop returning errno from load_snapshot() None of the callers care about the errno value since there is a full Error object populated. This gives consistency with save_snapshot() which already just returns a boolean value. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> [PMD: Return false/true instead of -1/0, document function] Acked-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210204124834.774401-4-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Philippe Mathieu-Daudé	7ea14df230	migration: Make save_snapshot() return bool, not 0/-1 Just for consistency, following the example documented since commit `e3fe3988d7` ("error: Document Error API usage rules"), return a boolean value indicating an error is set or not. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210204124834.774401-3-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	e26f98e209	block: push error reporting into bdrv_all__snapshot functions The bdrv_all__snapshot functions return a BlockDriverState pointer for the invalid backend, which the callers then use to report an error message. In some cases multiple callers are reporting the same error message, but with slightly different text. In the future there will be more error scenarios for some of these methods, which will benefit from fine grained error message reporting. So it is helpful to push error reporting down a level. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> [PMD: Initialize variables] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210204124834.774401-2-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Dr. David Alan Gilbert	3af8554bd0	migration: Add blocker information Modify query-migrate so that it has a flag indicating if outbound migration is blocked, and if it is a list of reasons. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202135522.127380-2-dgilbert@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Markus Armbruster	54270c450a	migration: Fix a few absurdly defective error messages migrate_params_check() has a number of error messages of the form Parameter 'NAME' expects is invalid, it should be ... Fix them to something like Parameter 'NAME' expects a ... Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202141734.2488076-5-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Markus Armbruster	7bfc47936e	migration: Fix cache_init()'s "Failed to allocate" error messages cache_init() attempts to handle allocation failure. The two error messages are garbage, as untested error messages commonly are: Parameter 'cache size' expects Failed to allocate cache Parameter 'cache size' expects Failed to allocate page cache Fix them to just Failed to allocate cache Failed to allocate page cache Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202141734.2488076-4-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Markus Armbruster	8b9407a09f	migration: Clean up signed vs. unsigned XBZRLE cache-size `73af8dd8d7` "migration: Make xbzrle_cache_size a migration parameter" (v2.11.0) made the new parameter unsigned (QAPI type 'size', uint64_t in C). It neglected to update existing code, which continues to use int64_t. migrate_xbzrle_cache_size() returns the new parameter. Adjust its return type. QMP query-migrate-cache-size returns migrate_xbzrle_cache_size(). Adjust its return type. migrate-set-parameters passes the new parameter to xbzrle_cache_resize(). Adjust its parameter type. xbzrle_cache_resize() passes it on to cache_init(). Adjust its parameter type. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202141734.2488076-3-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Andrey Gruzdev	8518278a6a	migration: implementation of background snapshot thread Introducing implementation of 'background' snapshot thread which in overall follows the logic of precopy migration while internally utilizes completely different mechanism to 'freeze' vmstate at the start of snapshot creation. This mechanism is based on userfault_fd with wr-protection support and is Linux-specific. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210129101407.103458-5-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Andrey Gruzdev	278e2f551a	migration: support UFFD write fault processing in ram_save_iterate() In this particular implementation the same single migration thread is responsible for both normal linear dirty page migration and procesing UFFD page fault events. Processing write faults includes reading UFFD file descriptor, finding respective RAM block and saving faulting page to the migration stream. After page has been saved, write protection can be removed. Since asynchronous version of qemu_put_buffer() is expected to be used to save pages, we also have to flush migraion stream prior to un-protecting saved memory range. Write protection is being removed for any previously protected memory chunk that has hit the migration stream. That's valid for pages from linear page scan along with write fault pages. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210129101407.103458-4-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> fixup pagefault.address cast for 32bit	2021-02-08 11:19:51 +00:00
Andrey Gruzdev	6e8c25b4c6	migration: introduce 'background-snapshot' migration capability Add new capability to 'qapi/migration.json' schema. Update migrate_caps_check() to validate enabled capability set against introduced one. Perform checks for required kernel features and compatibility with guest memory backends. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210129101407.103458-2-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Wainer dos Santos Moschetta	1dfafcbd39	migration/qemu-file: Fix maybe uninitialized on qemu_get_buffer_in_place() Fixed error when compiling migration/qemu-file.c with -Werror=maybe-uninitialized as shown here: ../migration/qemu-file.c: In function 'qemu_get_buffer_in_place': ../migration/qemu-file.c:604:18: error: 'src' may be used uninitialized in this function [-Werror=maybe-uninitialized] 604 \| *buf = src; \| ~~~~~^~~~~ cc1: all warnings being treated as errors Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Message-Id: <20210128130625.569900-1-wainersm@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Jinhao Gao	39f633d429	savevm: Fix memory leak of vmstate_configuration When VM migrate VMState of configuration, the fields(name and capabilities) of configuration having a flag of VMS_ALLOC need to allocate memory. If the src doesn't free memory of capabilities in SaveState after save VMState of configuration, or the dst doesn't free memory of name and capabilities in post load of configuration, it may result in memory leak of name and capabilities. We free memory in configuration_post_save and configuration_post_load func, which prevents memory leak. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Jinhao Gao <gaojinhao@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20201231061020.828-3-gaojinhao@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Eric Blake	95b3a8c8a8	qapi: More complex uses of QAPI_LIST_APPEND These cases require a bit more thought to review; in each case, the code was appending to a list, but not with a FOOList **tail variable. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210113221013.390592-6-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Flawed change to qmp_guest_network_get_interfaces() dropped] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2021-01-28 08:08:45 +01:00
Lukas Straub	b5eea99ec2	migration: Add yank feature Register yank functions on sockets to shut them down. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <484c6a14cc2506bebedd5a237259b91363ff8f88.1609167865.git.lukasstraub2@web.de> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2021-01-13 10:21:17 +01:00
Peter Maydell	729cc68373	Remove superfluous timer_del() calls This commit is the result of running the timer-del-timer-free.cocci script on the whole source tree. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20201215154107.3255-4-peter.maydell@linaro.org	2021-01-08 15:13:38 +00:00
Paolo Bonzini	b1def33d19	zstd: convert to meson Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-06 10:21:20 +01:00
Peter Maydell	41192db338	Machine queue, 2020-12-23 Cleanup: * qdev code cleanup (Eduardo Habkost) Bug fix: * hostmem: Free host_nodes list right after visited (Keqian Zhu) -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEWjIv1avE09usz9GqKAeTb5hNxaYFAl/jteYUHGVoYWJrb3N0 QHJlZGhhdC5jb20ACgkQKAeTb5hNxaZUHw//c40nRlYdGSV5j6w3ZCSlmZRFxZTU UiLK51Z3hI9Q9kyLcoIQitEYlQTIbgp0qlIJ6evDd/HvQvZ+P4P0Lzm1UGOZhD0h nJk5+bBkP/mzMh0P9oiN20DSLk6a3Wvdiu/bQR8gm/WdLvTM1Zjek1ns5tL06ZvA MziG6gIypgScu2FeNxD0zC8sDO16oVrzKq7mjZcQe6XYFRsJmYjZw84v+uu/Bdf7 MBxolkA8vYwwBJNdVsAf7I0Gw3BeArgPUOwbWyt8/tuGIOZxYjdKIj55S7j2fuju 524sg8Di+YzxmLZaNAGksEBMj9uY39nwdHGhNElMtWCM9oOPumlps9eyLtpTagfM wmiVrMGWVlXV6c4kZo8R2NSF8hcDr02S7eyrUpITrh09p4nT6fBGG2ufEYiCyNao o9ZqMf7NUO5J60zM5EOfdGxpaN2O0M5pXCCN48NtmqvO0wIAfTc9l/OkCrrfVbEO Q/X1jqbj6ZcilSIl9OeLAPi7Xjx26jMeeLPUQtoZnkqDvpk/Vz6Ka1RgGG86QA5z 2W/KCAoVrg6dO4f9vY3x84rf0Ta5kJtp2LezPgG8d++4bMSf2jN00wYvAQuCyqqW zbm8f57YST3vm8XMHPlmtnlKfiLI4wbVUmrDYu3rNI+JgdvhdXseGoErt15ejAcL B5IH2SK4AwMpSsk= =bnjc -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ehabkost-gl/tags/machine-next-pull-request' into staging Machine queue, 2020-12-23 Cleanup: * qdev code cleanup (Eduardo Habkost) Bug fix: * hostmem: Free host_nodes list right after visited (Keqian Zhu) # gpg: Signature made Wed 23 Dec 2020 21:25:58 GMT # gpg: using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6 # gpg: issuer "ehabkost@redhat.com" # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full] # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost-gl/tags/machine-next-pull-request: bugfix: hostmem: Free host_nodes list right after visited qdev: Avoid unnecessary DeviceState* variable at set_prop_arraylen() qdev: Rename qdev_get_prop_ptr() to object_field_prop_ptr() qdev: Move qdev_prop_tpm declaration to tpm_prop.h qdev: Make qdev_class_add_property() more flexible qdev: Make PropertyInfo.create return ObjectProperty* qdev: Move dev->realized check to qdev_property_set() qdev: Wrap getters and setters in separate helpers qdev: Add name argument to PropertyInfo.create method qdev: Add name parameter to qdev_class_add_property() qdev: Avoid using prop->name unnecessarily qdev: Get just property name at error_set_from_qdev_prop_error() sparc: Use DEFINE_PROP for nwindows property qdev: Reuse DEFINE_PROP in all DEFINE_PROP_* macros qdev: Move softmmu properties to qdev-properties-system.h Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-01-01 22:57:15 +00:00
Peter Maydell	1f7c02797f	QAPI patches patches for 2020-12-19 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/dynUSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZT3igP/3bWwsKR5vKVsDUTmMfrhcgaFvQiaYoG F29Bond8Xy0Zd0gl7OWh/5jKL0vGlrEVPrKfYLUjMnfkeRec/pOkIB2oOmIxpnPs 9zi4kh2hQ3dEoRBuvSnnZzedetYPTuCpWMIjlztkgfxgcimqm8TPNVSxRaSApjC3 Y8108wGwBWVf2C0rhKO9E2xA51uo6khy05i1psUtqUlC+PuDQ/OwzQHM2dnWdDB6 kUwBDK17nhL6WwsYqCyKLSiDModReYfDiY8GS5MDLo74dzwXiatEefCR7+sbM4xq eX/SBoqoeS1jLPNuCryNeGNKvNA2KAbEJTnbQA2NxBXHgZ9/1SxVZFxuPp4nDMSQ N7BDuDI8YtJE479RjT/ZzRG65xadGBSe/HXkXM9mZwh1zitop8SVZ9fArFBHvNzw Y5zAv3fQd54+87psffg4dYFK0wGmqTabLEEuVzM8KIVqcAdYA2yC2b2EHy+vsxuq GMkr0WaA6Sq2gthXmzdTjmUPuHdan/NIhuV6d66SbPNH2oH31piptFxuznyFWSKV isciFFdUrkg5QrF8DSt2nmdwMFf8QGbszqP8QIGMzhJCCS9GXIiGG8f149++q8X8 HO1lFAdLQJdrDwCYmfx36tOvi2rS/rcoTGgvg66UX3xKko1ruoxR1ZWcS54obJN6 vEQDZ+PxubDg =vGLy -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging QAPI patches patches for 2020-12-19 # gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits) qobject: Make QString immutable block: Use GString instead of QString to build filenames keyval: Use GString to accumulate value strings json: Use GString instead of QString to accumulate strings migration: Replace migration's JSON writer by the general one qobject: Factor JSON writer out of qobject_to_json() qobject: Factor quoted_str() out of to_json() qobject: Drop qstring_get_try_str() qobject: Drop qobject_get_try_str() Revert "qobject: let object_property_get_str() use new API" block: Avoid qobject_get_try_str() qmp: Fix tracing of non-string command IDs qobject: Move internals to qobject-internal.h hw/rdma: Replace QList by GQueue Revert "qstring: add qstring_free()" qobject: Change qobject_to_json()'s value to GString qobject: Use GString instead of QString to accumulate JSON qobject: Make qobject_to_json_pretty() take a pretty argument monitor: Use GString instead of QString for output buffer hmp: Simplify how qmp_human_monitor_command() gets output ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-01-01 14:33:03 +00:00
Markus Armbruster	3ddba9a9e9	migration: Replace migration's JSON writer by the general one Commit `8118f0950f` "migration: Append JSON description of migration stream" needs a JSON writer. The existing qobject_to_json() wasn't a good fit, because it requires building a QObject to convert. Instead, migration got its very own JSON writer, in commit `190c882ce2` "QJSON: Add JSON writer". It tacitly limits numbers to int64_t, and strings contents to characters that don't need escaping, unlike qobject_to_json(). The previous commit factored the JSON writer out of qobject_to_json(). Replace migration's JSON writer by it. Cc: Juan Quintela <quintela@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201211171152.146877-17-armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-12-19 10:39:16 +01:00
Eric Blake	54aa3de72e	qapi: Use QAPI_LIST_PREPEND() where possible Anywhere we create a list of just one item or by prepending items (typically because order doesn't matter), we can use QAPI_LIST_PREPEND(). But places where we must keep the list in order by appending remain open-coded until later patches. Note that as a side effect, this also performs a cleanup of two minor issues in qga/commands-posix.c: the old code was performing new = g_malloc0(sizeof(*ret)); which 1) is confusing because you have to verify whether 'new' and 'ret' are variables with the same type, and 2) would conflict with C++ compilation (not an actual problem for this file, but makes copy-and-paste harder). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201113011340.463563-5-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> [Straightforward conflicts due to commit `a8aa94b5f8` "qga: update schema for guest-get-disks 'dependents' field" and commit `a10b453a52` "target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c" resolved. Commit message tweaked.] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-12-19 10:20:14 +01:00
Eric Blake	eaedde5255	migration: Refactor migrate_cap_add Instead of taking a list parameter and returning a new head at a distance, just return the new item for the caller to insert into a list via QAPI_LIST_PREPEND. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201113011340.463563-4-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-12-19 10:15:08 +01:00
Eduardo Habkost	ce35e2295e	qdev: Move softmmu properties to qdev-properties-system.h Move the property types and property macros implemented in qdev-properties-system.c to a new qdev-properties-system.h header. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-12-18 15:20:17 -05:00
Tuguoyi	36d0fe6516	migration: Don't allow migration if vm is in POSTMIGRATE The following steps will cause qemu assertion failure: - pause vm by executing 'virsh suspend' - create external snapshot of memory and disk using 'virsh snapshot-create-as' - doing the above operation again will cause qemu crash The backtrace looks like: at /build/qemu-5.0/migration/savevm.c:1401 at /build/qemu-5.0/migration/savevm.c:1453 When the first migration completes, bs->open_flags will set BDRV_O_INACTIVE flag by bdrv_inactivate_all(), and during the second migration the bdrv_inactivate_recurse assert that the bs->open_flags is already BDRV_O_INACTIVE enabled which cause crash. As Vladimir suggested, this patch makes migrate_prepare check the state of vm and return error if it is in RUN_STATE_POSTMIGRATE state. Signed-off-by: Tuguoyi <tu.guoyi@h3c.com> Message-Id: <6b704294ad2e405781c38fb38d68c744@h3c.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reported-by: Li Zhang <li.zhang@cloud.ionos.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-12-18 10:08:25 +00:00
Tuguoyi	2a909dc430	savevm: Delete snapshots just created in case of error bdrv_all_create_snapshot() can fails with some snapshots created, so it's better to delete those snapshots before returns to the caller Signed-off-by: Tuguoyi <tu.guoyi@h3c.com> Message-Id: <1607410416-13563-3-git-send-email-tu.guoyi@h3c.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-12-18 10:08:24 +00:00
Tuguoyi	80ef0586d3	savevm: Remove dead code in save_snapshot() The snapshot in each bs is deleted at the beginning, so there is no need to find the snapshot again. Signed-off-by: Tuguoyi <tu.guoyi@h3c.com> Message-Id: <1607410416-13563-2-git-send-email-tu.guoyi@h3c.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-12-18 10:08:24 +00:00
Paolo Bonzini	e69d50d621	migration, vl: start migration via qmp_migrate_incoming Make qemu_start_incoming_migration local to migration/migration.c. By using the runstate instead of a separate flag, vl need not do anything to setup deferred incoming migration. qmp_migrate_incoming also does not need the deferred_incoming flag anymore, because "-incoming PROTOCOL" will clear the "once" flag before the main loop starts. Therefore, later invocations of the migrate-incoming command will fail with the existing "The incoming migration has already been started" error message. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-10 12:15:14 -05:00
Paolo Bonzini	e0d17dfd22	vl: move various initialization routines out of qemu_init Some very simple initialization routines can be nested in existing subsystem-level functions, do that to simplify qemu_init. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-10 12:15:11 -05:00
Chetan Pant	ef19b50d93	migration: Fix Lesser GPL version number There is no "version 2" of the "Lesser" General Public License. It is either "GPL version 2.0" or "Lesser GPL version 2.1". This patch replaces all occurrences of "Lesser GPL version 2" with "Lesser GPL version 2.1" in comment section. Signed-off-by: Chetan Pant <chetan4windows@gmail.com> Message-Id: <20201023123130.19656-1-chetan4windows@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2020-11-15 16:43:28 +01:00
Longpeng(Mike)	6ba11211bd	migration: handle CANCELLING state in migration_completion() The following sequence may cause the VM abort during migration: 1. RUN_STATE_RUNNING,MIGRATION_STATUS_ACTIVE 2. before call migration_completion(), we send migrate_cancel QMP command, the state machine is changed to: RUN_STATE_RUNNING,MIGRATION_STATUS_CANCELLING 3. call migration_completion(), and the state machine is switch to: RUN_STATE_RUNNING,MIGRATION_STATUS_COMPLETED 4. call migration_iteration_finish(), because the migration status is COMPLETED, so it will try to set the runstate to POSTMIGRATE, but RUNNING-->POSTMIGRATE is an invalid transition, so abort(). The migration_completion() should not change the migration state to COMPLETED if it is already changed to CANCELLING. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Message-Id: <20201105091726.148-1-longpeng2@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 15:52:20 +00:00
Chuan Zheng	9e8424088c	multifd/tls: fix memoryleak of the QIOChannelSocket object when cancelling migration When creating new tls client, the tioc->master will be referenced which results in socket leaking after multifd_save_cleanup if we cancel migration. Fix it by do object_unref() after tls client creation. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Message-Id: <1605104763-118687-1-git-send-email-zhengchuan@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 15:52:20 +00:00
Chuan Zheng	a18ed79b19	migration/dirtyrate: simplify includes in dirtyrate.c Remove redundant blank line which is left by Commit `662770af7c`, also take this opportunity to remove redundant includes in dirtyrate.c. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Message-Id: <1604030281-112946-1-git-send-email-zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 15:52:14 +00:00
Chen Qun	a24292830b	migration: fix uninitialized variable warning in migrate_send_rp_req_pages() After the WITH_QEMU_LOCK_GUARD macro is added, the compiler cannot identify that the statements in the macro must be executed. As a result, some variables assignment statements in the macro may be considered as unexecuted by the compiler. When the -Wmaybe-uninitialized capability is enabled on GCC9,the compiler showed warning: migration/migration.c: In function ‘migrate_send_rp_req_pages’: migration/migration.c:384:8: warning: ‘received’ may be used uninitialized in this function [-Wmaybe-uninitialized] 384 \| if (received) { \| ^ Add a default value for 'received' to prevented the warning. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201111142203.2359370-6-kuhn.chenqun@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 14:49:16 +00:00
Chuan Zheng	a1af605bd5	migration/multifd: fix hangup with TLS-Multifd due to blocking handshake The qemu main loop could hang up forever when we enable TLS+Multifd. The Src multifd_send_0 invokes tls handshake, it sends hello to sever and wait response. However, the Dst main qemu loop has been waiting recvmsg() for multifd_recv_1. Both of Src and Dst main qemu loop are blocking and waiting for reponse which results in hanging up forever. Src: (multifd_send_0) Dst: (multifd_recv_1) multifd_channel_connect migration_channel_process_incoming multifd_tls_channel_connect migration_tls_channel_process_incoming multifd_tls_channel_connect qio_channel_tls_handshake_task qio_channel_tls_handshake gnutls_handshake qio_channel_tls_handshake_task ... qcrypto_tls_session_handshake ... gnutls_handshake ... ... ... recvmsg (Blocking I/O waiting for response) recvmsg (Blocking I/O waiting for response) Fix this by offloadinig handshake work to a background thread. Reported-by: Yan Jin <jinyan12@huawei.com> Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Message-Id: <1604643893-8223-1-git-send-email-zhengchuan@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 14:35:29 +00:00
Philippe Mathieu-Daudé	af3bbbe984	migration/ram: Fix hexadecimal format string specifier The '%u' conversion specifier is for decimal notation. When prefixing a format with '0x', we want the hexadecimal specifier ('%x'). Inspired-by: Dov Murik <dovmurik@linux.vnet.ibm.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201103112558.2554390-5-philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 14:02:41 +00:00
Rao, Lei	b70cb3b485	Reduce the time of checkpoint for COLO we should set ram_bulk_stage to false after ram_state_init, otherwise the bitmap will be unused in migration_bitmap_find_dirty. all pages in ram cache will be flushed to the ram of secondary guest for each checkpoint. Signed-off-by: Lei Rao <lei.rao@intel.com> Signed-off-by: Derek Su <dereksu@qnap.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2020-11-11 16:52:23 +08:00
Peter Xu	5e77343113	migration: Postpone the kick of the fault thread after recover The new migrate_send_rp_req_pages_pending() call should greatly improve destination responsiveness because it will resync faulted address after postcopy recovery. However it is also the 1st place to initiate the page request from the main thread. One thing is overlooked on that migrate_send_rp_message_req_pages() is not designed to be thread-safe. So if we wake the fault thread before syncing all the faulted pages in the main thread, it means they can race. Postpone the wake up operation after the sync of faulted addresses. Fixes: `0c26781c09` ("migration: Sync requested pages after postcopy recovery") Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201102153010.11979-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-02 18:25:48 +00:00
Peter Xu	cc5ab87200	migration: Unify reset of last_rb on destination node when recover When postcopy recover happens, we need to reset last_rb after each return of postcopy_pause_fault_thread() because that means we just got the postcopy migration continued. Unify this reset to the place right before we want to kick the fault thread again, when we get the command MIG_CMD_POSTCOPY_RESUME from source. This is actually more than that - because the main thread on destination will now be able to call migrate_send_rp_req_pages_pending() too, so the fault thread is not the only user of last_rb now. Move the reset earlier will allow the first call to migrate_send_rp_req_pages_pending() to use the reset value even if called from the main thread. (NOTE: this is not a real fix to `0c26781c09` mentioned below, however it is just a mark that when picking up `0c26781c09` we'd better have this one too; the real fix will come later) Fixes: `0c26781c09` ("migration: Sync requested pages after postcopy recovery") Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201102153010.11979-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-02 18:25:39 +00:00
Kirti Wankhede	3710586caa	qapi: Add VFIO devices migration stats in Migration stats Added amount of bytes transferred to the VM at destination by all VFIO devices Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:51 -07:00
Peter Maydell	d55450df99	migration pull: 2020-10-26 Another go at Peter's postcopy fixes Cleanups from Bihong Yu and Peter Maydell. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAl+W9n8ACgkQBRYzHrxb /ef2uRAAqWTFLXuBF8+evEd1mMq2SM3ZYTuc7QKTY3MzAH6J/OMvJbZ112itqWOb iZ5NuuWH4PvzOhlR/PNNf1Yv3hTfv36HinG+OCh6s+6aqVx9yHOAfdBgmJIdYAeg Sk1jx43dvCyN2FwPs31ir3L6mwsrtfkRsS+2FeyrvRoEl4WE9mOoypCft3vdd9Dw zZea0Pw7vIs454D4n1vpJiQtq6B4eSAlQKpTLfQbglpTm4MgqLERzGvpT6hbQXJR eQyTOqRe08viIOZ+oN0B/+RVO6T9jc4Y1bEl2NSak1v4Tf7NNfDkFpLAjFm07V/1 tIhL/NOOsHdzfHQtrZpzKQgwaceb1N5qo0PfxD6/tRf9HlXY54iw6yY75+5c5Y89 UK8VSIYKnM2yXeVDLShxixIr3A1Z+zA41XydDwaLZczjeV7+nwrAXAjO8a+j6Dox zj4IyN2g5elEOmarC8qkvbDZ+TVvA2tookhWVwoz+D8ChYkcRDKP9eoYomfRwg+e NKRFuLBkyVPb0eEhyOV6HqJbMfTLpHneTM94v6HGz8tiK8IlMZfTTnC2Mr5gTXuS /cgOVhsY7+l+pKpxpGJmU3aUCYRk1CuK6MhXgjYEFMh5Siba8s0ZPZVaEm/BUyO1 rD+tVup87xMiJq3xnmLX+opblYE9G+b67hH1KuPc5vZXiSwuTkQ= =OL0Q -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20201026a' into staging migration pull: 2020-10-26 Another go at Peter's postcopy fixes Cleanups from Bihong Yu and Peter Maydell. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> # gpg: Signature made Mon 26 Oct 2020 16:17:03 GMT # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert/tags/pull-migration-20201026a: migration-test: Only hide error if !QTEST_LOG migration/postcopy: Release fd before going into 'postcopy-pause' migration: Sync requested pages after postcopy recovery migration: Maintain postcopy faulted addresses migration: Introduce migrate_send_rp_message_req_pages() migration: Pass incoming state into qemu_ufd_copy_ioctl() migration: using trace_ to replace DPRINTF migration: Delete redundant spaces migration: Open brace '{' following function declarations go on the next line migration: Do not initialise statics and globals to 0 or NULL migration: Add braces {} for if statement migration: Open brace '{' following struct go on the same line migration: Add spaces around operator migration: Don't use '#' flag of printf format migration: Do not use C99 // comments migration: Drop unused VMSTATE_FLOAT64 support Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-27 10:25:42 +00:00
Peter Maydell	091e3e3dbc	bitmaps patches for 2020-10-26 - fix infloop on large bitmap granularity - silence compiler warning -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl+WuYYACgkQp6FrSiUn Q2owKwgAhXf5jJEoUzSMFIIczxST9eaokL4Jweb65bp2y4Ii014/kzAnwyU+7B1Y DMniSrpySi1F74zng2eOKijgNDgQ1INm1v5P/qeoMa4UpPwL3eaNQl3FwMUCJMdN WVo3jl1wxzqcRDVUWDWiAMD0C03nzRKI8i2uw4USUDXiW12bLC0YQPzOCofZagld xC1HAnqRMCnK10lj4aoY5jOHQL9R05TLtsEcQCdI73q3c87p+sxI1eOlqJcZw1KC vgb6uMOXQrkTPKfVsUK5bJ+aJh97LHhWb/o/Jj65f5IMbBQzaebRby3m38Pgph2W +AVuxe2FBAo7o+heucbOrrFwAUcofw== =ArcV -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2020-10-26' into staging bitmaps patches for 2020-10-26 - fix infloop on large bitmap granularity - silence compiler warning # gpg: Signature made Mon 26 Oct 2020 11:56:54 GMT # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-bitmaps-2020-10-26: migration/block-dirty-bitmap: fix uninitialized variable warning migration/block-dirty-bitmap: fix larger granularity bitmaps Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-26 22:36:35 +00:00
Peter Xu	d246ea5039	migration/postcopy: Release fd before going into 'postcopy-pause' Logically below race could trigger with the old code: test program migration thread ------------ ---------------- wait_until('postcopy-pause') postcopy_pause() set_state('postcopy-pause') do_postcopy_recover() arm s->to_dst_file with new fd release s->to_dst_file [1] Here [1] could have released the just-installed recoverying channel. Then the migration could hang without really resuming. Instead, it should be very safe to release the fd before setting the state into 'postcopy-pause', because there's no reason for any other thread to touch it during 'postcopy-active'. Dave reported a very rare postcopy recovery hang that the migration-test program waited for the migration to complete in migrate_postcopy_complete(). We do suspect it's the same thing that we're gonna fix here. Hard to tell. However since we've noticed this, fix this irrelevant of the hang report. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-6-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Peter Xu	0c26781c09	migration: Sync requested pages after postcopy recovery We synchronize the requested pages right after a postcopy recovery happens. This helps to synchronize the prioritized pages on source so that the faulted threads can be served faster. Reported-by: Xiaohui Li <xiaohli@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-5-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Peter Xu	8f8bfffcf1	migration: Maintain postcopy faulted addresses Maintain a list of faulted addresses on the destination host for which we're waiting on. This is implemented using a GTree rather than a real list to make sure even there're plenty of vCPUs/threads that are faulting, the lookup will still be fast with O(log(N)) (because we'll do that after placing each page). It should bring a slight overhead, but ideally that shouldn't be a big problem simply because in most cases the requested page list will be short. Actually we did similar things for postcopy blocktime measurements. This patch didn't use that simply because: (1) blocktime measurement is towards vcpu threads only, but here we need to record all faulted addresses, including main thread and external thread (like, DPDK via vhost-user). (2) blocktime measurement will require UFFD_FEATURE_THREAD_ID, but here we don't want to add that extra dependency on the kernel version since not necessary. E.g., we don't need to know which thread faulted on which page, we also don't care about multiple threads faulting on the same page. But we only care about what addresses are faulted so waiting for a page copying from src. (3) blocktime measurement is not enabled by default. However we need this by default especially for postcopy recover. Another thing to mention is that this patch introduced a new mutex to serialize the receivedmap and the page_requested tree, however that serialization does not cover other procedures like UFFDIO_COPY. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-4-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Peter Xu	7a267fc49b	migration: Introduce migrate_send_rp_message_req_pages() This is another layer wrapper for sending a page request to the source VM. The new migrate_send_rp_message_req_pages() will be used elsewhere in coming patches. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Peter Xu	eef621c4e6	migration: Pass incoming state into qemu_ufd_copy_ioctl() It'll be used in follow up patches to access more fields out of it. Meanwhile fetch the userfaultfd inside the function. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	fe80c0241d	migration: using trace_ to replace DPRINTF Signed-off-by: Bihong Yu <yubihong@huawei.com> Message-Id: <1603179176-5360-1-git-send-email-yubihong@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	0bcae62333	migration: Delete redundant spaces Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-9-git-send-email-yubihong@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	cbfc71b52b	migration: Open brace '{' following function declarations go on the next line Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-8-git-send-email-yubihong@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	49324e939c	migration: Do not initialise statics and globals to 0 or NULL Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-7-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	f4c51a6bfd	migration: Add braces {} for if statement Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-6-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	f16aee44b4	migration: Open brace '{' following struct go on the same line Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-5-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	395cb45009	migration: Add spaces around operator Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-4-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	29fccade10	migration: Don't use '#' flag of printf format Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1603163448-27122-3-git-send-email-yubihong@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	01371c5821	migration: Do not use C99 // comments Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1603163448-27122-2-git-send-email-yubihong@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Peter Maydell	9fe7ef8b66	migration: Drop unused VMSTATE_FLOAT64 support Commit `ef96e3ae96` in January 2019 removed the last user of the VMSTATE_FLOAT64* macros. These were used by targets which defined their floating point register file as an array of 'float64'. We used to try to maintain a stricter distinction between 'float64' (a type for holding an integer representing an IEEE float) and 'uint64_t', including having a debug option for 'float64' being a struct and supposedly mandatory macros for converting between float64 and uint64_t. We no longer think that's a usefully strong distinction to draw and we allow ourselves to freely assume that float64 really is just a 64-bit integer type, so for new targets we would simply recommend use of the uint64_t type for a floating point register file. The float64 type remains as a useful way of documenting in the type signature of helper functions and the like that they expect to receive an IEEE float from the TCG generated code rather than an arbitrary integer. Since the VMSTATE_FLOAT64* macros have no remaining users and we don't recommend new code uses them, delete them. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20201022120830.5938-1-peter.maydell@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Chen Qun	a024890a64	migration/block-dirty-bitmap: fix uninitialized variable warning A default value is provided for the variable 'bitmap_name' to avoid a compiler warning. The compiler showed the warning: migration/block-dirty-bitmap.c:1090:13: warning: ‘bitmap_name’ may be used uninitialized in this function [-Wmaybe-uninitialized] g_strlcpy(s->bitmap_name, bitmap_name, sizeof(s->bitmap_name)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Message-Id: <20201014114430.1898684-1-kuhn.chenqun@huawei.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: commit message grammar tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-26 06:56:24 -05:00
Stefan Reiter	ed7b70c27b	migration/block-dirty-bitmap: fix larger granularity bitmaps sectors_per_chunk is a 64 bit integer, but the calculation is done in 32 bits, leading to an overflow for coarse bitmap granularities. If that results in the value 0, it leads to a hang where no progress is made but send_bitmap_bits is constantly called with nr_sectors being 0. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Message-Id: <20201021144456.1072-1-s.reiter@proxmox.com> Fixes: `b35ebdf07` migration: add postcopy migration of dirty bitmaps Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: Use correct type for 8ULL, use () to avoid overflow] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-26 06:55:37 -05:00
Paolo Bonzini	9f2931bc65	machine: remove deprecated -machine enforce-config-section option Deprecated since 3.1 and complicates the initialization sequence, remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-26 07:08:39 -04:00
Philippe Mathieu-Daudé	28af9ba260	qapi: Restrict Xen migration commands to migration.json Restricting xen-set-global-dirty-log and xen-load-devices-state commands migration.json pulls slightly less QAPI-generated code into user-mode and tools. Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201012121536.3381997-6-philmd@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-10-21 05:00:44 +02:00
Peter Maydell	96292515c0	Trivial Patches Pull request 20201013 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl+FlGcSHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748eJgQALXBL9j942qnCFfwntG/lm1ZABAjHxMb 6oRt7J6+iKImOuZrZ8m87ET90sO+dPl3u+L9mq7BfaM2vewarwSSIYFrfyTO2Acf sFzKCfBC/mJJBzI0AqvV3caHGCxFhhzCxPO25JPC2yxgyHxTvG3k0krs5C6Wv9BF nHQPE9PZHguLpvJSH8Wmr70rdCbAOHwaIxkVIN9au/1nVktvPk9vPSjZVFpoMQRA gwIRb+Lo0Chqb9DY2Ino/0AFAMV8CbfopLZt8r8pg3mGFdh2U/KEkmtxDTYdVdbr d2LDAhiNP7C9SNRF7VyFcW21YpWOjWG+vzYnPl5KKces1fAbrseTD8fcNrqf5JZc ont2DdpmGZ+reE3ekyoT2YBk4tz3wGtCDoN19QAwFIvVYRyZW52HLg5zCNb2hq4T 1/J5BQvhPxpbY7hFN6QkQa6i2e6EB2kqtrL/H3pjrw8CLAhQ2ZviCZvyLRpv26w/ OzY5+u2GMdo27+EBqxkbpgZO86GWAhPPzqq4Rnd+wMcttTIHbyqALHIdAaqKz29T 4fVF/nULBJcZL8srz+QdU7xlW9ETbZ+fyTWEo0ZZZnntNlD4ZzQB7HVt8d/kNoC6 wR0I+gt/QAvlPIIP8pWa3MXKYcFNNXGWH2p0WK+jNUDVk7NnNkYkgBQigLPZw973 uOHb/cR9lyrN =qJdy -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.2-pull-request' into staging Trivial Patches Pull request 20201013 # gpg: Signature made Tue 13 Oct 2020 12:49:59 BST # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-5.2-pull-request: meson.build: drop duplicate 'sparc64' entry mingw: fix error __USE_MINGW_ANSI_STDIO redefined target/sparc/int32_helper: Remove duplicated 'Tag Overflow' entry goldfish_rtc: change MemoryRegionOps endianness to DEVICE_NATIVE_ENDIAN hw/char/serial: remove duplicate .class_init in serial_mm_info block/blkdebug: fix memory leak hw/pci: Fix typo in PCI hot-plug error message softmmu/memory: Log invalid memory accesses hw/acpi/piix4: Rename piix4_pm_add_propeties() to piix4_pm_add_properties() vmdk: fix maybe uninitialized warnings tests/test-char: Use a proper fallthrough comment hw/block/nvme: Simplify timestamp sum target/i386/cpu: Update comment that mentions Texinfo qemu-img-cmds.hx: Update comment that mentions Texinfo Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-13 14:06:22 +01:00
Marc-André Lureau	662770af7c	mingw: fix error __USE_MINGW_ANSI_STDIO redefined Always put osdep.h first, and remove redundant stdlib.h include. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20201008165953.884599-1-marcandre.lureau@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-10-13 13:33:46 +02:00
Philippe Mathieu-Daudé	7e6edef3f8	migration: Move the creation of the library to the main meson.build Be consistent creating all the libraries in the main meson.build file. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201006125602.2311423-6-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-12 11:50:20 -04:00
Chuan Zheng	b1a859cfb0	migration/dirtyrate: present dirty rate only when querying the rate has completed Make dirty_rate field optional, present dirty rate only when querying the rate has completed. The qmp results is shown as follow: @unstarted: {"return":{"status":"unstarted","start-time":0,"calc-time":0},"id":"libvirt-12"} @measuring: {"return":{"status":"measuring","start-time":102931,"calc-time":1},"id":"libvirt-85"} @measured: {"return":{"status":"measured","dirty-rate":4,"start-time":150146,"calc-time":1},"id":"libvirt-15"} Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <1601350938-128320-3-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-12 12:39:38 +01:00
Chuan Zheng	aa84b506f7	migration/dirtyrate: record start_time and calc_time while at the measuring state Querying could include both the start-time and the calc-time while at the measuring state, allowing a caller to determine when they should expect to come back looking for a result. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Message-Id: <1601350938-128320-2-git-send-email-zhengchuan@huawei.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-12 12:39:38 +01:00
Peter Maydell	e1c30c43cd	Error reporting patches for 2020-10-09 -----BEGIN PGP SIGNATURE----- iQJFBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl+ABv0SHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZT+5EP9ibVF/L1Nu7jnqMt1kRn1BU308XUoGL7 2hAZ7ys+agQd6DUttVBSwi7apZueXznV32E27FggncvCWqh2PeHn+6oCdBpSIKEV 8AX8igui8DNsh0xd2l1JjVj0qNWYU1ZK9vTsMXdP3Ha+eqzDpoINOQKWtLYGKAxX mefk6wldGo2bvSw0kkXQ6fgYHz0stztKF1YCpTmktqZIiLEvLZ+clKCsTWXOpGZc sJhZnMBGejETNrPMevwhcy+BWAOp6k14fFlM4adfJHbwxkLYv2jr36MPAqAo894w KJK7tYWhGdrFSvx6e6jMdoRynJ8R5uHbw2Z4Xadx8VT3h+I+hm4AXjAiIdWIxlGn lNxzGJPvhXJC0uEOOeQthL+//IGbbkvo7dvEMLirZf6IT/Lbcyp5p1eyv409ShwQ KL3OOYRj3YMxDKe/Vxdt3B9Q/B2NXcQmuCF29eZMa1RwCQuEqHIXfpIy7HAQNPxH bAlGa+THnHrdWeir6F1tNABFjwDMHLep+HlnikCEL1DB25iDVKloary77VL2GSt3 A4BQsikYwyUa7Cvok5r/rUgQ4akSegt28s6VQlIcocSf9tf91wTo2OaVWEXbDXdQ M7V27usT38x0qiZkUvajdfZ1erfXZ7p3/xnJmdg3BtaiB83VOEjz7VT2P5+beF7Y HtDs58b+wcE= =e5I5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2020-10-09' into staging Error reporting patches for 2020-10-09 # gpg: Signature made Fri 09 Oct 2020 07:45:17 BST # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-error-2020-10-09: error: Use error_fatal to simplify obvious fatal errors (again) error: Remove NULL checks on error_propagate() calls (again) Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-09 14:47:45 +01:00
Markus Armbruster	2155ceaf25	error: Remove NULL checks on error_propagate() calls (again) Patch created mechanically by rerunning: $ spatch --sp-file scripts/coccinelle/error_propagate_null.cocci \ --macro-file scripts/cocci-macro-file.h \ --use-gitgrep . Cc: Jens Freimann <jfreimann@redhat.com> Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com> Cc: Juan Quintela <quintela@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200722084048.1726105-4-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2020-10-09 08:36:23 +02:00
Kevin Wolf	947e47448d	monitor: Use getter/setter functions for cur_mon cur_mon really needs to be coroutine-local as soon as we move monitor command handlers to coroutines and let them yield. As a first step, just remove all direct accesses to cur_mon so that we can implement this in the getter function later. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20201005155855.256490-4-kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-10-09 07:08:19 +02:00
Pavel Dovgalyuk	f9a9fb6516	replay: flush rr queue before loading the vmstate Non-empty record/replay queue prevents saving and loading the VM state, because it includes pending bottom halves and block coroutines. But when the new VM state is loaded, we don't have to preserve the consistency of the current state anymore. Therefore this patch just flushes the queue allowing the coroutines to finish and removes checking for empty rr queue for load_snapshot function. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <160174521762.12451.15752448887893855757.stgit@pasha-ThinkPad-X280> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-06 08:34:49 +02:00
Pavel Dovgalyuk	b39847a505	migration: introduce icount field for snapshots Saving icount as a parameters of the snapshot allows navigation between them in the execution replay scenario. This information can be used for finding a specific snapshot for proceeding the recorded execution to the specific moment of the time. E.g., 'reverse step' action (introduced in one of the following patches) needs to load the nearest snapshot which is prior to the current moment of time. This patch also updates snapshot test which verifies qemu monitor output. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> -- v4 changes: - squashed format update with test output update v7 changes: - introduced the spaces between the fields in snapshot info output - updated the test to match new field widths Message-Id: <160174518865.12451.14327573383978752463.stgit@pasha-ThinkPad-X280> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-06 08:34:49 +02:00
Thomas Huth	043c2c1a5d	migration: Silence compiler warning in global_state_store_running() GCC 9.3.0 on Ubuntu complains: In file included from /usr/include/string.h:495, from /home/travis/build/huth/qemu/include/qemu/osdep.h:87, from ../migration/global_state.c:13: In function ‘strncpy’, inlined from ‘global_state_store_running’ at ../migration/global_state.c:47:5: /usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ specified bound 100 equals destination size [-Werror=stringop-truncation] 106 \| return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... but we apparently really want to do a strncpy here - the size is already checked with the assert() statement right in front of it. To silence the warning, simply replace it with our strpadcpy() function. Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com> (two years ago) Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200918103430.297167-4-thuth@redhat.com> Message-Id: <20200925154027.12672-5-alex.bennee@linaro.org>	2020-10-02 12:28:48 +01:00
Peter Maydell	23290e8070	Migration: Revert one patch for 068 fix One patch in the last pull broke test 068 which does a pair of vmload's. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAl9x/vUACgkQBRYzHrxb /edt6Q/9E7P3CTZWy2JHKcHPY4rV4NfJEP/bY947Z7nd2+jaCx7ObLOGtWpuzd+i +4NsuuI54DeYaK6HNZPC84KTxBkF8P70lh4EUkzTdn8sDrrYDcmDx6JmdxnAwGQ7 GNS5Uzi/BUVJlw+IkKxj/nIlZBjc3EUr7uYboqIKKArun/4z/hflndYtzB7lLxig DkVBTbtMoE0LhRt6tx9s3GzqC0vZPpISZF/mPEOptlsx09oWqfzilyenrCfB+icE ZuCzKeVkpWer10sWj2xbaI+Ep9wb5Q81qvtIoIcqRspbGz4XENwqHAuDWcpnniEL nlhjunziLIzAbHiis/FnLUUzPs/Q/Q2eRZc3MziSoU7AKHZ/2MCdYRph427tR8Q5 xE3V8UHKjAGnigO88InAp2fS8Z///p+c49Xx4zJaXAQTGzhbTni5ng0kcakYWbp+ SNpL8Yto38OfqdIe8cBy+At5DeXfkBsy50vIvSHKcG01DRt2Ty8G0Ai23AImTNgJ zt+Q6y09TuRF07Ao48wMVYW4XRcXmW8eVZzWFHPBBG65AfeC6zGCqzX2VmPlV2fm qGjv8zINnDlaFX9j/z3Z4zdf66LQcMpD6IFB+POlcyEG22MtHSxfXQDGNKSNplSr v1ae9wwl5wXRAXSrvIlQhyw/VVYOdD3ZGmgGnXnzLikp+h3NrSY= =eW0Z -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20200928a' into staging Migration: Revert one patch for 068 fix One patch in the last pull broke test 068 which does a pair of vmload's. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> # gpg: Signature made Mon 28 Sep 2020 16:19:17 BST # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert/tags/pull-migration-20200928a: Revert "migration: Properly destroy variables on incoming side" Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-29 10:29:46 +01:00
Dr. David Alan Gilbert	1783c00fc9	Revert "migration: Properly destroy variables on incoming side" This reverts commit `c02039a6f3`. This is breaking test 068 that does a loadvm twice. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-28 16:18:02 +01:00
Peter Maydell	92d0950267	Trivial Patches Pull request 20200928 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl9xqZQSHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748nFQQAJL/5ydfeo+t9RVTfatMI6bRP+loqQzc Y/t/xYgOfCQsMjXRU5HHNre+FtltqP0lVU2Ey2zAon7MjJDXfQOCv6aOK+vsOGXi FITC6vZ050VrVy7iPfXaJR5aIbkJme4NXLgJj9mqaFZELgoTAMCuCkGN+km3n/Uw pf0lI43VSkLt3pvHvGjy2UT51OjH6/LxaXcgY2w67nIH+KcLgxdh7fXlD+5Gxdug 458gbMbtqAPb6qNV7jBbrgUMRx5hpUKoa5QvL0DWkIsboemPJGsTlw0nhON5ZPQ7 XYNPyb9ELPYE5V8I9Ki+ESzsWFMVMdu0Hj/MNbnSDdg+2uR0xgCSIRnnuEwpSmLB jbhKa8b3B3nPFFNQAqQ6FPpOW76PpwFVKRAPT3p3rnDqrtXLUJpGdGzTz8ltMZry pOxdbSkuEl+79D1i5Lt5mfCqRNqOjYk1awPO4K/JdmhJxk9dFmY5X21edaIe6lN+ GiZlE43fF+GM3HelplnTCIwlRAjhUX/PRSDBkeLuPYj0qFob27MFauMcsGvC28FI CQIY3CmIFCmzf8c1DUE3TVYWpJj0e+OnKU02D89/FF4M4TOGTa1/CtpNcpSDgE5A TCEw4cyEG1LEvDtw4DRdLBKVDnFW8XiUPz2xVC87/dZSC88CTllLnxtsaSfDHuui 0Y0BBZ3MJxgs =t5Q1 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.2-pull-request' into staging Trivial Patches Pull request 20200928 # gpg: Signature made Mon 28 Sep 2020 10:15:00 BST # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-5.2-pull-request: docs/system/deprecated: Move lm32 and unicore32 to the right section migration/multifd: Remove superfluous semicolons timer: Fix timer_mod_anticipate() documentation vhost-vdpa: remove useless variable Add *.pyc back to the .gitignore file virtio: vdpa: omit check return of g_malloc meson: fix static flag summary vhost-vdpa: fix indentation in vdpa_ops Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-28 14:03:09 +01:00
Chuan Zheng	894f021411	migration/tls: add trace points for multifd-tls add trace points for multifd-tls for debug. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <1600139042-104593-7-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	2964714015	migration/tls: add support for multifd tls-handshake Similar like migration main thread, we need to do handshake for each multifd thread. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <1600139042-104593-6-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	03c7a42d0d	migration/tls: extract cleanup function for common-use multifd channel cleanup is need if multifd handshake failed, let's extract it. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <1600139042-104593-5-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	8e5fa05932	migration/tls: add tls_hostname into MultiFDSendParams Since multifd creation is async with migration_channel_connect, we should pass the hostname from MigrationState to MultiFDSendParams. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Message-Id: <1600139042-104593-4-git-send-email-zhengchuan@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	bfb790e7b2	migration/tls: extract migration_tls_client_create for common-use migration_tls_client_create will be used in multifd-tls, let's extract it. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <1600139042-104593-3-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	d8053e73fb	migration/tls: save hostname into MigrationState hostname is need in multifd-tls, save hostname into MigrationState. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Message-Id: <1600139042-104593-2-git-send-email-zhengchuan@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Laurent Vivier	7590a2ae09	migration: increase max-bandwidth to 128 MiB/s (1 Gib/s) max-bandwidth is set by default to 32 MiB/s (256 Mib/s) since 2008 (`5bb7910af0`). Most of the CPUs can dirty memory faster than that now, and this is clearly a problem with POWER where the page size is 64 kiB and not 4 KiB. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20200921144957.979989-1-lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Dov Murik	b4deb9bf8d	migration: Truncate state file in xen-save-devices-state When running the xen-save-devices-state QMP command, if the filename already exists it will be truncated before dumping the devices' state into it. Signed-off-by: Dov Murik <dovmurik@linux.vnet.ibm.com> Message-Id: <20200921094830.114028-1-dovmurik@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	3c0b5dffc1	migration/dirtyrate: Add trace_calls to make it easier to debug Add trace_calls to make it easier to debug Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <1600237327-33618-13-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	4c437254b8	migration/dirtyrate: Implement qmp_cal_dirty_rate()/qmp_get_dirty_rate() function Implement qmp_cal_dirty_rate()/qmp_get_dirty_rate() function which could be called Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Message-Id: <1600237327-33618-12-git-send-email-zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> atomic function fixup Wording fixup in migration.json based on Eric's review	2020-09-25 12:45:58 +01:00
Chuan Zheng	cf0bbb49d8	migration/dirtyrate: Implement calculate_dirtyrate() function Implement calculate_dirtyrate() function. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: YanYing Zhuang <ann.zhuangyanying@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-11-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	eca582249c	migration/dirtyrate: Implement set_sample_page_period() and is_sample_period_valid() Implement is_sample_period_valid() to check if the sample period is vaild and do set_sample_page_period() to sleep specific time between sample actions. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-10-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	f82583cdc0	migration/dirtyrate: skip sampling ramblock with size below MIN_RAMBLOCK_SIZE In order to sample real RAM, skip ramblock with size below MIN_RAMBLOCK_SIZE which is set as 128M. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-9-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:57 +01:00
Chuan Zheng	9c04387b88	migration/dirtyrate: Compare page hash results for recorded sampled page Compare page hash results for recorded sampled page. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: YanYing Zhuang <ann.zhuangyanying@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-8-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:57 +01:00
Chuan Zheng	ba0e519f95	migration/dirtyrate: Record hash results for each sampled page Record hash results for each sampled page, crc32 is taken to calculate hash results for each sampled length in TARGET_PAGE_SIZE. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: YanYing Zhuang <ann.zhuangyanying@huawei.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-7-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:57 +01:00
Chuan Zheng	3ded54b1bd	migration/dirtyrate: move RAMBLOCK_FOREACH_MIGRATABLE into ram.h RAMBLOCK_FOREACH_MIGRATABLE is need in dirtyrate measure, move the existing definition up into migration/ram.h Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-6-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:57 +01:00
Chuan Zheng	c9a58d719b	migration/dirtyrate: Add dirtyrate statistics series functions Add dirtyrate statistics functions to record/update dirtyrate info. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-5-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:57 +01:00
Chuan Zheng	a2635f0a75	migration/dirtyrate: Add RamblockDirtyInfo to store sampled page info Add RamblockDirtyInfo to store sampled page info of each ramblock. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-4-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:57 +01:00
Chuan Zheng	7df3aa3083	migration/dirtyrate: add DirtyRateStatus to denote calculation status add DirtyRateStatus to denote calculating status. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-3-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> atomic name fixup	2020-09-25 12:45:57 +01:00
Chuan Zheng	4240dceeb3	migration/dirtyrate: setup up query-dirtyrate framwork Add get_dirtyrate_thread() functions to setup query-dirtyrate framework. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: YanYing Zhuang <ann.zhuangyanying@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-2-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:57 +01:00
Peter Xu	2e2bce167e	migration: Rework migrate_send_rp_req_pages() function We duplicated the logic of maintaining the last_rb variable at both callers of this function. Pass *rb pointer into the function so that we can avoid duplicating the logic. Also, when we have the rb pointer, it's also easier to remove the original 2nd & 4th parameters, because both of them (name of the ramblock when needed, or the page size) can be fetched from the ramblock pointer too. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200908203022.341615-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 11:11:01 +01:00
Peter Xu	c02039a6f3	migration: Properly destroy variables on incoming side In migration_incoming_state_destroy(), we've got a few variables that aren't destroyed properly, namely: main_thread_load_event postcopy_pause_sem_dst postcopy_pause_sem_fault rp_mutex Destroy them properly. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200908203022.341615-2-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 11:11:01 +01:00
Philippe Mathieu-Daudé	df55509470	migration/multifd: Remove superfluous semicolons checkpatch.pl report superfluous semicolons since commit `ee0f3c09e0`, but this one was missed: scripts/checkpatch.pl d32ca5ad798~..d32ca5ad798 ERROR: superfluous trailing semicolon #498: FILE: migration/multifd.c:308: + ram_counters.transferred += transferred;; total: 1 errors, 1 warnings, 2073 lines checked Fixes: `d32ca5ad79` ("multifd: Split multifd code into its own file") Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <20200921040231.437653-1-f4bug@amsat.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-09-23 19:13:56 +02:00
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int ' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic$64$\?_[a-z0-9_]\+' include/qemu/atomic.h \| \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	2020-09-23 16:07:44 +01:00
Peter Maydell	834b9273d5	Pull request trivial patches 20200919 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl9mUVcSHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748c5IP/2Jh7HuM5LpGuhca81zCnUxIHnnfXLpR YXbRsD/q4VrCe9WxFZeyul1zcCpV4BnLNqsWA2PH44at+vcvCuXLU9vVzar1SMTh pAwuXc4qGkV4zttLzzYwkimQLxHl1Cy7RtoLJB7GjLj0A/VBvD7Z2cO2KSF4EOzU KQAHcIm8WYWjZy8lx5ZrCvq5KkPHMK+XvVxD+v/gXVWzU23wFMVJwhzi2PXqetRe RnAFA8tF3xlvXTJmeqqN277Otv6WLnANe1rjr/w4j5tUINaaiAX/gWkrwcFZprjo 1p0E3o8ztrtql7B8DWH+xWLeFUpq3Qd9Ztp4ujFmpWQysbCZ6BWFocAz+v4Dd0F3 luJP0e8X5hQAzJiu9aucOKpnUHaieWamo5J+5pWezTGB0wNYgnhRDp2LAefadV+I WmDjIWtZZ3Je48qT0bGzh+p8ZSqGQx/a5xx6eXr7MdlNhiWIV/evqotU2MoLnO7d QhQevHlk7nxayk3laVA4nTwJRdtEN8zfbuAB+gMZZvR11yBNrBm6q7oMNhkuP0QV glcta70RE7Nfa4TZaFzEzrjiF6V0k0+TtGY0VPB/0xjtCepiwOuoVbEjSe4arJ7Z 1LkGY45Rdaas8yqWwZGAjbFWTkke85v+S8g2lCj/HihgfPf585uRZVPhJ9sIGc9w JcWyaIFsgHh8 =MxMx -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.2-pull-request' into staging Pull request trivial patches 20200919 # gpg: Signature made Sat 19 Sep 2020 19:43:35 BST # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-5.2-pull-request: contrib/: fix some comment spelling errors qapi/: fix some comment spelling errors disas/: fix some comment spelling errors linux-user/: fix some comment spelling errors util/: fix some comment spelling errors scripts/: fix some comment spelling errors docs/: fix some comment spelling errors migration/: fix some comment spelling errors qemu/: fix some comment spelling errors scripts/git.orderfile: Display meson files along with buildsys ones hw/timer/hpet: Fix debug format strings hw/timer/hpet: Remove unused functions hpet_ram_readb, hpet_ram_readw meson: remove empty else and duplicated gio deps manual: escape backslashes in "parsed-literal" blocks ui/spice-input: Remove superfluous forward declaration hw/ppc/ppc4xx_pci: Replace magic value by the PCI_NUM_PINS definition hw/gpio/max7310: Remove impossible check Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-22 15:42:23 +01:00
Eduardo Habkost	8063396bf3	Use OBJECT_DECLARE_SIMPLE_TYPE when possible This converts existing DECLARE_INSTANCE_CHECKER usage to OBJECT_DECLARE_SIMPLE_TYPE when possible. $ ./scripts/codeconverter/converter.py -i \ --pattern=AddObjectDeclareSimpleType $(git grep -l '' -- '*.[ch]') Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Paul Durrant <paul@xen.org> Message-Id: <20200916182519.415636-6-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-09-18 14:12:32 -04:00
zhaolichang	3a4452d896	migration/: fix some comment spelling errors I found that there are many spelling errors in the comments of qemu, so I used the spellcheck tool to check the spelling errors and finally found some spelling errors in the migration folder. Signed-off-by: zhaolichang <zhaolichang@huawei.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20200917075029.313-3-zhaolichang@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-09-17 20:36:32 +02:00
Peter Maydell	f4ef8c9cc1	QOM boilerplate cleanup Documentation build fix: * memory: Remove kernel-doc comment marker (Eduardo Habkost) QOM cleanups: * Rename QOM macros for consistency between TYPE_* and type checking constants (Eduardo Habkost) QOM new macros: * OBJECT_DECLARE_* and OBJECT_DEFINE_* macros (Daniel P. Berrangé) * DECLARE__CHECKER macros (Eduardo Habkost) Automated QOM boilerplate changes: Automated changes to use DECLARE__CHECKER (Eduardo Habkost Automated changes to use OBJECT_DECLARE* (Eduardo Habkost) -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEWjIv1avE09usz9GqKAeTb5hNxaYFAl9abc0UHGVoYWJrb3N0 QHJlZGhhdC5jb20ACgkQKAeTb5hNxaYU9Q/8CyK1w2SlItxBhos7zojqnZ9TP1Jt b1YCApQJ+bKSPAUDyefajQA0D9HeR9bFlreiOprQnmZWOqeOvnRIxNGvelJRqRRu KcIA5DIfVMJRkKJQEXairrGdnPmFLWSLEb7AmwxyAhp5G51PCP/3kbudi3T/vrNr OaccUejs5UgImPfO8Fm+0zqZPmblq/xmtU0p77FvDxGNFPPG8ddpu7eKksGD7FYd 5bTJTtUhONYG9EJMUD2TBxnJoy1pi6AYUu4+2T211RpBcxeiyNSSitI8fZTk6BGl 33VwQib9SXjGaE8VsSvHDHhLLec7sqqr2JH3rfvyKF6BOptKWzmSzFdbo2mrRkSy 8jfCImQgTBBMAHBWP+MFTeKuzfhikZx2DbBLzpppHMMvCca6Zc+oYgR2FbVwuPsw H2YL+8Wx4Ws6RXe147toNDRbv75vnS7F3fU800Pcur5VHJWTgSpT/tggzmVPWsdU GeUgceYlXyVk5/fC89ZhhtD9eurfBSzQR4eN7/nie2wD6PFMpZkOjHwLn40uWsyq xRO0F4uYghNU1N8z6NBhEYLTBtEcS1HFEisSLQrnTQH9W0I7mBx3MaZib/uK7NLC b2gT0hossTT8Z46Z8ynoZarwO5EquAMWEQtc9hfZGWacrQEpjVm2DMYMfu83krWb xhgl+mpKqVasAPk= =RjXc -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging QOM boilerplate cleanup Documentation build fix: * memory: Remove kernel-doc comment marker (Eduardo Habkost) QOM cleanups: * Rename QOM macros for consistency between TYPE_* and type checking constants (Eduardo Habkost) QOM new macros: * OBJECT_DECLARE_* and OBJECT_DEFINE_* macros (Daniel P. Berrangé) * DECLARE__CHECKER macros (Eduardo Habkost) Automated QOM boilerplate changes: Automated changes to use DECLARE__CHECKER (Eduardo Habkost Automated changes to use OBJECT_DECLARE* (Eduardo Habkost) # gpg: Signature made Thu 10 Sep 2020 19:17:49 BST # gpg: using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6 # gpg: issuer "ehabkost@redhat.com" # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full] # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/machine-next-pull-request: (33 commits) virtio-vga: Use typedef name for instance_size vhost-user-vga: Use typedef name for instance_size xilinx_axienet: Use typedef name for instance_size lpc_ich9: Use typedef name for instance_size omap_intc: Use typedef name for instance_size xilinx_axidma: Use typedef name for instance_size tusb6010: Rename TUSB to TUSB6010 pc87312: Rename TYPE_PC87312_SUPERIO to TYPE_PC87312 vfio: Rename PCI_VFIO to VFIO_PCI usb: Rename USB_SERIAL_DEV to USB_SERIAL sabre: Rename SABRE_DEVICE to SABRE rs6000_mc: Rename RS6000MC_DEVICE to RS6000MC filter-rewriter: Rename FILTER_COLO_REWRITER to FILTER_REWRITER esp: Rename ESP_STATE to ESP ahci: Rename ICH_AHCI to ICH9_AHCI vmgenid: Rename VMGENID_DEVICE to TYPE_VMGENID vfio: Rename VFIO_AP_DEVICE_TYPE to TYPE_VFIO_AP_DEVICE dev-smartcard-reader: Rename CCID_DEV_NAME to TYPE_USB_CCID_DEV ap-device: Rename AP_DEVICE_TYPE to TYPE_AP_DEVICE gpex: Fix type checking function name ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-11 19:26:51 +01:00
Peter Maydell	2499453eb1	Block layer patches: - qemu-img create: Fail gracefully when backing file is an empty string - Fixes related to filter block nodes ("Deal with filters" series) - block/nvme: Various cleanups required to use multiple queues - block/nvme: Use NvmeBar structure from "block/nvme.h" - file-win32: Fix "locking" option - iotests: Allow running from different directory -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl9Z7bcRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9ZS6w//bos+A0RfRRF0YFWkIBLQWxqzKcGvMJ8W XWv3mFzd47UaDgRYwVnCC3CR6bLYEINISngZ3geA4jI1+w7AtYKDOO0HN32dUg+D ZrNMn02701CA6qkmpxJ+yjsrl9ltR3jYe0me4Wr39Pvdexa2pl/e+M4Vas6FhkYL ghAwNThypscGCrFjAlz3ru2Sc/K+sPWrGoqkzr+SWvsm9wy4vb8aLxr8Yy50x/zc CqALS9SQ/YA93BCVi9CzPkVyV3ioA0kg/y38WvLtAQ9GZ3m/ekMro3WvdYsRsFCN LGXsuwFig+U7Kd7lJrCS9TLnlTJstNGqPq9jEoV5cThPvGknFfMvVOzRmmP7tzqT YRcPRy39z44OoLKa3kyg3aF38BTxt+9gPqBnivKMr9j9EecMvPsXXHRvF+lP+LsP j753Ih561hX6FurcjX8pc9GOM2cQA0GjlyL77UTTAmLZyFXP/8e55oQbBuYTylc/ Xlvmc/T+yEGiEGTnK+FxgDAiUaxbCCM9cDVStJjTvsIq43dwXb48g1onDsGZ5eDf j9lmAD6TJxHNOB5ErNsDPODf4/D1wJ9t9WVF8UZp9ArfPHRdxMzT7Q4LvetaDmVl +hQC9cgTq8Qd8LwSqbKEYua4L6iGbmLAT7/N6htq5L1eVLg76/tLg/tKSwh/vKAY yzPmyHaVK84= =gaaW -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - qemu-img create: Fail gracefully when backing file is an empty string - Fixes related to filter block nodes ("Deal with filters" series) - block/nvme: Various cleanups required to use multiple queues - block/nvme: Use NvmeBar structure from "block/nvme.h" - file-win32: Fix "locking" option - iotests: Allow running from different directory # gpg: Signature made Thu 10 Sep 2020 10:11:19 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (65 commits) block/qcow2-cluster: Add missing "fallthrough" annotation block/nvme: Pair doorbell registers block/nvme: Use generic NvmeBar structure block/nvme: Group controller registers in NVMeRegs structure file-win32: Fix "locking" option iotests: Allow running from different directory iotests: Test committing to overridden backing iotests: Add test for commit in sub directory iotests: Add filter mirror test cases iotests: Add filter commit test cases iotests: Let complete_and_wait() work with commit iotests: Test that qcow2's data-file is flushed block: Leave BDS.backing_{file,format} constant block: Inline bdrv_co_block_status_from_*() blockdev: Fix active commit choice block: Drop backing_bs() qemu-img: Use child access functions nbd: Use CAF when looking for dirty bitmap commit: Deal with filters backup: Deal with filters ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-11 14:47:49 +01:00
Markus Armbruster	b15e402fc8	trace-events: Fix attribution of trace points to source Some trace points are attributed to the wrong source file. Happens when we neglect to update trace-events for code motion, or add events in the wrong place, or misspell the file name. Clean up with help of scripts/cleanup-trace-events.pl. Funnies requiring manual post-processing: * accel/tcg/cputlb.c trace points are in trace-events. * block.c and blockdev.c trace points are in block/trace-events. * hw/block/nvme.c uses the preprocessor to hide its trace point use from cleanup-trace-events.pl. * hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to guard debug code. * include/hw/xen/xen_common.h trace points are in hw/xen/trace-events. * linux-user/trace-events abbreviates a tedious list of filenames to /signal.c. net/colo-compare and net/filter-rewriter.c use pseudo trace points colo_compare_miscompare and colo_filter_rewriter_debug to guard debug code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200806141334.3646302-5-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-09-09 17:17:58 +01:00
Markus Armbruster	6ec9379870	trace-events: Delete unused trace points Tracked down with the help of scripts/cleanup-trace-events.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20200806141334.3646302-4-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-09-09 17:17:02 +01:00
Eduardo Habkost	8110fa1d94	Use DECLARE_CHECKER macros Generated using: $ ./scripts/codeconverter/converter.py -i \ --pattern=TypeCheckMacro $(git grep -l '' -- '*.[ch]') Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-12-ehabkost@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-13-ehabkost@redhat.com> Message-Id: <20200831210740.126168-14-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-09-09 09:27:09 -04:00
Eduardo Habkost	db1015e92e	Move QOM typedefs and add missing includes Some typedefs and macros are defined after the type check macros. This makes it difficult to automatically replace their definitions with OBJECT_DECLARE_TYPE. Patch generated using: $ ./scripts/codeconverter/converter.py -i \ --pattern=QOMStructTypedefSplit $(git grep -l '' -- '.[ch]') which will split "typdef struct { ... } TypedefName" declarations. Followed by: $ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \ $(git grep -l '' -- '.[ch]') which will: - move the typedefs and #defines above the type check macros - add missing #include "qom/object.h" lines if necessary Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-9-ehabkost@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-10-ehabkost@redhat.com> Message-Id: <20200831210740.126168-11-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-09-09 09:26:43 -04:00
Max Reitz	93393e698c	block: Use bdrv_filter_(bs\|child) where obvious Places that use patterns like if (bs->drv->is_filter && bs->file) { ... something about bs->file->bs ... } should be BlockDriverState *filtered = bdrv_filter_bs(bs); if (filtered) { ... something about @filtered ... } instead. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Zhenyu Ye	a9e80a5f0c	migration: tls: fix memory leak in migration_tls_get_creds Currently migration_tls_get_creds() adds the reference of creds but there was no place to unref it. So the OBJECT(creds) will never be freed and result in memory leak. The leak stack: Direct leak of 104 byte(s) in 1 object(s) allocated from: #0 0xffffa88bd20b in __interceptor_malloc (/usr/lib64/libasan.so.4+0xd320b) #1 0xffffa7f0cb1b in g_malloc (/usr/lib64/libglib-2.0.so.0+0x58b1b) #2 0x14b58cb in object_new_with_type qom/object.c:634 #3 0x14b597b in object_new qom/object.c:645 #4 0x14c0e4f in user_creatable_add_type qom/object_interfaces.c:59 #5 0x141c78b in qmp_object_add qom/qom-qmp-cmds.c:312 #6 0x140e513 in qmp_marshal_object_add qapi/qapi-commands-qom.c:279 #7 0x176ba97 in do_qmp_dispatch qapi/qmp-dispatch.c:165 #8 0x176bee7 in qmp_dispatch qapi/qmp-dispatch.c:208 #9 0x136e337 in monitor_qmp_dispatch monitor/qmp.c:150 #10 0x136eae3 in monitor_qmp_bh_dispatcher monitor/qmp.c:239 #11 0x1852e93 in aio_bh_call util/async.c:89 #12 0x18531b7 in aio_bh_poll util/async.c:117 #13 0x18616bf in aio_dispatch util/aio-posix.c:459 #14 0x1853f37 in aio_ctx_dispatch util/async.c:268 #15 0xffffa7f06a7b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a7b) Since we're fine to use the borrowed reference when using the creds, so just remove the object_ref() in migration_tls_get_creds(). Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com> Message-Id: <20200722033228.71-1-yezhenyu2@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-08-28 13:34:52 +01:00
Daniel P. Berrangé	aa8a926d3c	migration: improve error reporting of block driver state name With blockdev, a BlockDriverState may not have a device name, so using a node name is required as an alternative. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200827111606.1408275-2-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-08-28 13:34:52 +01:00
Longpeng(Mike)	9ba3b2baa1	migration: add vsock as data channel support The vsock channel is more widely use in some new features, for example, the Nitro/Enclave. It can also be used as the migration channel. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Message-Id: <20200806074030.174-3-longpeng2@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-08-28 13:34:52 +01:00

... 3 4 5 6 7 ...

1700 Commits