Functions are long enough even without this.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-10-quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-9-quintela@redhat.com>
The only user of ram_control_save_page() and save_page() hook was
rdma. Just move the function to rdma.c, rename it to
rdma_control_save_page().
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-7-quintela@redhat.com>
There is only one flag called with: RAM_CONTROL_BLOCK_REG.
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-6-quintela@redhat.com>
Instead of going through ram_control_load_hook(), call
qemu_rdma_registration_handle() directly.
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-5-quintela@redhat.com>
Once there:
- Remove unused data parameter
- unfold it in its callers
- change all callers to call qemu_rdma_registration_stop()
- We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma()
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-4-quintela@redhat.com>
Once there:
- Remove unused data parameter
- unfold it in its callers.
- change all callers to call qemu_rdma_registration_start()
- We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma()
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-3-quintela@redhat.com>
RDMA was having trouble because
migrate_multifd_flush_after_each_section() can only be true or false,
but we don't want to send any flush when we are not in multifd
migration.
CC: Fabiano Rosas <farosas@suse.de
Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
Reported-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011205548.10571-2-quintela@redhat.com>
This is intended to be a semantic revert of commit 9b09503752
("migration: run setup callbacks out of big lock"). There have been so
many changes since that commit (e.g. a new setup callback
dirty_bitmap_save_setup() that also needs to be adapted now), it's
easier to do the revert manually.
For snapshots, the bdrv_writev_vmstate() function is used during setup
(in QIOChannelBlock backing the QEMUFile), but not holding the BQL
while calling it could lead to an assertion failure. To understand
how, first note the following:
1. Generated coroutine wrappers for block layer functions spawn the
coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it.
2. If the host OS switches threads at an inconvenient time, it can
happen that a bottom half scheduled for the main thread's AioContext
is executed as part of a vCPU thread's aio_poll().
An example leading to the assertion failure is as follows:
main thread:
1. A snapshot-save QMP command gets issued.
2. snapshot_save_job_bh() is scheduled.
vCPU thread:
3. aio_poll() for the main thread's AioContext is called (e.g. when
the guest writes to a pflash device, as part of blk_pwrite which is a
generated coroutine wrapper).
4. snapshot_save_job_bh() is executed as part of aio_poll().
3. qemu_savevm_state() is called.
4. qemu_mutex_unlock_iothread() is called. Now
qemu_get_current_aio_context() returns 0x0.
5. bdrv_writev_vmstate() is executed during the usual savevm setup
via qemu_fflush(). But this function is a generated coroutine wrapper,
so it uses AIO_WAIT_WHILE. There, the assertion
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
will fail.
To fix it, ensure that the BQL is held during setup. While it would
only be needed for snapshots, adapting migration too avoids additional
logic for conditional locking/unlocking in the setup callbacks.
Writing the header could (in theory) also trigger qemu_fflush() and
thus bdrv_writev_vmstate(), so the locked section also covers the
qemu_savevm_state_header() call, even for migration for consistency.
The section around multifd_send_sync_main() needs to be unlocked to
avoid a deadlock. In particular, the multifd_save_setup() function calls
socket_send_channel_create() using multifd_new_send_channel_async() as a
callback and then waits for the callback to signal via the
channels_ready semaphore. The connection happens via
qio_task_run_in_thread(), but the callback is only executed via
qio_task_thread_result() which is scheduled for the main event loop.
Without unlocking the section, the main thread would never get to
process the task result and the callback meaning there would be no
signal via the channels_ready semaphore.
The comment in ram_init_bitmaps() was introduced by 4987783400
("migration: fix incorrect memory_global_dirty_log_start outside BQL")
and is removed, because it referred to the qemu_mutex_lock_iothread()
call.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231013105839.415989-1-f.ebner@proxmox.com>
qemu_ram_block_from_host() may return NULL, which will be dereferenced w/o
check. Usualy return value is checked for this function.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Dmitry Frolov <frolov@swemel.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231010104851.802947-1-frolov@swemel.ru>
It's just a simple wrapper for rp_sem on either wait() or kick(), make it
even clearer on how it is used. Prepared to be used even for other things.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-ID: <20231004220240.167175-8-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Instead of only relying on the count of rp_sem, make the counter be part of
RAMState so it can be used in both threads to synchronize on the process.
rp_sem will be further reused in follow up patches, as a way to kick the
main thread, e.g., on recovery failures.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-7-peterx@redhat.com>
When we sent a page through QEMUFile hooks (RDMA) there are three
posiblities:
- We are not using RDMA. return RAM_SAVE_CONTROL_DELAYED and
control_save_page() returns false to let anything else to proceed.
- There is one error but we are using RDMA. Then we return a negative
value, control_save_page() needs to return true.
- Everything goes well and RDMA start the sent of the page
asynchronously. It returns RAM_SAVE_CONTROL_DELAYED and we need to
return 1 for ram_save_page_legacy.
Clear?
I know, I know, the interface is as bad as it gets. I think that now
it is a bit clearer, but this needs to be done some other way.
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-16-quintela@redhat.com>
RDMA protocol is completely asynchronous, so in qemu_rdma_save_page()
they "invent" that a byte has been transferred. And then they call
qemu_file_credit_transfer() and ram_transferred_add() with that byte.
Just remove that calls as nothing has been sent.
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-14-quintela@redhat.com>
Remove the one in control_save_page().
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-12-quintela@redhat.com>
Local variables shadowing other local variables or parameters make the
code needlessly hard to understand. Tracked down with -Wshadow=local.
Clean up: delete inner declarations when they are actually redundant,
else rename variables.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Message-ID: <20230921121312.1301864-3-armbru@redhat.com>
Implement dirty-limit convergence algo for live migration,
which is kind of like auto-converge algo but using dirty-limit
instead of cpu throttle to make migration convergent.
Enable dirty page limit if dirty_rate_high_cnt greater than 2
when dirty-limit capability enabled, Disable dirty-limit if
migration be canceled.
Note that "set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit"
commands are not allowed during dirty-limit live migration.
Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <168733225273.5845.15871826788879741674-7@git.sr.ht>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This commit is prepared for the implementation of dirty-limit
convergence algo.
The detection logic of throttling condition can apply to both
auto-converge and dirty-limit algo, putting it's position
before the checking logic for auto-converge feature.
Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-ID: <168733225273.5845.15871826788879741674-6@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Check if block migration is running before throttling
guest down in auto-converge way.
Note that this modification is kind of like code clean,
because block migration does not depend on auto-converge
capability, so the order of checks can be adjusted.
Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <168618975839.6361.17407633874747688653-5@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
virtio-mem wants to know whether it should not mess with the RAMBlock
content (e.g., discard RAM, preallocate memory) on incoming migration.
So let's expose that function as migrate_ram_is_ignored() in
migration/misc.h
Message-ID: <20230706075612.67404-4-david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
migrate_ignore_shared() is an optimization that avoids copying memory
that is visible and can be mapped on the target. However, a
memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED
flag set is not migrated when migrate_ignore_shared() is true. This is
wrong, because the block has no named backing store, and its contents will
be lost. To fix, ignore shared memory iff it is a named file. Define a
new flag RAM_NAMED_FILE to distinguish this case.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <1686151116-253260-1-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Perform the function selection once, and only if CONFIG_AVX512_OPT
is enabled. Centralize the selection to xbzrle.c, instead of
spreading the init across 3 files.
Remove xbzrle-bench.c. The benefit of being able to benchmark
the different implementations is less important than not peeking
into the internals of the implementation.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Hi
Based on latest reviewed parts of migration:
- Disable colo (vladimir)
- Migration atomic counters (juan)
Please apply.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmRmXJUACgkQ9IfvGFhy
1yNRAxAAjDYJELL34Qovt/WE9qKhYJEvIUGTl1IMWJ22YMFnqIFKRdka57dWoU3P
7EK1BHmokEEtzGT7Fe1ecERXsOwQIJDIkDTJ5g8Oc8Jt1iqY1AC8h5T+LghijCar
mbZ6qWHaSjsg2lmek/xc9quymzFGGK36PSyB5WkaLRviKQn4RIkEDpUaWny7nDbA
Q8zJJpBqNFqKfC5/DN0ePa3QQscXQJhey3nxqFd8hYp8RFNIV5UJVW5Lf6ombtK7
atgdWC4ckkfO2z3OsghKeo/UiMFWpPktgBVVMhDLmk+P/E6czc2gfzD6SCvrPKTj
XowI8hro22HVmq9bEY8PtbjMOfpxrAxer+tM2KR/0O9l3UzUacFsi7KGqCJ1/trQ
1tSDjlgyczb8GOgLwwxj8XE+jPHPfVrzCNfDqrBKBNxz6nnZSdZUwhV5mG8FdVtm
oVVV96BIrNXLl/lIxYIFD/Zyvl8/lrSWQdLkEHTzihYQeXaQfyvPVbV/dOLT4sii
YUuGCuEhF+DW/qz43G1krwq5/bfxsiZoQzrMV/Odtf0wYQKkabA3KNBIda/vxBCR
dsLQ7QtmOwKmCzjqw4LUov9vDNYOYr98o7ZqwJ3qeKL4QgFwtEZUFO3VW6UR8fnF
arVXiTn9wVlkTpu4sT5hLm9400iadhX4Fppji7Ce0tUpLbWbghA=
=3x32
-----END PGP SIGNATURE-----
Merge tag 'migration-20230518-pull-request' of https://gitlab.com/juan.quintela/qemu into staging
Migration Pull request
Hi
Based on latest reviewed parts of migration:
- Disable colo (vladimir)
- Migration atomic counters (juan)
Please apply.
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmRmXJUACgkQ9IfvGFhy
# 1yNRAxAAjDYJELL34Qovt/WE9qKhYJEvIUGTl1IMWJ22YMFnqIFKRdka57dWoU3P
# 7EK1BHmokEEtzGT7Fe1ecERXsOwQIJDIkDTJ5g8Oc8Jt1iqY1AC8h5T+LghijCar
# mbZ6qWHaSjsg2lmek/xc9quymzFGGK36PSyB5WkaLRviKQn4RIkEDpUaWny7nDbA
# Q8zJJpBqNFqKfC5/DN0ePa3QQscXQJhey3nxqFd8hYp8RFNIV5UJVW5Lf6ombtK7
# atgdWC4ckkfO2z3OsghKeo/UiMFWpPktgBVVMhDLmk+P/E6czc2gfzD6SCvrPKTj
# XowI8hro22HVmq9bEY8PtbjMOfpxrAxer+tM2KR/0O9l3UzUacFsi7KGqCJ1/trQ
# 1tSDjlgyczb8GOgLwwxj8XE+jPHPfVrzCNfDqrBKBNxz6nnZSdZUwhV5mG8FdVtm
# oVVV96BIrNXLl/lIxYIFD/Zyvl8/lrSWQdLkEHTzihYQeXaQfyvPVbV/dOLT4sii
# YUuGCuEhF+DW/qz43G1krwq5/bfxsiZoQzrMV/Odtf0wYQKkabA3KNBIda/vxBCR
# dsLQ7QtmOwKmCzjqw4LUov9vDNYOYr98o7ZqwJ3qeKL4QgFwtEZUFO3VW6UR8fnF
# arVXiTn9wVlkTpu4sT5hLm9400iadhX4Fppji7Ce0tUpLbWbghA=
# =3x32
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 18 May 2023 10:12:53 AM PDT
# gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [undefined]
# gpg: aka "Juan Quintela <quintela@trasno.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* tag 'migration-20230518-pull-request' of https://gitlab.com/juan.quintela/qemu:
migration: Fix duplicated included in meson.build
migration/multifd: Compute transferred bytes correctly
migration: We don't need the field rate_limit_used anymore
migration: Use migration_transferred_bytes() to calculate rate_limit
migration: Add a trace for migration_transferred_bytes
migration: Move migration_total_bytes() to migration-stats.c
migration: Move rate_limit_max and rate_limit_used to migration_stats
qemu-file: Account for rate_limit usage on qemu_fflush()
migration: Don't use INT64_MAX for unlimited rate
migration: process_incoming_migration_co(): move colo part to colo
migration: split migration_incoming_co
configure: add --disable-colo-proxy option
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These way we can make them atomic and use this functions from any
place. I also moved all functions that use rate_limit to
migration-stats.
Functions got renamed, they are not qemu_file anymore.
qemu_file_rate_limit -> migration_rate_exceeded
qemu_file_set_rate_limit -> migration_rate_set
qemu_file_get_rate_limit -> migration_rate_get
qemu_file_reset_rate_limit -> migration_rate_reset
qemu_file_acct_rate_limit -> migration_rate_account.
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-6-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with the combination of ring buffer
and backup bitmap. The dirty bits in the backup bitmap needs to
collected in the last stage of live migration.
In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callbacks. This last stage indicator isn't used until the dirty
ring is enabled in the subsequent patches.
No functional change intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-Id: <20230509022122.20888-2-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is not required, colo_flush_ram_cache does not run concurrently
with the multifd threads since the cache is only flushed after
everything has been received. But it makes me more comfortable.
This will be used in the next commits to add colo support to multifd.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <35cb23ba854151d38a31e3a5c8a1020e4283cb4a.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The overhead of the mutex in non-multifd mode is negligible,
because in that case its just the single thread taking the mutex.
This will be used in the next commits to add colo support to multifd.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <22d83cb428f37929563155531bfb69fd8953cc61.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This fixes compress with colo.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Before this series, "nothing to send" was handled by the file buffer
being empty. Now it is tracked via param->result.
Assert that the file buffer state matches the result.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
No functional changes intended.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Make compression interfaces take send_queued_data() as an argument.
Remove save_page_use_compression() from flush_compressed_data().
This removes the last ram.c dependency from the core compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This makes the core compress code more independend from ram.c.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
save_page_header() accesses several global variables, so calling it
from multiple threads is pretty ugly.
Instead, call save_page_header() before writing out the compressed
data from the compress buffer to the migration stream.
This also makes the core compress code more independend from ram.c.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
And take the param->mutex lock for the whole section to ensure
thread-safety.
Now, it is explicitly clear if there is no queued data to send.
Before, this was handled by param->file stream being empty and thus
qemu_put_qemu_file() not sending anything.
This will be used in the next commits to move save_page_header()
out of compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Instead introduce a extra parameter to trigger the compress thread.
Now, when the compress thread is done, we know what RAMBlock and
offset it did compress.
This will be used in the next commits to move save_page_header()
out of compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This will be used in the next commits to move save_page_header()
out of compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Fixes this commit, clearly a bad merge after a rebase or similar, it
should have been its own case since that point.
commit 5b0e9dd46f
Author: Peter Lieven <pl@kamp.de>
Date: Tue Jun 24 11:32:36 2014 +0200
migration: catch unknown flag combinations in ram_load
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-2-quintela@redhat.com>
Otherwise it is confusing with the function xbzrle_enabled().
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230504115323.24407-1-quintela@redhat.com>
Now that we have atomic counters, we can do it on the place that we
need it, no need to do it inside ram.c.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
migration_stats is just too long, and it is going to have more than
ram counters in the near future.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
There is already include/qemu/stats.h, so stats.h was a bad idea.
We want this file to not depend on anything else, we will move all the
migration counters/stats to this struct.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
As we set its value, it needs to be operated with atomics.
We rename it from remaining to better reflect its meaning.
Statistics always return the real reamaining bytes. This was used to
store how much pages where dirty on the previous generation, so we can
calculate the expected downtime as: dirty_bytes_last_sync /
current_bandwith.
If we use the actual remaining bytes, we would see a very small value
at the end of the iteration.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
I am open to use ram_bytes_remaining() in its only use and be more
"optimistic" about the downtime.
Don't use __nocheck() functions.
Use stat64_get() now that it exists.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
Don't use __nocheck() variants
Use stat64_get()
We need to add a new flag to mean to flush at that point.
Notice that we still flush at the end of setup and at the end of
complete stages.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
Add missing qemu_fflush(), now it passes all tests always.
In the previous version, the check that changes the default value to
false got lost in some rebase. Get it back.
We only need to do that on the ram_save_iterate() call on sending and
on destination when we get a RAM_SAVE_FLAG_EOS.
In setup() and complete() we need to synch in both new and old cases,
so don't add a check there.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
Remove the wrappers that we take out on patch 5.
Once that we are there, we rename the function to migrate_xbzrle()
to be consistent with all other capabilities.
We change the type to return bool also for consistency.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Once that we are there, we rename the function to migrate_multifd()
to be consistent with all other capabilities.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Once that we are there, we rename the function to migrate_events()
to be consistent with all other capabilities.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Once that we are there, we rename the function to migrate_compress()
to be consistent with all other capabilities.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
We move there all capabilities helpers from migration.c.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
Following David advise:
- looked through the history, capabilities are newer than 2012, so we
can remove that bit of the header.
- This part is posterior to Anthony.
Original Author is Orit. Once there,
I put myself. Peter Xu also did quite a bit of work here.
Anyone else wants/needs to be there? I didn't search too hard
because nobody asked before to be added.
What do you think?
Rest of counters that refer to pages has a _pages suffix.
And historically, this showed the number of full pages transferred.
The name "normal" refered to the fact that they were sent without any
optimization (compression, xbzrle, zero_page, ...).
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Rest of counters that refer to pages has a _pages suffix.
And historically, this showed the number of pages composed of the same
character, here comes the name "duplicated". But since years ago, it
refers to the number of zero_pages.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Using MgrationStats as type for ram_counters mean that we didn't have
to re-declare each value in another struct. The need of atomic
counters have make us to create MigrationAtomicStats for this atomic
counters.
Create RAMStats type which is a merge of MigrationStats and
MigrationAtomicStats removing unused members.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
Fix typos found by David Edmondson
I removed that bit on commit:
commit c8df4a7aef
Author: Juan Quintela <quintela@redhat.com>
Date: Mon Oct 3 02:00:03 2022 +0200
migration: Split save_live_pending() into state_pending_*
Fixes: c8df4a7aef
Suggested-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Since ec6f3ab9, migration with compress enabled was broken, because
the compress threads use a dummy QEMUFile which just acts as a
buffer and that commit accidentally changed it to use the outgoing
migration channel instead.
Fix this by using the dummy file again in the compress threads.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Once that res_compatible is removed, they don't make sense anymore.
We remove the _only preffix. And to make things clearer we rename
them to must_precopy and can_postcopy.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Nothing assigns to it after previous commit.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
So remove last assignation of res_compatible.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
0x80 is RAM_SAVE_FLAG_HOOK, it is in qemu-file now.
Bigger usable flag is 0x200, noticing that.
We can reuse RAM_SAVe_FLAG_FULL.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This commit is the same with [PATCH v6 1/2], and provides avx512 support for xbzrle_encode_buffer
function to accelerate xbzrle encoding speed. Runtime check of avx512
support and benchmark for this feature are added. Compared with C
version of xbzrle_encode_buffer function, avx512 version can achieve
50%-70% performance improvement on benchmarking. In addition, if dirty
data is randomly located in 4K page, the avx512 version can achieve
almost 140% performance gain.
Signed-off-by: ling xu <ling1.xu@intel.com>
Co-authored-by: Zhou Zhao <zhou.zhao@intel.com>
Co-authored-by: Jun Jin <jun.i.jin@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We are going to create a new function for multifd latest in the series.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We are recalculating ram size continously, when we know that it don't
change during migration. Create a field in RAMState to track it.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
It is just a big if in the middle of the function, and we need two
functions anways.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
Reindent to make Phillipe happy (and CODING_STYLE)
We used to return two bools, just return a single int with the
following meaning:
old return / again / new return
false false PAGE_ALL_CLEAN
false true PAGE_TRY_AGAIN
true true PAGE_DIRTY_FOUND /* We don't care about again at all */
Signed-off-by: Juan Quintela <quintela@redhat.com>
We will need later that find_dirty_block() return errors, so
simplify the loop.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Let's factor out this check, to be used in virtio-mem context next.
While at it, fix a spelling error in a related comment.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>S
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
ram_block_populate_read() already optimizes for RamDiscardManager.
However, ram_write_tracking_start() will still try protecting discarded
memory ranges.
Let's optimize, because discarded ranges don't map any pages and
(1) For anonymous memory, trying to protect using uffd-wp without a mapped
page is ignored by the kernel and consequently a NOP.
(2) For shared/file-backed memory, we will fill present page tables in the
range with PTE markers. However, we will even allocate page tables
just to fill them with unnecessary PTE markers and effectively
waste memory.
So let's exclude these ranges, just like ram_block_populate_read()
already does.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
ram_mig_ram_block_resized() will abort migration (including background
snapshots) when resizing a RAMBlock. ram_block_populate_read() will only
populate RAM up to used_length, so at least for anonymous memory
protecting everything between used_length and max_length won't
actually be protected and is just a NOP.
So let's only protect everything up to used_length.
Note: it still makes sense to register uffd-wp for max_length, such
that RAM_UF_WRITEPROTECT is independent of a changing used_length.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
When unregistering uffd-wp, older kernels before commit f369b07c86143
("mm/uffd:reset write protection when unregister with wp-mode") won't
clear the uffd-wp PTE bit. When re-registering uffd-wp, the previous
uffd-wp PTE bits would trigger again. With above commit, the kernel will
clear the uffd-wp PTE bits when unregistering itself.
Consequently, we'll clear the uffd-wp PTE bits now twice -- whereby we
don't care about clearing them at all: a new background snapshot will
re-register uffd-wp and re-protect all memory either way.
So let's skip the manual clearing of uffd-wp. If ever relevant, we
could clear conditionally in uffd_unregister_memory() -- we just need a
way to figure out more recent kernels.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
If something goes wrong during uffd_change_protection(), we would miss
to unregister uffd-wp and not release our reference. Fix it by
performing the uffd_change_protection(true) last.
Note that a uffd_change_protection(false) on the recovery path without a
prior uffd_change_protection(false) is fine.
Fixes: 278e2f551a ("migration: support UFFD write fault processing in ram_save_iterate()")
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Unfortunately, commit f7b9dcfbcf broke populate_read_range(): the loop
end condition is very wrong, resulting in that function not populating the
full range. Lets' fix that.
Fixes: f7b9dcfbcf ("migration/ram: Factor out populating pages readable in ram_block_populate_pages()")
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Until previous commit, save_live_pending() was used for ram. Now with
the split into state_pending_estimate() and state_pending_exact() it
is not needed anymore, so remove them.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We split the function into to:
- state_pending_estimate: We estimate the remaining state size without
stopping the machine.
- state pending_exact: We calculate the exact amount of remaining
state.
The only "device" that implements different functions for _estimate()
and _exact() is ram.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Commit d9e474ea56 overlooked the case where the target psize is even larger
than the host psize. One example is Alpha has 8K page size and migration
will start to crash the source QEMU when running Alpha migration on x86.
Fix it by detecting that case and set host start/end just to cover the
single page to be migrated.
This will slightly optimize the common case where host psize equals to
guest psize so we don't even need to do the roundups, but that's trivial.
Cc: qemu-stable@nongnu.org
Reported-by: Thomas Huth <thuth@redhat.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1456
Fixes: d9e474ea56 ("migration: Teach PSS about host page")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Now with rs->pss we can already cache channels in pss->pss_channels. That
pss_channel contains more infromation than rs->f because it's per-channel.
So rs->f could be replaced by rss->pss[RAM_CHANNEL_PRECOPY].pss_channel,
while rs->f itself is a bit vague now.
Note that vanilla postcopy still send pages via pss[RAM_CHANNEL_PRECOPY],
that's slightly confusing but it reflects the reality.
Then, after the replacement we can safely drop rs->f.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
With the new code to send pages in rp-return thread, there's little help to
keep lots of the old code on maintaining the preempt state in migration
thread, because the new way should always be faster..
Then if we'll always send pages in the rp-return thread anyway, we don't
need those logic to maintain preempt state anymore because now we serialize
things using the mutex directly instead of using those fields.
It's very unfortunate to have those code for a short period, but that's
still one intermediate step that we noticed the next bottleneck on the
migration thread. Now what we can do best is to drop unnecessary code as
long as the new code is stable to reduce the burden. It's actually a good
thing because the new "sending page in rp-return thread" model is (IMHO)
even cleaner and with better performance.
Remove the old code that was responsible for maintaining preempt states, at
the meantime also remove x-postcopy-preempt-break-huge parameter because
with concurrent sender threads we don't really need to break-huge anymore.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
With all the facilities ready, send the requested page directly in the
rp-return thread rather than queuing it in the request queue, if and only
if postcopy preempt is enabled. It can achieve so because it uses separate
channel for sending urgent pages. The only shared data is bitmap and it's
protected by the bitmap_mutex.
Note that since we're moving the ownership of the urgent channel from the
migration thread to rp thread it also means the rp thread is responsible
for managing the qemufile, e.g. properly close it when pausing migration
happens. For this, let migration_release_from_dst_file to cover shutdown
of the urgent channel too, renaming it as migration_release_dst_files() to
better show what it does.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Since we use PageSearchStatus to represent a channel, it makes perfect
sense to keep last_sent_block (aka, leverage RAM_SAVE_FLAG_CONTINUE) to be
per-channel rather than global because each channel can be sending
different pages on ramblocks.
Hence move it from RAMState into PageSearchStatus.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We used to allocate PSS structure on the stack for precopy when sending
pages. Make it static, so as to describe per-channel ram migration status.
Here we declared RAM_CHANNEL_MAX instances, preparing for postcopy to use
it, even though this patch has not yet to start using the 2nd instance.
This should not have any functional change per se, but it already starts to
export PSS information via the RAMState, so that e.g. one PSS channel can
start to reference the other PSS channel.
Always protect PSS access using the same RAMState.bitmap_mutex. We already
do so, so no code change needed, just some comment update. Maybe we should
consider renaming bitmap_mutex some day as it's going to be a more commonly
and big mutex we use for ram states, but just leave it for later.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Helper to init PSS structures.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Introduce pss_channel for PageSearchStatus, define it as "the migration
channel to be used to transfer this host page".
We used to have rs->f, which is a mirror to MigrationState.to_dst_file.
After postcopy preempt initial version, rs->f can be dynamically changed
depending on which channel we want to use.
But that later work still doesn't grant full concurrency of sending pages
in e.g. different threads, because rs->f can either be the PRECOPY channel
or POSTCOPY channel. This needs to be per-thread too.
PageSearchStatus is actually a good piece of struct which we can leverage
if we want to have multiple threads sending pages. Sending a single guest
page may not make sense, so we make the granule to be "host page", and in
the PSS structure we allow specify a QEMUFile* to migrate a specific host
page. Then we open the possibility to specify different channels in
different threads with different PSS structures.
The PSS prefix can be slightly misleading here because e.g. for the
upcoming usage of postcopy channel/thread it's not "searching" (or,
scanning) at all but sending the explicit page that was requested. However
since PSS existed for some years keep it as-is until someone complains.
This patch mostly (simply) replace rs->f with pss->pss_channel only. No
functional change intended for this patch yet. But it does prepare to
finally drop rs->f, and make ram_save_guest_page() thread safe.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Migration code has a lot to do with host pages. Teaching PSS core about
the idea of host page helps a lot and makes the code clean. Meanwhile,
this prepares for the future changes that can leverage the new PSS helpers
that this patch introduces to send host page in another thread.
Three more fields are introduced for this:
(1) host_page_sending: this is set to true when QEMU is sending a host
page, false otherwise.
(2) host_page_{start|end}: these point to the start/end of host page
we're sending, and it's only valid when host_page_sending==true.
For example, when we look up the next dirty page on the ramblock, with
host_page_sending==true, we'll not try to look for anything beyond the
current host page boundary. This can be slightly efficient than current
code because currently we'll set pss->page to next dirty bit (which can be
over current host page boundary) and reset it to host page boundary if we
found it goes beyond that.
With above, we can easily make migration_bitmap_find_dirty() self contained
by updating pss->page properly. rs* parameter is removed because it's not
even used in old code.
When sending a host page, we should use the pss helpers like this:
- pss_host_page_prepare(pss): called before sending host page
- pss_within_range(pss): whether we're still working on the cur host page?
- pss_host_page_finish(pss): called after sending a host page
Then we can use ram_save_target_page() to save one small page.
Currently ram_save_host_page() is still the only user. If there'll be
another function to send host page (e.g. in return path thread) in the
future, it should follow the same style.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
To prepare for thread-safety on page accountings, at least below counters
need to be accessed only atomically, they are:
ram_counters.transferred
ram_counters.duplicate
ram_counters.normal
ram_counters.postcopy_bytes
There are a lot of other counters but they won't be accessed outside
migration thread, then they're still safe to be accessed without atomic
ops.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Don't take the bitmap mutex when sending pages, or when being throttled by
migration_rate_limit() (which is a bit tricky to call it here in ram code,
but seems still helpful).
It prepares for the possibility of concurrently sending pages in >1 threads
using the function ram_save_host_page() because all threads may need the
bitmap_mutex to operate on bitmaps, so that either sendmsg() or any kind of
qemu_sem_wait() blocking for one thread will not block the other from
progressing.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>