Expose qemu_cpu_list_lock globally so that we can use
WITH_QEMU_LOCK_GUARD and QEMU_LOCK_GUARD to simplify a few code paths
now and in future.
Signed-off-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230427020925.51003-2-quic_jiles@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We generally require same set of capabilities on source and target.
Let's require x-colo capability to use COLO on target.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20230428194928.1426370-11-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
COLO is not listed as running state in migrate_is_running(), so, it's
theoretically possible to disable colo capability in COLO state and the
unexpected error in migration_iteration_finish() is reachable.
Let's disallow that in qmp_migrate_set_capabilities. Than the error
becomes absolutely unreachable: we can get into COLO state only with
enabled capability and can't disable it while we are in COLO state. So
substitute the error by simple assertion.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230428194928.1426370-10-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
have_colo_incoming_thread variable is unused. colo_incoming_thread can
be local.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20230428194928.1426370-6-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We don't allow to use x-colo capability when replication is not
configured. So, no reason to build COLO when replication is disabled,
it's unusable in this case.
Note also that the check in migrate_caps_check() is not the only
restriction: some functions in migration/colo.c will just abort if
called with not defined CONFIG_REPLICATION, for example:
migration_iteration_finish()
case MIGRATION_STATUS_COLO:
migrate_start_colo_process()
colo_process_checkpoint()
abort()
It could probably make sense to have possibility to enable COLO without
REPLICATION, but this requires deeper audit of colo & replication code,
which may be done later if needed.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Acked-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230428194928.1426370-4-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
colo_checkpoint_notify() is mostly used in colo.c. Outside we use it
once when x-checkpoint-delay migration parameter is set. So, let's
simplify the external API to only that function - notify COLO that
parameter was set. This make external API more robust and hides
implementation details from external callers. Also this helps us to
make COLO module optional in further patch (i.e. we are going to add
possibility not build the COLO module).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20230428194928.1426370-3-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This will be used in the next commits to add colo support to multifd.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <88135197411df1a71d7832962b39abf60faf0021.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This is not required, colo_flush_ram_cache does not run concurrently
with the multifd threads since the cache is only flushed after
everything has been received. But it makes me more comfortable.
This will be used in the next commits to add colo support to multifd.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <35cb23ba854151d38a31e3a5c8a1020e4283cb4a.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The overhead of the mutex in non-multifd mode is negligible,
because in that case its just the single thread taking the mutex.
This will be used in the next commits to add colo support to multifd.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <22d83cb428f37929563155531bfb69fd8953cc61.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Commit fe904ea824 added a fail_inactivate label, which tries to
reactivate disks on the source after a failure while s->state ==
MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
qemu_savevm_state_complete_precopy() failed. This failure to
reactivate is also present in commit 6039dd5b1c (also covering the new
s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring
s->block_inactive is set more reliably).
Consolidate the two labels back into one - no matter HOW migration is
failed, if there is any chance we can reach vm_start() after having
attempted inactivation, it is essential that we have tried to restart
disks before then. This also makes the cleanup more like
migrate_fd_cancel().
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20230502205212.134680-1-eblake@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This fixes compress with colo.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Make ram-compress.c target independent.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Before this series, "nothing to send" was handled by the file buffer
being empty. Now it is tracked via param->result.
Assert that the file buffer state matches the result.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
No functional changes intended.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Make compression interfaces take send_queued_data() as an argument.
Remove save_page_use_compression() from flush_compressed_data().
This removes the last ram.c dependency from the core compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This makes the core compress code more independend from ram.c.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
save_page_header() accesses several global variables, so calling it
from multiple threads is pretty ugly.
Instead, call save_page_header() before writing out the compressed
data from the compress buffer to the migration stream.
This also makes the core compress code more independend from ram.c.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
And take the param->mutex lock for the whole section to ensure
thread-safety.
Now, it is explicitly clear if there is no queued data to send.
Before, this was handled by param->file stream being empty and thus
qemu_put_qemu_file() not sending anything.
This will be used in the next commits to move save_page_header()
out of compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Instead introduce a extra parameter to trigger the compress thread.
Now, when the compress thread is done, we know what RAMBlock and
offset it did compress.
This will be used in the next commits to move save_page_header()
out of compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This will be used in the next commits to move save_page_header()
out of compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230504113841.23130-9-quintela@redhat.com>
Change all the functions that use it. It was already passed as
uint64_t.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-8-quintela@redhat.com>
The first thing that we do after setting the shutdown value is set the
error as -EIO if there is not a previous error.
So this value is redundant. Just remove it and use
qemu_file_get_error() in the places that it was tested.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230504113841.23130-7-quintela@redhat.com>
After calling qemu_file_shutdown() we set the error as -EIO if there
is no another previous error, so no need to check it here.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-6-quintela@redhat.com>
So delta_bytes can only be greater or equal to zero. Never negative.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-3-quintela@redhat.com>
So make everything that uses it uint64_t no int64_t.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-2-quintela@redhat.com>
It makes no sense first try to see if there is an rdma error and then
do nothing on postcopy stage. Change it so we check we are in
postcopy before doing anything.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-6-quintela@redhat.com>
This could only happen if the source sent
RAM_SAVE_FLAG_HOOK (i.e. rdma) and destination don't have CONFIG_RDMA.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-5-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-4-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-3-quintela@redhat.com>
Fixes this commit, clearly a bad merge after a rebase or similar, it
should have been its own case since that point.
commit 5b0e9dd46f
Author: Peter Lieven <pl@kamp.de>
Date: Tue Jun 24 11:32:36 2014 +0200
migration: catch unknown flag combinations in ram_load
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-2-quintela@redhat.com>
Otherwise it is confusing with the function xbzrle_enabled().
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230504115323.24407-1-quintela@redhat.com>
I forgot to move it when I rename it from duplicated_pages.
Message-Id: <20230504103357.22130-3-quintela@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230504103357.22130-2-quintela@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We can calculate it from the QEMUFile like the caller.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230503131847.11603-6-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It is valid that params->has_block_bitmap_mapping is true and
params->block_bitmap_mapping is NULL. So we can't use the trick of
having a single function.
Move to two functions one for each value and the tests are fixed.
Fixes: b804b35b1c
migration: Create migrate_block_bitmap_mapping() function
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230503181036.14890-1-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It is not needed since we moved the accessor for tls properties to
options.c.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
It is not needed since we moved the accessor for tls properties to
options.c.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Now that we have atomic counters, we can do it on the place that we
need it, no need to do it inside ram.c.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
It is lousely based on MigrationStats, but that name is taken, so this
is the best one that I came with.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
---
If you have any good suggestion for the name, I am all ears.
migration_stats is just too long, and it is going to have more than
ram counters in the near future.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
There is already include/qemu/stats.h, so stats.h was a bad idea.
We want this file to not depend on anything else, we will move all the
migration counters/stats to this struct.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Use the attribute, which is supported by clang, instead of
the #pragma, which is not supported and, for some reason,
also not detected by the meson probe, so we fail by -Werror.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230501210555.289806-1-richard.henderson@linaro.org>
As we set its value, it needs to be operated with atomics.
We rename it from remaining to better reflect its meaning.
Statistics always return the real reamaining bytes. This was used to
store how much pages where dirty on the previous generation, so we can
calculate the expected downtime as: dirty_bytes_last_sync /
current_bandwith.
If we use the actual remaining bytes, we would see a very small value
at the end of the iteration.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
I am open to use ram_bytes_remaining() in its only use and be more
"optimistic" about the downtime.
Don't use __nocheck() functions.
Use stat64_get() now that it exists.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
Don't use __nocheck() variants
Use stat64_get()
We need to add a new flag to mean to flush at that point.
Notice that we still flush at the end of setup and at the end of
complete stages.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
Add missing qemu_fflush(), now it passes all tests always.
In the previous version, the check that changes the default value to
false got lost in some rebase. Get it back.