Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. The principle is violated by machine_run_board_init() because
it calls error_report(), error_printf(), and exit(1) when the machine
doesn't support the requested CPU type.
Clean this up by using error_setg() and error_append_hint() instead.
No functional change, as the only caller passes &error_fatal.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20231204004726.483558-2-gshan@redhat.com>
[PMD: Correct error_append_hint() argument]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Add a helper to return a machine default CPU type.
If this machine is restricted to a single CPU type,
use it as default, obviously.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231116163726.28952-1-philmd@linaro.org>
Use generic cpu_model_from_type() when the CPU model name needs to
be extracted from the CPU type name.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20231114235628.534334-23-gshan@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
No changes in the output from the following command.
[gshan@gshan q]$ ./build/qemu-system-tricore -cpu ?
Available CPUs:
tc1796
tc1797
tc27x
tc37x
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231114235628.534334-21-gshan@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Before it's applied:
[gshan@gshan q]$ ./build/qemu-or1k -cpu ?
Available CPUs:
or1200
any
After it's applied:
[gshan@gshan q]$ ./build/qemu-or1k -cpu ?
Available CPUs:
any
or1200
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20231114235628.534334-17-gshan@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
No changes in the output from the following command.
[gshan@gshan q]$ ./build/qemu-system-hppa -cpu ?
Available CPUs:
hppa
hppa64
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20231114235628.534334-13-gshan@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
No changes in the output from the following command.
[gshan@gshan q]$ ./build/qemu-hexagon -cpu ?
Available CPUs:
v67
v68
v69
v71
v73
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231114235628.534334-12-gshan@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Add generic cpu_list() to replace the individual target's implementation
in the subsequent commits. Currently, there are 3 targets with no cpu_list()
implementation: microblaze and nios2. With this applied, those two targets
switch to the generic cpu_list().
[gshan@gshan q]$ ./build/qemu-system-microblaze -cpu ?
Available CPUs:
microblaze-cpu
[gshan@gshan q]$ ./build/qemu-system-nios2 -cpu ?
Available CPUs:
nios2-cpu
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231114235628.534334-7-gshan@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Add helper cpu_model_from_type() to extract the CPU model name from
the CPU type name in two circumstances: (1) The CPU type name is the
combination of the CPU model name and suffix. (2) The CPU type name
is same to the CPU model name.
The helper will be used in the subsequent commits to conver the
CPU type name to the CPU model name.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20231114235628.534334-6-gshan@redhat.com>
[PMD: Mention returned string must be released with g_free()]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
For all targets, the CPU class returned from CPUClass::class_by_name()
and object_class_dynamic_cast(oc, CPU_RESOLVING_TYPE) need to be
compatible. Lets apply the check in cpu_class_by_name() for once,
instead of having the check in CPUClass::class_by_name() for individual
target.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Message-ID: <20231114235628.534334-4-gshan@redhat.com>
Since commit 3a9d0d7b64 ("hw/cpu: Call object_class_is_abstract()
once in cpu_class_by_name()"), there is no need to check if @oc is
abstract because it has been covered by cpu_class_by_name().
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231114235628.534334-3-gshan@redhat.com>
[PMD: Mention commit 3a9d0d7b64]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
'ev67' CPU class will be returned to match everything, which makes
no sense as mentioned in the comments. Remove the logic to fall
back to 'ev67' CPU class to match everything.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231114235628.534334-2-gshan@redhat.com>
[PMD: Reword subject, replace 'any' -> 'ev67' on linux-user]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Allow building a qemu-system-foo binary with target-agnostic
only HW models.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20231121203129.67999-1-philmd@linaro.org>
- We lost Juan and Leo in the maintainers file
- Steven's suspend state fix
- Steven's fix for coverity on migrate_mode
- Avihai's migration cleanup series
-----BEGIN PGP SIGNATURE-----
iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZZY0TxIccGV0ZXJ4QHJl
ZGhhdC5jb20ACgkQO1/MzfOr1wbSxgEAoM5g3wkc22lpAlRpU+hJUqT9NVOVQSK+
Fk7XJYTdSgABAKzykA6hAmU5Kj+yVI6jI874SVZbs2FWpFs4osvsKk4D
=sfuM
-----END PGP SIGNATURE-----
Merge tag 'migration-20240104-pull-request' of https://gitlab.com/peterx/qemu into staging
migration 1st pull for 9.0
- We lost Juan and Leo in the maintainers file
- Steven's suspend state fix
- Steven's fix for coverity on migrate_mode
- Avihai's migration cleanup series
# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZZY0TxIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wbSxgEAoM5g3wkc22lpAlRpU+hJUqT9NVOVQSK+
# Fk7XJYTdSgABAKzykA6hAmU5Kj+yVI6jI874SVZbs2FWpFs4osvsKk4D
# =sfuM
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 04 Jan 2024 04:30:07 GMT
# gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg: issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [unknown]
# gpg: aka "Peter Xu <peterx@redhat.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706
* tag 'migration-20240104-pull-request' of https://gitlab.com/peterx/qemu: (26 commits)
migration: fix coverity migrate_mode finding
migration/multifd: Remove unnecessary usage of local Error
migration: Remove unnecessary usage of local Error
migration: Fix migration_channel_read_peek() error path
migration/multifd: Remove error_setg() in migration_ioc_process_incoming()
migration/multifd: Fix leaking of Error in TLS error flow
migration/multifd: Simplify multifd_channel_connect() if else statement
migration/multifd: Fix error message in multifd_recv_initial_packet()
migration: Remove errp parameter in migration_fd_process_incoming()
migration: Refactor migration_incoming_setup()
migration: Remove nulling of hostname in migrate_init()
migration: Remove migrate_max_downtime() declaration
tests/qtest: postcopy migration with suspend
tests/qtest: precopy migration with suspend
tests/qtest: option to suspend during migration
tests/qtest: migration events
migration: preserve suspended for bg_migration
migration: preserve suspended for snapshot
migration: preserve suspended runstate
migration: propagate suspended runstate
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Coverity diagnoses a possible out-of-range array index here ...
static GSList *migration_blockers[MIG_MODE__MAX];
fill_source_migration_info() {
GSList *cur_blocker = migration_blockers[migrate_mode()];
... because it does not know that MIG_MODE__MAX will never be returned as
a migration mode. To fix, assert so in migrate_mode().
Fixes: fa3673e497 ("migration: per-mode blockers")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/1699907025-215450-1-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
According to Error API, usage of ERRP_GUARD() or a local Error instead
of errp is needed if errp is passed to void functions, where it is later
dereferenced to see if an error occurred.
There are several places in multifd.c that use local Error although it
is not needed. Change these places to use errp directly.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-12-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
According to Error API, usage of ERRP_GUARD() or a local Error instead
of errp is needed if errp is passed to void functions, where it is later
dereferenced to see if an error occurred.
There are several places in migration.c that use local Error although it
is not needed. Change these places to use errp directly.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-11-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
migration_channel_read_peek() calls qio_channel_readv_full() and handles
both cases of return value == 0 and return value < 0 the same way, by
calling error_setg() with errp. However, if return value < 0, errp is
already set, so calling error_setg() with errp will lead to an assert.
Fix it by handling these cases separately, calling error_setg() with
errp only in return value == 0 case.
Fixes: 6720c2b327 ("migration: check magic value for deciding the mapping of channels")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-10-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
If multifd_load_setup() fails in migration_ioc_process_incoming(),
error_setg() is called with errp. This will lead to an assert because in
that case errp already contains an error.
Fix it by removing the redundant error_setg().
Fixes: 6720c2b327 ("migration: check magic value for deciding the mapping of channels")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-9-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
If there is an error in multifd TLS handshake task,
multifd_tls_outgoing_handshake() retrieves the error with
qio_task_propagate_error() but never frees it.
Fix it by freeing the obtained Error.
In addition, the error is not reported at all, so report it with
migrate_set_error().
Fixes: 2964714015 ("migration/tls: add support for multifd tls-handshake")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-8-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
The else branch in multifd_channel_connect() is redundant because when
the if branch is taken the function returns.
Simplify the code by removing the else branch.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-7-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
In multifd_recv_initial_packet(), if MultiFDInit_t->id is greater than
the configured number of multifd channels, an irrelevant error message
about multifd version is printed.
Change the error message to a relevant one about the channel id.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-6-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Commit 6720c2b327 ("migration: check magic value for deciding the
mapping of channels") extracted the only code that could fail in
migration_incoming_setup().
Now migration_incoming_setup() can't fail, so refactor it to return void
and remove errp parameter.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-4-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
MigrationState->hostname is set to NULL in migrate_init(). This is
redundant because it is already freed and set to NULL in
migrade_fd_cleanup(). Remove it.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-3-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
migrate_max_downtime() has been removed long ago, but its declaration
was mistakenly left. Remove it.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-2-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Add a test case to verify that the suspended state is handled correctly by
live migration postcopy. The test suspends the src, migrates, then wakes
the dest.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-13-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Add a test case to verify that the suspended state is handled correctly
during live migration precopy. The test suspends the src, migrates, then
wakes the dest.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-12-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Add an option to suspend the src in a-b-bootblock.S, which puts the guest
in S3 state after one round of writing to memory. The option is enabled by
poking a 1 into the suspend_me word in the boot block prior to starting the
src vm. Generate symbol offsets in a-b-bootblock.h so that the suspend_me
offset is known. Generate the bootblock for each test, because suspend_me
may differ for each.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/1704312341-66640-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Define a state object to capture events seen by migration tests, to allow
more events to be captured in a subsequent patch, and simplify event
checking in wait_for_migration_pass. No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-10-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Do not wake a suspended guest during bg_migration, and restore the prior
state at finish rather than unconditionally running. Allow the additional
state transitions that occur.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Restoring a snapshot can break a suspended guest. Snapshots suffer from
the same suspended-state issues that affect live migration, plus they must
handle an additional problematic scenario, which is that a running vm must
remain running if it loads a suspended snapshot.
To save, the existing vm_stop call now completely stops the suspended
state. Finish with vm_resume to leave the vm in the state it had prior
to the save, correctly restoring the suspended state.
To load, if the snapshot is not suspended, then vm_stop + vm_resume
correctly handles all states, and leaves the vm in the state it had prior
to the load. However, if the snapshot is suspended, restoration is
trickier. First, call vm_resume to restore the state to suspended so the
current state matches the saved state. Then, if the pre-load state is
running, call wakeup to resume running.
Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and
RUN_STATE_RESTORE_VM did not change runstate if the current state was
suspended, but now it does, so allow these transitions.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
A guest that is migrated in the suspended state automaticaly wakes and
continues execution. This is wrong; the guest should end migration in
the same state it started. The root cause is that the outgoing migration
code automatically wakes the guest, then saves the RUNNING runstate in
global_state_store(), hence the incoming migration code thinks the guest is
running and continues the guest if autostart is true.
On the outgoing side, delete the call to qemu_system_wakeup_request().
Now that vm_stop completely stops a vm in the suspended state (from the
preceding patches), the existing call to vm_stop_force_state is sufficient
to correctly migrate all vmstate.
On the incoming side, call vm_start if the pre-migration state was running
or suspended. For the latter, vm_start correctly restores the suspended
state, and a future system_wakeup monitor request will cause the vm to
resume running.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
If the outgoing machine was previously suspended, propagate that to the
incoming side via global_state, so a subsequent vm_start restores the
suspended state. To maintain backward and forward compatibility, reclaim
some space from the runstate member.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-6-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
When a vm transitions from running to suspended, runstate notifiers are
not called, so the notifiers still think the vm is running. Hence, when
we call vm_start to restore the suspended state, we call vm_state_notify
with running=1. However, some notifiers check for RUN_STATE_RUNNING.
They must check the running boolean instead.
No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Currently, a vm in the suspended state is not completely stopped. The VCPUs
have been paused, but the cpu clock still runs, and runstate notifiers for
the transition to stopped have not been called. This causes problems for
live migration. Stale cpu timers_state is saved to the migration stream,
causing time errors in the guest when it wakes from suspend, and state that
would have been modified by runstate notifiers is wrong.
Modify vm_stop to completely stop the vm if the current state is suspended,
transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
Modify vm_start to restore the suspended state.
This affects all callers of vm_stop and vm_start, notably, the qapi stop and
cont commands:
old behavior:
RUN_STATE_SUSPENDED --> stop --> RUN_STATE_SUSPENDED
new behavior:
RUN_STATE_SUSPENDED --> stop --> RUN_STATE_PAUSED
RUN_STATE_PAUSED --> cont --> RUN_STATE_SUSPENDED
For example:
(qemu) info status
VM status: paused (suspended)
(qemu) stop
(qemu) info status
VM status: paused
(qemu) system_wakeup
Error: Unable to wake up: guest is not in suspended state
(qemu) cont
(qemu) info status
VM status: paused (suspended)
(qemu) system_wakeup
(qemu) info status
VM status: running
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>