qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Stefan Hajnoczi	019f8c19df	Migration patches for rc3: - One more memleak regression fix from Het -----BEGIN PGP SIGNATURE----- iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZWoLbRIccGV0ZXJ4QHJl ZGhhdC5jb20ACgkQO1/MzfOr1wahYwD+OsD7CaZYjkl9KSooRfblEenD6SdfhAdC oZc07f2UxocA/0s1keDBZUUcZOiGYPDFV5his4Jw4F+RRD1YIpVWZg4J =T0/r -----END PGP SIGNATURE----- Merge tag 'migration-20231201-pull-request' of https://github.com/xzpeter/qemu into staging Migration patches for rc3: - One more memleak regression fix from Het # -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZWoLbRIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wahYwD+OsD7CaZYjkl9KSooRfblEenD6SdfhAdC # oZc07f2UxocA/0s1keDBZUUcZOiGYPDFV5his4Jw4F+RRD1YIpVWZg4J # =T0/r # -----END PGP SIGNATURE----- # gpg: Signature made Fri 01 Dec 2023 11:35:57 EST # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [full] # gpg: aka "Peter Xu <peterx@redhat.com>" [full] # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706 * tag 'migration-20231201-pull-request' of https://github.com/xzpeter/qemu: migration: Plug memory leak with migration URIs Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2023-12-04 08:01:24 -05:00
Het Gala	bc1d54ee51	migration: Plug memory leak with migration URIs migrate_uri_parse() allocates memory to 'channel' if the user opts for old syntax - uri, which is leaked because there is no code for freeing 'channel'. So, free channel to avoid memory leak in case where 'channels' is empty and uri parsing is required. Fixes: `5994024f` ("migration: Implement MigrateChannelList to qmp migration flow") Signed-off-by: Het Gala <het.gala@nutanix.com> Suggested-by: Markus Armbruster <armbru@redhat.com> Tested-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Link: https://lore.kernel.org/r/20231129204301.131228-1-het.gala@nutanix.com Signed-off-by: Peter Xu <peterx@redhat.com>	2023-12-01 11:01:28 -05:00
Zongmin Zhou	41581265aa	migration: free 'saddr' since be no longer used Since socket_parse() will allocate memory for 'saddr',and its value will pass to 'addr' that allocated by migrate_uri_parse(), then 'saddr' will no longer used,need to free. But due to 'saddr->u' is shallow copying the contents of the union, the members of this union containing allocated strings,and will be used after that. So just free 'saddr' itself without doing a deep free on the contents of the SocketAddress. Fixes: `72a8192e22` ("migration: convert migration 'uri' into 'MigrateAddress'") Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231120031428.908295-1-zhouzongmin@kylinos.cn>	2023-11-30 09:51:24 +01:00
Het Gala	967f2de5c9	migration: Implement MigrateChannelList to hmp migration flow. Integrate MigrateChannelList with all transport backends (socket, exec and rdma) for both src and dest migration endpoints for hmp migration. Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com> Signed-off-by: Het Gala <het.gala@nutanix.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Message-ID: <20231023182053.8711-14-farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-11-02 11:35:04 +01:00
Het Gala	5994024fd1	migration: Implement MigrateChannelList to qmp migration flow. Integrate MigrateChannelList with all transport backends (socket, exec and rdma) for both src and dest migration endpoints for qmp migration. For current series, limit the size of MigrateChannelList to single element (single interface) as runtime check. Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com> Signed-off-by: Het Gala <het.gala@nutanix.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231023182053.8711-13-farosas@suse.de>	2023-11-02 11:35:04 +01:00
Het Gala	d95533e1cd	migration: modify migration_channels_and_uri_compatible() for new QAPI syntax migration_channels_and_uri_compatible() check for transport mechanism suitable for multifd migration gets executed when the caller calls old uri syntax. It needs it to be run when using the modern MigrateChannel QAPI syntax too. After URI -> 'MigrateChannel' : migration_channels_and_uri_compatible() -> migration_channels_and_transport_compatible() passes object as argument and check for valid transport mechanism. Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com> Signed-off-by: Het Gala <het.gala@nutanix.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231023182053.8711-12-farosas@suse.de>	2023-11-02 11:35:04 +01:00
Het Gala	074dbce5fc	migration: New migrate and migrate-incoming argument 'channels' MigrateChannelList allows to connect accross multiple interfaces. Add MigrateChannelList struct as argument to migration QAPIs. We plan to include multiple channels in future, to connnect multiple interfaces. Hence, we choose 'MigrateChannelList' as the new argument over 'MigrateChannel' to make migration QAPIs future proof. Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com> Signed-off-by: Het Gala <het.gala@nutanix.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231023182053.8711-10-farosas@suse.de>	2023-11-02 11:35:04 +01:00
Fabiano Rosas	02afba63e9	migration: Convert the file backend to the new QAPI syntax Convert the file: URI to accept a FileMigrationArgs to be compatible with the new migration QAPI. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231023182053.8711-9-farosas@suse.de>	2023-11-02 11:35:04 +01:00
Het Gala	cbab4face5	migration: convert exec backend to accept MigrateAddress. Exec transport backend for 'migrate'/'migrate-incoming' QAPIs accept new wire protocol of MigrateAddress struct. It is achived by parsing 'uri' string and storing migration parameters required for exec connection into strList struct. Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com> Signed-off-by: Het Gala <het.gala@nutanix.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231023182053.8711-8-farosas@suse.de>	2023-11-02 11:35:04 +01:00
Het Gala	3fa9642ff7	migration: convert rdma backend to accept MigrateAddress RDMA based transport backend for 'migrate'/'migrate-incoming' QAPIs accept new wire protocol of MigrateAddress struct. It is achived by parsing 'uri' string and storing migration parameters required for RDMA connection into well defined InetSocketAddress struct. Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com> Signed-off-by: Het Gala <het.gala@nutanix.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231023182053.8711-7-farosas@suse.de>	2023-11-02 11:35:03 +01:00
Het Gala	34dfc5e407	migration: convert socket backend to accept MigrateAddress Socket transport backend for 'migrate'/'migrate-incoming' QAPIs accept new wire protocol of MigrateAddress struct. It is achived by parsing 'uri' string and storing migration parameters required for socket connection into well defined SocketAddress struct. Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com> Signed-off-by: Het Gala <het.gala@nutanix.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231023182053.8711-6-farosas@suse.de>	2023-11-02 11:35:03 +01:00
Het Gala	72a8192e22	migration: convert migration 'uri' into 'MigrateAddress' This patch parses 'migrate' and 'migrate-incoming' QAPI's 'uri' string containing migration connection related information and stores them inside well defined 'MigrateAddress' struct. Fabiano fixed for "file" transport. Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com> Signed-off-by: Het Gala <het.gala@nutanix.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231023182053.8711-4-farosas@suse.de> Message-ID: <20231023182053.8711-5-farosas@suse.de>	2023-11-02 11:35:03 +01:00
Peter Xu	88577f3242	migration: Change ram_dirty_bitmap_reload() retval to bool Now we have a Error** passed into the return path thread stack, which is even clearer than an int retval. Change ram_dirty_bitmap_reload() and the callers to use a bool instead to replace errnos. Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231017202633.296756-5-peterx@redhat.com>	2023-11-02 11:35:03 +01:00
Peter Xu	f8c543e808	migration: Allow network to fail even during recovery Normally the postcopy recover phase should only exist for a super short period, that's the duration when QEMU is trying to recover from an interrupted postcopy migration, during which handshake will be carried out for continuing the procedure with state changes from PAUSED -> RECOVER -> POSTCOPY_ACTIVE again. Here RECOVER phase should be super small, that happens right after the admin specified a new but working network link for QEMU to reconnect to dest QEMU. However there can still be case where the channel is broken in this small RECOVER window. If it happens, with current code there's no way the src QEMU can got kicked out of RECOVER stage. No way either to retry the recover in another channel when established. This patch allows the RECOVER phase to fail itself too - we're mostly ready, just some small things missing, e.g. properly kick the main migration thread out when sleeping on rp_sem when we found that we're at RECOVER stage. When this happens, it fails the RECOVER itself, and rollback to PAUSED stage. Then the user can retry another round of recovery. To make it even stronger, teach QMP command migrate-pause to explicitly kick src/dst QEMU out when needed, so even if for some reason the migration thread didn't got kicked out already by a failing rethrn-path thread, the admin can also kick it out. This will be an super, super corner case, but still try to cover that. One can try to test this with two proxy channels for migration: (a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000 (b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock So the migration channel will be: (a) (b) src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst Then to make QEMU hang at RECOVER stage, one can do below: (1) stop the postcopy using QMP command postcopy-pause (2) kill the 2nd proxy (b) (3) try to recover the postcopy using /tmp/src.sock on src (4) src QEMU will go into RECOVER stage but won't be able to continue from there, because the channel is actually broken at (b) Before this patch, step (4) will make src QEMU stuck in RECOVER stage, without a way to kick the QEMU out or continue the postcopy again. After this patch, (4) will quickly fail qemu and bounce back to PAUSED stage. Admin can also kick QEMU from (4) into PAUSED when needed using migrate-pause when needed. After bouncing back to PAUSED stage, one can recover again. Reported-by: Xiaohui Li <xiaohli@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332 Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231017202633.296756-3-peterx@redhat.com>	2023-11-02 11:35:03 +01:00
Peter Xu	7aa6070d09	migration: Refactor error handling in source return path rp_state.error was a boolean used to show error happened in return path thread. That's not only duplicating error reporting (migrate_set_error), but also not good enough in that we only do error_report() and set it to true, we never can keep a history of the exact error and show it in query-migrate. To make this better, a few things done: - Use error_setg() rather than error_report() across the whole lifecycle of return path thread, keeping the error in an Error*. - With above, no need to have mark_source_rp_bad(), remove it, alongside with rp_state.error itself. - Use migrate_set_error() to apply that captured error to the global migration object when error occured in this thread. - Do the same when detected qemufile error in source return path We need to re-export qemu_file_get_error_obj() to do the last one. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231017202633.296756-2-peterx@redhat.com>	2023-11-02 11:35:03 +01:00
Steve Sistare	fa3673e497	migration: per-mode blockers Extend the blocker interface so that a blocker can be registered for one or more migration modes. The existing interfaces register a blocker for all modes, and the new interfaces take a varargs list of modes. Internally, maintain a separate blocker list per mode. The same Error object may be added to multiple lists. When a block is deleted, it is removed from every list, and the Error is freed. No functional change until a new mode is added. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <1698263069-406971-3-git-send-email-steven.sistare@oracle.com>	2023-11-01 16:13:59 +01:00
Peter Xu	3e5f3bcdc2	migration: Add tracepoints for downtime checkpoints This patch is inspired by Joao Martin's patch here: https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com Add tracepoints for major downtime checkpoints on both src and dst. They share the same tracepoint with a string showing its stage. Besides the checkpoints in the previous patch, this patch also added destination checkpoints. On src, we have these checkpoints added: - src-downtime-start: right before vm stops on src - src-vm-stopped: after vm is fully stopped - src-iterable-saved: after all iterables saved (END sections) - src-non-iterable-saved: after all non-iterable saved (FULL sections) - src-downtime-stop: migration fully completed On dst, we have these checkpoints added: - dst-precopy-loadvm-completes: after loadvm all done for precopy - dst-precopy-bh-: record BH steps to resume VM for precopy - dst-postcopy-bh-: record BH steps to resume VM for postcopy On dst side, we don't have a good way to trace total time consumed by iterable or non-iterable for now. We can mark it by 1st time receiving a FULL / END section, but rather than that let's just rely on the other tracepoints added for vmstates to back up the information. With this patch, one can enable "vmstate_downtime*" tracepoints and it'll enable all tracepoints for downtime measurements necessary. Drop loadvm_postcopy_handle_run_bh() tracepoint alongside, because they service the same purpose, which was only for postcopy. We then have unified prefix for all downtime relevant tracepoints. Co-developed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231030163346.765724-6-peterx@redhat.com>	2023-11-01 16:13:58 +01:00
Peter Xu	93bdf888fa	migration: migration_stop_vm() helper Provide a helper for non-COLO use case of migration to stop a VM. This prepares for adding some downtime relevant tracepoints to migration, where they may or may not apply to COLO. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231030163346.765724-5-peterx@redhat.com>	2023-11-01 16:13:58 +01:00
Peter Xu	e22ffad03a	migration: Add migration_downtime_start\|end() helpers Unify the three users on recording downtimes with the same pair of helpers. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231030163346.765724-3-peterx@redhat.com>	2023-11-01 16:13:58 +01:00
Peter Xu	62f5da7dd1	migration: Set downtime_start even for postcopy Postcopy calculates its downtime separately. It always sets MigrationState.downtime properly, but not MigrationState.downtime_start. Make postcopy do the same as other modes on properly recording the timestamp when the VM is going to be stopped. Drop the temporary variable in postcopy_start() along the way. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231030163346.765724-2-peterx@redhat.com>	2023-11-01 16:13:58 +01:00
Juan Quintela	be07a0ed22	qemu-file: Make qemu_fflush() return errors This let us simplify code of this shape. qemu_fflush(f); int ret = qemu_file_get_error(f); if (ret) { return ret; } into: int ret = qemu_fflush(f); if (ret) { return ret; } I updated all callers where there is any error check. qemu_fclose() don't need to check for f->last_error because qemu_fflush() returns it at the beggining of the function. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231025091117.6342-13-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-10-31 08:44:33 +01:00
Juan Quintela	897fd8bdce	migration: Use migration_transferred_bytes() There are only two differnces with the old value: - the amount of QEMUFile that hasn't yet been flushed. It can be discussed what is more exact, the new or the old one. - the amount of transferred bytes that we forgot to account for (the newer is better, i.e. exact). Notice that this two values are used to: a - present to the user b - calculate the rate_limit So a few KB here and there is not going to make a difference. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231025091117.6342-11-quintela@redhat.com>	2023-10-31 08:44:33 +01:00
Juan Quintela	0743f41fd2	migration: migration_rate_limit_reset() don't need the QEMUFile Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231025091117.6342-9-quintela@redhat.com>	2023-10-31 08:44:33 +01:00
Juan Quintela	f57e5a6ce5	migration: migration_transferred_bytes() don't need the QEMUFile Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231025091117.6342-8-quintela@redhat.com>	2023-10-31 08:44:33 +01:00
Juan Quintela	8846b5bfca	migration: migrate 'blk' command option is deprecated. Use blocked-mirror with NBD instead. Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231018115513.2163-4-quintela@redhat.com>	2023-10-31 08:44:33 +01:00
Juan Quintela	40101f320d	migration: migrate 'inc' command option is deprecated. Use blockdev-mirror with NBD instead. Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231018115513.2163-3-quintela@redhat.com>	2023-10-31 08:44:33 +01:00
Steve Sistare	d9cda21303	migration: simplify notifiers Pass the callback function to add_migration_state_change_notifier so that migration can initialize the notifier on add and clear it on delete, which simplifies the call sites. Shorten the function names so the extra arg can be added more legibly. Hide the global notifier list in a new function migration_call_notifiers, and make it externally visible so future live update code can call it. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <1686148954-250144-1-git-send-email-steven.sistare@oracle.com>	2023-10-20 08:51:41 +02:00
Steve Sistare	c8a7fc5179	migration: simplify blockers Modify migrate_add_blocker and migrate_del_blocker to take an Error reason. This allows migration to own the Error object, so that if an error occurs in migrate_add_blocker, migration code can free the Error and clear the client handle, simplifying client code. It also simplifies the migrate_del_blocker call site. In addition, this is a pre-requisite for a proposed future patch that would add a mode argument to migration requests to support live update, and maintain a list of blockers for each mode. A blocker may apply to a single mode or to multiple modes, and passing Error will allow one Error object to be registered for multiple modes. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Tested-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <1697634216-84215-1-git-send-email-steven.sistare@oracle.com>	2023-10-20 08:51:41 +02:00
Juan Quintela	6f60900573	migration: Create populate_compress() So we don't have to access compression_counters from outside ram-compress.c. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Message-ID: <20230613145757.10131-7-quintela@redhat.com>	2023-10-17 22:14:51 +02:00
Juan Quintela	809f188a1a	migration: Move compression_counters cleanup ram-compress.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Message-ID: <20230613145757.10131-6-quintela@redhat.com>	2023-10-17 22:14:51 +02:00
Juan Quintela	b88a3306fd	migration: RDMA is not compatible with anything else So give an error instead of just ignoring the other methods. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Message-ID: <20230613145757.10131-4-quintela@redhat.com>	2023-10-17 22:14:51 +02:00
Juan Quintela	27fd25b0fb	migration: Create migrate_rdma() Helper to say if we are doing a migration over rdma. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-2-quintela@redhat.com>	2023-10-17 09:25:13 +02:00
Fiona Ebner	930e239d11	migration: hold the BQL during setup This is intended to be a semantic revert of commit `9b09503752` ("migration: run setup callbacks out of big lock"). There have been so many changes since that commit (e.g. a new setup callback dirty_bitmap_save_setup() that also needs to be adapted now), it's easier to do the revert manually. For snapshots, the bdrv_writev_vmstate() function is used during setup (in QIOChannelBlock backing the QEMUFile), but not holding the BQL while calling it could lead to an assertion failure. To understand how, first note the following: 1. Generated coroutine wrappers for block layer functions spawn the coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it. 2. If the host OS switches threads at an inconvenient time, it can happen that a bottom half scheduled for the main thread's AioContext is executed as part of a vCPU thread's aio_poll(). An example leading to the assertion failure is as follows: main thread: 1. A snapshot-save QMP command gets issued. 2. snapshot_save_job_bh() is scheduled. vCPU thread: 3. aio_poll() for the main thread's AioContext is called (e.g. when the guest writes to a pflash device, as part of blk_pwrite which is a generated coroutine wrapper). 4. snapshot_save_job_bh() is executed as part of aio_poll(). 3. qemu_savevm_state() is called. 4. qemu_mutex_unlock_iothread() is called. Now qemu_get_current_aio_context() returns 0x0. 5. bdrv_writev_vmstate() is executed during the usual savevm setup via qemu_fflush(). But this function is a generated coroutine wrapper, so it uses AIO_WAIT_WHILE. There, the assertion assert(qemu_get_current_aio_context() == qemu_get_aio_context()); will fail. To fix it, ensure that the BQL is held during setup. While it would only be needed for snapshots, adapting migration too avoids additional logic for conditional locking/unlocking in the setup callbacks. Writing the header could (in theory) also trigger qemu_fflush() and thus bdrv_writev_vmstate(), so the locked section also covers the qemu_savevm_state_header() call, even for migration for consistency. The section around multifd_send_sync_main() needs to be unlocked to avoid a deadlock. In particular, the multifd_save_setup() function calls socket_send_channel_create() using multifd_new_send_channel_async() as a callback and then waits for the callback to signal via the channels_ready semaphore. The connection happens via qio_task_run_in_thread(), but the callback is only executed via qio_task_thread_result() which is scheduled for the main event loop. Without unlocking the section, the main thread would never get to process the task result and the callback meaning there would be no signal via the channels_ready semaphore. The comment in ram_init_bitmaps() was introduced by `4987783400` ("migration: fix incorrect memory_global_dirty_log_start outside BQL") and is removed, because it referred to the qemu_mutex_lock_iothread() call. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231013105839.415989-1-f.ebner@proxmox.com>	2023-10-17 09:25:13 +02:00
Nikolay Borisov	2aae1eb8da	migration: Add the configuration vmstate to the json writer Make the migration json writer part of MigrationState struct, allowing the 'configuration' object be serialized to json. This will facilitate the parsing of the 'configuration' object in the next patch that fixes analyze-migration.py for arm. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231009184326.15777-2-farosas@suse.de>	2023-10-17 09:14:32 +02:00
Peter Xu	8b2395970a	migration: Allow user to specify available switchover bandwidth Migration bandwidth is a very important value to live migration. It's because it's one of the major factors that we'll make decision on when to switchover to destination in a precopy process. This value is currently estimated by QEMU during the whole live migration process by monitoring how fast we were sending the data. This can be the most accurate bandwidth if in the ideal world, where we're always feeding unlimited data to the migration channel, and then it'll be limited to the bandwidth that is available. However in reality it may be very different, e.g., over a 10Gbps network we can see query-migrate showing migration bandwidth of only a few tens of MB/s just because there are plenty of other things the migration thread might be doing. For example, the migration thread can be busy scanning zero pages, or it can be fetching dirty bitmap from other external dirty sources (like vhost or KVM). It means we may not be pushing data as much as possible to migration channel, so the bandwidth estimated from "how many data we sent in the channel" can be dramatically inaccurate sometimes. With that, the decision to switchover will be affected, by assuming that we may not be able to switchover at all with such a low bandwidth, but in reality we can. The migration may not even converge at all with the downtime specified, with that wrong estimation of bandwidth, keeping iterations forever with a low estimation of bandwidth. The issue is QEMU itself may not be able to avoid those uncertainties on measuing the real "available migration bandwidth". At least not something I can think of so far. One way to fix this is when the user is fully aware of the available bandwidth, then we can allow the user to help providing an accurate value. For example, if the user has a dedicated channel of 10Gbps for migration for this specific VM, the user can specify this bandwidth so QEMU can always do the calculation based on this fact, trusting the user as long as specified. It may not be the exact bandwidth when switching over (in which case qemu will push migration data as fast as possible), but much better than QEMU trying to wildly guess, especially when very wrong. A new parameter "avail-switchover-bandwidth" is introduced just for this. So when the user specified this parameter, instead of trusting the estimated value from QEMU itself (based on the QEMUFile send speed), it trusts the user more by using this value to decide when to switchover, assuming that we'll have such bandwidth available then. Note that specifying this value will not throttle the bandwidth for switchover yet, so QEMU will always use the full bandwidth possible for sending switchover data, assuming that should always be the most important way to use the network at that time. This can resolve issues like "unconvergence migration" which is caused by hilarious low "migration bandwidth" detected for whatever reason. Reported-by: Zhiyi Guo <zhguo@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231010221922.40638-1-peterx@redhat.com>	2023-10-17 09:14:32 +02:00
Wei Wang	d7f5a04320	migration: refactor migration_completion Current migration_completion function is a bit long. Refactor the long implementation into different subfunctions: - migration_completion_precopy: completion code related to precopy - migration_completion_postcopy: completion code related to postcopy Rename await_return_path_close_on_source to close_return_path_on_source: It is renamed to match with open_return_path_on_source. This improves readability and is easier for future updates (e.g. add new subfunctions when completion code related to new features are needed). No functional changes intended. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230804093053.5037-1-wei.w.wang@intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-10-17 09:14:32 +02:00
Peter Xu	5e79a4bf03	migration: Add migration_rp_wait\|kick() It's just a simple wrapper for rp_sem on either wait() or kick(), make it even clearer on how it is used. Prepared to be used even for other things. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Message-ID: <20231004220240.167175-8-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-10-11 11:17:05 +02:00
Peter Xu	2b2f6f411e	migration: Introduce migrate_has_error() Introduce a helper to detect whether MigrationState.error is set for whatever reason. This is preparation work for any thread (e.g. source return path thread) to setup errors in an unified way to MigrationState, rather than relying on its own way to set errors (mark_source_rp_bad()). Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231004220240.167175-3-peterx@redhat.com>	2023-10-11 11:17:05 +02:00
Peter Xu	c94143e587	migration: Display error in query-migrate irrelevant of status Display it as long as being set, irrelevant of FAILED status. E.g., it may also be applicable to PAUSED stage of postcopy, to provide hint on what has gone wrong. The error_mutex seems to be overlooked when referencing the error, add it to be very safe. This will change QAPI behavior by showing up error message outside !FAILED status, but it's intended and doesn't expect to break anyone. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404 Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231004220240.167175-2-peterx@redhat.com>	2023-10-11 11:17:04 +02:00
Fabiano Rosas	4111a732e8	migration: Set migration status early in incoming side We are sending a migration event of MIGRATION_STATUS_SETUP at qemu_start_incoming_migration but never actually setting the state. This creates a window between qmp_migrate_incoming and process_incoming_migration_co where the migration status is still MIGRATION_STATUS_NONE. Calling query-migrate during this time will return an empty response even though the incoming migration command has already been issued. Commit `7cf1fe6d68` ("migration: Add migration events on target side") has added support to the 'events' capability to the incoming part of migration, but chose to send the SETUP event without setting the state. I'm assuming this was a mistake. This introduces a change in behavior, any QMP client waiting for the SETUP event will hang, unless it has previously enabled the 'events' capability. Having the capability enabled is sufficient to continue to receive the event. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230712190742.22294-5-farosas@suse.de>	2023-10-11 11:17:02 +02:00
Steve Sistare	2a9e2e595f	migration: file URI Extend the migration URI to support file:<filename>. This can be used for any migration scenario that does not require a reverse path. It can be used as an alternative to 'exec:cat > file' in minimized containers that do not contain /bin/sh, and it is easier to use than the fd:<fdname> URI. It can be used in HMP commands, and as a qemu command-line parameter. For best performance, guest ram should be shared and x-ignore-shared should be true, so guest pages are not written to the file, in which case the guest may remain running. If ram is not so configured, then the user is advised to stop the guest first. Otherwise, a busy guest may re-dirty the same page, causing it to be appended to the file multiple times, and the file may grow unboundedly. That issue is being addressed in the "fixed-ram" patch series. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Tested-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <1694182931-61390-2-git-send-email-steven.sistare@oracle.com>	2023-10-04 13:16:58 +02:00
Tejus GK	f4e1b61336	migration: Refactor repeated call of yank_unregister_instance In the function qmp_migrate(), yank_unregister_instance() gets called twice which isn't required. Hence, refactoring it so that it gets called during the local_error cleanup. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Tejus GK <tejus.gk@nutanix.com> Message-ID: <20230621130940.178659-3-tejus.gk@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-09-29 17:05:23 +02:00
Fabiano Rosas	36e9aab3c5	migration: Move return path cleanup to main migration thread Now that the return path thread is allowed to finish during a paused migration, we can move the cleanup of the QEMUFiles to the main migration thread. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-9-farosas@suse.de>	2023-09-27 13:58:02 -04:00
Fabiano Rosas	ef796ee93b	migration: Replace the return path retry logic Replace the return path retry logic with finishing and restarting the thread. This fixes a race when resuming the migration that leads to a segfault. Currently when doing postcopy we consider that an IO error on the return path file could be due to a network intermittency. We then keep the thread alive but have it do cleanup of the 'from_dst_file' and wait on the 'postcopy_pause_rp' semaphore. When the user issues a migrate resume, a new return path is opened and the thread is allowed to continue. There's a race condition in the above mechanism. It is possible for the new return path file to be setup before the cleanup code in the return path thread has had a chance to run, leading to the new file being closed and the pointer set to NULL. When the thread is released after the resume, it tries to dereference 'from_dst_file' and crashes: Thread 7 "return path" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffd1dbf700 (LWP 9611)] 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154 154 return f->last_error; (gdb) bt #0 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154 #1 0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206 #2 0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876 #3 0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541 #4 0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477 #5 0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Here's the race (important bit is open_return_path happening before migration_release_dst_files): migration \| qmp \| return path --------------------------+-----------------------------+--------------------------------- qmp_migrate_pause() shutdown(ms->to_dst_file) f->last_error = -EIO migrate_detect_error() postcopy_pause() set_state(PAUSED) wait(postcopy_pause_sem) qmp_migrate(resume) migrate_fd_connect() resume = state == PAUSED open_return_path <-- TOO SOON! set_state(RECOVER) post(postcopy_pause_sem) (incoming closes to_src_file) res = qemu_file_get_error(rp) migration_release_dst_files() ms->rp_state.from_dst_file = NULL post(postcopy_pause_rp_sem) postcopy_pause_return_path_thread() wait(postcopy_pause_rp_sem) rp = ms->rp_state.from_dst_file goto retry qemu_file_get_error(rp) SIGSEGV ------------------------------------------------------------------------------------------- We can keep the retry logic without having the thread alive and waiting. The only piece of data used by it is the 'from_dst_file' and it is only allowed to proceed after a migrate resume is issued and the semaphore released at migrate_fd_connect(). Move the retry logic to outside the thread by waiting for the thread to finish before pausing the migration. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-8-farosas@suse.de>	2023-09-27 13:58:02 -04:00
Fabiano Rosas	d50f5dc075	migration: Consolidate return path closing code We'll start calling the await_return_path_close_on_source() function from other parts of the code, so move all of the related checks and tracepoints into it. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-7-farosas@suse.de>	2023-09-27 13:58:02 -04:00
Fabiano Rosas	b3b101157d	migration: Remove redundant cleanup of postcopy_qemufile_src This file is owned by the return path thread which is already doing cleanup. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-6-farosas@suse.de>	2023-09-27 13:58:02 -04:00
Fabiano Rosas	7478fb0df9	migration: Fix possible race when shutting down to_dst_file It's not safe to call qemu_file_shutdown() on the to_dst_file without first checking for the file's presence under the lock. The cleanup of this file happens at postcopy_pause() and migrate_fd_cleanup() which are not necessarily running in the same thread as migrate_fd_cancel(). Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-5-farosas@suse.de>	2023-09-27 13:58:02 -04:00
Fabiano Rosas	639decf529	migration: Fix possible races when shutting down the return path We cannot call qemu_file_shutdown() on the return path file without taking the file lock. The return path thread could be running it's cleanup code and have just cleared the from_dst_file pointer. Checking ms->to_dst_file for errors could also race with migrate_fd_cleanup() which clears the to_dst_file pointer. Protect both accesses by taking the file lock. This was caught by inspection, it should be rare, but the next patches will start calling this code from other places, so let's do the correct thing. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-4-farosas@suse.de>	2023-09-27 13:58:02 -04:00
Fabiano Rosas	28a8347281	migration: Fix possible race when setting rp_state.error We don't need to set the rp_state.error right after a shutdown because qemu_file_shutdown() always sets the QEMUFile error, so the return path thread would have seen it and set the rp error itself. Setting the error outside of the thread is also racy because the thread could clear it after we set it. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-3-farosas@suse.de>	2023-09-27 13:58:02 -04:00
Peter Xu	cf02f29e1e	migration: Fix race that dest preempt thread close too early We hit intermit CI issue on failing at migration-test over the unit test preempt/plain: qemu-system-x86_64: Unable to read from socket: Connection reset by peer Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1 ** ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0) (test program exited with status code -6) Fabiano debugged into it and found that the preempt thread can quit even without receiving all the pages, which can cause guest not receiving all the pages and corrupt the guest memory. To make sure preempt thread finished receiving all the pages, we can rely on the page_requested_count being zero because preempt channel will only receive requested page faults. Note, not all the faulted pages are required to be sent via the preempt channel/thread; imagine the case when a requested page is just queued into the background main channel for migration, the src qemu will just still send it via the background channel. Here instead of spinning over reading the count, we add a condvar so the main thread can wait on it if that unusual case happened, without burning the cpu for no good reason, even if the duration is short; so even if we spin in this rare case is probably fine. It's just better to not do so. The condvar is only used when that special case is triggered. Some memory ordering trick is needed to guarantee it from happening (against the preempt thread status field), so the main thread will always get a kick when that triggers correctly. Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886 Debugged-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-2-farosas@suse.de>	2023-09-27 13:58:02 -04:00

1 2 3 4 5 ...

684 Commits