Commit Graph

973 Commits

Author SHA1 Message Date
Peter Maydell f6c63c0dbf * ASAN fixes
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAlyHw88UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNL4Qf/UPunPKY/OK47evFGPG0ZMGF3IxOp
 OgM0MMBOPdSMaLuI+cgmI+U1+hOqw9Vf/eyyfRFZCTQXjr1BQL0exAG+KvBeLOSC
 h1hJmpecc0IS2D3DaXDI2SvlLr7AFAVIY2JR9lCdJW99mC6HROSeaWnjQ0XflxTM
 2BSl1FDzO6bHz3OgUHM2NAPYzjpwTOq7ZnaTd20a7zE+7ef7iEJ3edRHEg+RmHtN
 gMwOkZw1Ip5Zn5hCjJbURZG+OMOKY4/mSqV6a9IByQ5Kws8rhb38f9wpA09C7y3S
 Q7Tv1XIT84sVg7B0eToQObzmkagA6NGJuNy+TleOeTemntEmzQGQ4fk6Zw==
 =ybUj
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* ASAN fixes

# gpg: Signature made Tue 12 Mar 2019 14:35:59 GMT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  test-migration: fix memory leak
  migration: fix memory leak
  test-bdrv-graph-mod: fix Error leak
  test-char: fix undefined behavior

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-14 12:02:12 +00:00
Eric Blake 796a3798ab bitmaps: Fix typo in function name
Commit a88b179f introduced the ability to set and query bitmap
persistence, but with an atypical spelling.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20190308205845.25734-1-eblake@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:49 -04:00
John Snow 3ae96d6684 block/dirty-bitmaps: add block_dirty_bitmap_check function
Instead of checking against busy, inconsistent, or read only directly,
use a check function with permissions bits that let us streamline the
checks without reproducing them in many places.

Included in this patch are permissions changes that simply add the
inconsistent check to existing permissions call spots, without
addressing existing bugs.

In general, this means that busy+readonly checks become BDRV_BITMAP_DEFAULT,
which checks against all three conditions. busy-only checks become
BDRV_BITMAP_ALLOW_RO.

Notably, remove allows inconsistent bitmaps, so it doesn't follow the pattern.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190301191545.8728-4-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:49 -04:00
John Snow 27a1b301a4 block/dirty-bitmaps: unify qmp_locked and user_locked calls
These mean the same thing now. Unify them and rename the merged call
bdrv_dirty_bitmap_busy to indicate semantically what we are describing,
as well as help disambiguate from the various _locked and _unlocked
versions of bitmap helpers that refer to mutex locks.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-8-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:48 -04:00
John Snow 50a47257f8 block/dirty-bitmaps: rename frozen predicate helper
"Frozen" was a good description a long time ago, but it isn't adequate now.
Rename the frozen predicate to has_successor to make the semantics of the
predicate more clear to outside callers.

In the process, remove some calls to frozen() that no longer semantically
make sense. For bdrv_enable_dirty_bitmap_locked and
bdrv_disable_dirty_bitmap_locked, it doesn't make sense to prohibit QEMU
internals from performing this action when we only wished to prohibit QMP
users from issuing these commands. All of the QMP API commands for bitmap
manipulation already check against user_locked() to prohibit these actions.

Several other assertions really want to check that the bitmap isn't in-use
by another operation -- use the bitmap_user_locked function for this instead,
which presently also checks for has_successor. This leaves some redundant
checks of has_successor through different helpers that are addressed in
forthcoming patches.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:48 -04:00
Paolo Bonzini 5e78bc6a47 migration: fix memory leak
Reported by ASAN.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-12 15:18:40 +01:00
Marc-André Lureau d890344166 slirp: use libslirp migration code
slirp migration code uses QEMU vmstate so far, when building WITH_QEMU.

Introduce slirp_state_{load,save,version}() functions to move the
state saving handling to libslirp side.

So far, the bitstream compatibility should remain equal with current
QEMU, as this is effectively using the same code, with the same format
etc. When libslirp is made standalone, we will need some mechanism to
ensure bitstream compatibility regardless of the libslirp version
installed. See the FIXME note in the code.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20190212162524.31504-3-marcandre.lureau@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2019-03-07 12:46:31 +01:00
Zhang Chen db00972922 Migration/colo.c: Make COLO node running after failover
Delay to close COLO for auto start VM after failover.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190303145021.2962-4-chen.zhang@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:18 +00:00
Zhang Chen b8b5734b09 Migration/colo.c: Fix double close bug when occur COLO failover
In migration_incoming_state_destroy(void) will check the mis->to_src_file
to double close the mis->to_src_file when occur COLO failover.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190303145021.2962-2-chen.zhang@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:18 +00:00
Wei Wang 6eeb63f740 migration/ram.c: add the free page optimization enable flag
This patch adds the free page optimization enable flag, and a function
to set this flag. When the free page optimization is enabled, not all
the pages are needed to be sent in the bulk stage.

Why using a new flag, instead of directly disabling ram_bulk_stage when
the optimization is running?
Thanks for Peter Xu's reminder that disabling ram_bulk_stage will affect
the use of compression. Please see save_page_use_compression. When
xbzrle and compression are used, if free page optimizaion causes the
ram_bulk_stage to be disabled, save_page_use_compression will return
false, which disables the use of compression. That is, if free page
optimization avoids the sending of half of the guest pages, the other
half of pages loses the benefits of compression in the meantime. Using a
new flag to let migration_bitmap_find_dirty skip the free pages in the
bulk stage will avoid the above issue.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Peter Xu <peterx@redhat.com>
Message-Id: <1544516693-5395-7-git-send-email-wei.w.wang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:18 +00:00
Wei Wang bd2270608f migration/ram.c: add a notifier chain for precopy
This patch adds a notifier chain for the memory precopy. This enables various
precopy optimizations to be invoked at specific places.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <1544516693-5395-6-git-send-email-wei.w.wang@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:18 +00:00
Wei Wang 6bcb05fc42 migration: API to clear bits of guest free pages from the dirty bitmap
This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Spilt the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <1544516693-5395-5-git-send-email-wei.w.wang@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:18 +00:00
Wei Wang 386a907b37 migration: use bitmap_mutex in migration_bitmap_clear_dirty
The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. For example, the free
page optimization clears bits of free pages from the bitmap in an
iothread context. This patch makes migration_bitmap_clear_dirty update
the bitmap and counter under the mutex.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <1544516693-5395-4-git-send-email-wei.w.wang@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:18 +00:00
Juan Quintela 9aca82ba31 migration: Create socket-address parameter
It will be used to store the uri parameters. We want this only for
tcp, so we don't set it for other uris.  We need it to know what port
is migration running.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Removed DummyStruct as suggested by Eric & Markus

--
2019-03-06 10:49:17 +00:00
Yury Kotov 6cafc8e4dd migration: Add capabilities validation
Currently we don't check which capabilities set in the source QEMU.
We just expect that the target QEMU has the same enabled capabilities.

Add explicit validation for capabilities to make sure that the target VM
has them too. This is enabled for only new capabilities to keep compatibily.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-6-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Manual merge
2019-03-06 10:49:17 +00:00
Yury Kotov fbd162e629 migration: Add an ability to ignore shared RAM blocks
If ignore-shared capability is set then skip shared RAMBlocks during the
RAM migration.
Also, move qemu_ram_foreach_migratable_block (and rename) to the
migration code, because it requires access to the migration capabilities.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-4-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Yury Kotov 18269069c3 migration: Introduce ignore-shared capability
We want to use local migration to update QEMU for running guests.
In this case we don't need to migrate shared (file backed) RAM.
So, add a capability to ignore such blocks during live migration.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-3-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Yury Kotov 754cb9c0eb exec: Change RAMBlockIterFunc definition
Currently, qemu_ram_foreach_* calls RAMBlockIterFunc with many
block-specific arguments. But often iter func needs RAMBlock*.
This refactoring is needed for fast access to RAMBlock flags from
qemu_ram_foreach_block's callback. The only way to achieve this now
is to call qemu_ram_block_from_host (which also enumerates blocks).

So, this patch reduces complexity of
qemu_ram_foreach_block() -> cb() -> qemu_ram_block_from_host()
from O(n^2) to O(n).

Fix RAMBlockIterFunc definition and add some functions to read
RAMBlock* fields witch were passed.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-2-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Marcel Apfelbaum 9589e76301 migration/rdma: clang compilation fix
Configuring QEMU with:
        ../configure --cc=clang --enable-rdma

Leads to compilation error:

  CC      migration/rdma.o
  CC      migration/block.o
  qemu/migration/rdma.c:3615:58: error: taking address of packed member 'rkey' of class or structure
      'RDMARegisterResult' may result in an unaligned pointer value [-Werror,-Waddress-of-packed-member]
                            (uintptr_t)host_addr, NULL, &reg_result->rkey,
                                                         ^~~~~~~~~~~~~~~~
Fix it by using a temp local variable.

Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Message-Id: <20190304184923.24215-1-marcel.apfelbaum@gmail.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert 892ae715b6 migration: Cleanup during exit
Currently we cleanup the migration object as we exit main after the
main_loop finishes; however if there's a migration running things
get messy and we can end up with the migration thread still trying
to access freed structures.

We now take a ref to the object around the migration thread itself,
so the act of dropping the ref during exit doesn't cause us to lose
the state until the thread quits.

Cancelling the migration during migration also tries to get the thread
to quit.

We do this a bit earlier; so hopefully migration gets out of the way
before all the devices etc are freed.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190227164900.16378-1-dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert cf75e26849 migration/rdma: Fix qemu_rdma_cleanup null check
If the migration fails before the channel is open (e.g. a bad
address) we end up in the cleanup with rdma->channel==NULL.

Spotted by Coverity: CID 1398634
Fixes: fbbaacab27
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190214185351.5927-1-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert c3c5eae6ac migration: Fix cancel state
During a cancelled migration there's a race where the fd can
go into an error state before we get back around the migration loop
and migration_detect_error transitions from cancelling->failed.

Check for cancelled/cancelling and don't change the state.

Red Hat bug: https://bugzilla.redhat.com/show_bug.cgi?id=1608649

Fixes: b23c2ade25
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190219195928.12289-1-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert 7659505c16 migration: Switch to using announce timer
Switch the announcements to using the new announce timer.
Move the code that does it to announce.c rather than savevm
because it really has nothing to do with the actual migration.

Migration starts the announce from bh's and so they're all
in the main thread/bql, and so there's never any racing with
the timers themselves.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2019-03-05 11:27:41 +08:00
Dr. David Alan Gilbert ee3d96baf3 migration: Add announce parameters
Add migration parameters that control RARP/GARP announcement timeouts.

Based on earlier patches by myself and
  Vladislav Yasevich <vyasevic@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2019-03-05 11:27:41 +08:00
Dr. David Alan Gilbert 50510ea2c2 net: Introduce announce timer
The 'announce timer' will be used by migration, and explicit
requests for qemu to perform network announces.

Based on the work by Germano Veit Michel <germano@redhat.com>
 and Vlad Yasevich <vyasevic@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2019-03-05 11:27:41 +08:00
Vladimir Sementsov-Ogievskiy f556f37b11 migration/block: use qemu_iovec_init_buf
Use new qemu_iovec_init_buf() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-14-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-14-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22 09:42:13 +00:00
Xiao Guangrong aecbfe9c64 migration: introduce pages-per-second
It introduces a new statistic, pages-per-second, as bandwidth or mbps is
not enough to measure the performance of posting pages out as we have
compression, xbzrle, which can significantly reduce the amount of the
data size, instead, pages-per-second is the one we want

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20190111063732.10484-2-xiaoguangrong@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  With typo's Eric spotted fixed
2019-01-23 15:51:47 +00:00
Marc-André Lureau de22ded044 vmstate: constify SaveVMHandlers
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20181114133139.27346-1-marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-01-23 15:51:47 +00:00
Dr. David Alan Gilbert fbbaacab27 migration/rdma: unregister fd handler
Unregister the fd handler before we destroy the channel,
otherwise we've got a race where we might land in the
fd handler just as we're closing the device.

(The race is quite data dependent, you just have to have
the right set of devices for it to trigger).

Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190122173111.29821-1-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-01-23 15:51:32 +00:00
Fei Li 6d99c2d41c migration: unify error handling for process_incoming_migration_co
In the current code, if process_incoming_migration_co() fails we do
the same error handing: set the error state, close the source file,
do the cleanup for multifd, and then exit(EXIT_FAILURE). To make the
code clearer, add a "goto fail" to unify the error handling.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190113140849.38339-6-lifei1214@126.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-01-23 15:02:07 +00:00
Fei Li 91b02dc750 migration: add more error handling for postcopy_ram_enable_notify
Call postcopy_ram_incoming_cleanup() to do the cleanup when
postcopy_ram_enable_notify fails. Besides, report the error
message when qemu_ram_foreach_migratable_block() fails.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190113140849.38339-5-lifei1214@126.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-01-23 15:02:07 +00:00
Fei Li 1398b2e3fe migration: multifd_save_cleanup() can't fail, simplify
multifd_save_cleanup() takes an Error ** argument and returns an
error code even though it can't actually fail.  Its callers
dutifully check for failure.  Remove the useless argument and return
value, and simplify the callers.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190113140849.38339-4-lifei1214@126.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-01-23 15:02:07 +00:00
Fei Li 49ed0d24a4 migration: fix the multifd code when receiving less channels
In our current code, when multifd is used during migration, if there
is an error before the destination receives all new channels, the
source keeps running, however the destination does not exit but keeps
waiting until the source is killed deliberately.

Fix this by dumping the specific error and let users decide whether
to quit from the destination side when failing to receive packet via
some channel. And update the comment for multifd_recv_new_channel().

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190113140849.38339-3-lifei1214@126.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-01-23 15:02:07 +00:00
Aaron Lindsay 8c07559fc7 migration: Add post_save function to VMStateDescription
In some cases it may be helpful to modify state before saving it for
migration, and then modify the state back after it has been saved. The
existing pre_save function provides half of this functionality. This
patch adds a post_save function to provide the second half.

Signed-off-by: Aaron Lindsay <aclindsa@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20181211151945.29137-2-aaron@os.amperecomputing.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-01-21 10:38:55 +00:00
Philippe Mathieu-Daudé a346af9c88 migration: Use strnlen() for fixed-size string
GCC 8 introduced the -Wstringop-overflow, which detect buffer overflow
by string-modifying functions declared in <string.h>, such strncpy(),
used in global_state_store_running().

GCC indeed found an incorrect use of strlen(), because this array
is loaded by VMSTATE_BUFFER(runstate, GlobalState) then parsed
using qapi_enum_parse which does not get the buffer length.

Use strnlen() which returns sizeof(s->runstate) if the array is not
NUL-terminated, assert the size is within range, and enforce the array
to be NUL-terminated to avoid an overflow in qapi_enum_parse().

This fixes:

    CC      migration/global_state.o
  qemu/migration/global_state.c: In function 'global_state_pre_save':
  qemu/migration/global_state.c:109:15: error: 'strlen' argument 1 declared attribute 'nonstring' [-Werror=stringop-overflow=]
       s->size = strlen((char *)s->runstate) + 1;
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  qemu/migration/global_state.c:24:13: note: argument 'runstate' declared here
       uint8_t runstate[100] QEMU_NONSTRING;
               ^~~~~~~~
  cc1: all warnings being treated as errors
  make: *** [qemu/rules.mak:69: migration/global_state.o] Error 1

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-17 21:10:57 -05:00
Marc-André Lureau 0a5526a18b migration: Fix stringop-truncation warning
GCC 8 added a -Wstringop-truncation warning:

  The -Wstringop-truncation warning added in GCC 8.0 via r254630 for
  bug 81117 is specifically intended to highlight likely unintended
  uses of the strncpy function that truncate the terminating NUL
  character from the source string.

This new warning leads to compilation failures:

    CC      migration/global_state.o
  qemu/migration/global_state.c: In function 'global_state_store_running':
  qemu/migration/global_state.c:45:5: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
       strncpy((char *)global_state.runstate, state, sizeof(global_state.runstate));
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  make: *** [qemu/rules.mak:69: migration/global_state.o] Error 1

Adding an assert is enough to silence GCC.

(alternatively, we could hard-code "running")

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[PMD: More verbose commit message]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-17 21:10:57 -05:00
Paolo Bonzini b58deb344d qemu/queue.h: leave head structs anonymous unless necessary
Most list head structs need not be given a name.  In most cases the
name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV
or reverse iteration, but this does not apply to lists of other kinds,
and even for QTAILQ in practice this is only rarely needed.  In addition,
we will soon reimplement those macros completely so that they do not
need a name for the head struct.  So clean up everything, not giving a
name except in the rare case where it is necessary.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:55 +01:00
Daniel Henrique Barboza fb06411210 qmp hmp: Make system_wakeup check wake-up support and run state
The qmp/hmp command 'system_wakeup' is simply a direct call to
'qemu_system_wakeup_request' from vl.c. This function verifies if
runstate is SUSPENDED and if the wake up reason is valid before
proceeding. However, no error or warning is thrown if any of those
pre-requirements isn't met. There is no way for the caller to
differentiate between a successful wakeup or an error state caused
when trying to wake up a guest that wasn't suspended.

This means that system_wakeup is silently failing, which can be
considered a bug. Adding error handling isn't an API break in this
case - applications that didn't check the result will remain broken,
the ones that check it will have a chance to deal with it.

Adding to that, the commit before previous created a new QMP API called
query-current-machine, with a new flag called wakeup-suspend-support,
that indicates if the guest has the capability of waking up from suspended
state. Although such guest will never reach SUSPENDED state and erroring
it out in this scenario would suffice, it is more informative for the user
to differentiate between a failure because the guest isn't suspended versus
a failure because the guest does not have support for wake up at all.

All this considered, this patch changes qmp_system_wakeup to check if
the guest is capable of waking up from suspend, and if it is suspended.
After this patch, this is the output of system_wakeup in a guest that
does not have wake-up from suspend support (ppc64):

(qemu) system_wakeup
wake-up from suspend is not supported by this guest
(qemu)

And this is the output of system_wakeup in a x86 guest that has the
support but isn't suspended:

(qemu) system_wakeup
Unable to wake up: guest is not in suspended state
(qemu)

Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20181205194701.17836-4-danielhb413@gmail.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-12-18 07:55:47 +01:00
Marc-André Lureau 335d10cd8e qapi: add conditions to REPLICATION type/commands on the schema
Add #if defined(CONFIG_REPLICATION) in generated code, and adjust the
code accordingly.

Made conditional:

* xen-set-replication, query-xen-replication-status,
  xen-colo-do-checkpoint

  Before the patch, we first register the commands unconditionally in
  generated code (requires a stub), then conditionally unregister in
  qmp_unregister_commands_hack().

  Afterwards, we register only when CONFIG_REPLICATION.  The command
  fails exactly the same, with CommandNotFound.

  Improvement, because now query-qmp-schema is accurate, and we're one
  step closer to killing qmp_unregister_commands_hack().

* enum BlockdevDriver value "replication" in command blockdev-add

* BlockdevOptions variant @replication

and related structures.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20181213123724.4866-23-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-12-14 06:52:48 +01:00
Marc-André Lureau 03fee66fde vmstate: constify VMStateField
Because they are supposed to remain const.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20181114132931.22624-1-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27 15:35:15 +01:00
Paolo Bonzini 5aaac46793 migration: savevm: consult migration blockers
There is really no difference between live migration and savevm, except
that savevm does not require bdrv_invalidate_cache to be implemented
by all disks.  However, it is unlikely that savevm is used with anything
except qcow2 disks, so the penalty is small and worth the improvement
in catching bad usage of savevm.

Only one place was taking care of savevm when adding a migration blocker,
and it can be removed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-27 15:06:14 +01:00
Zhang Chen 7e934f5b27 migration/migration.c: Add COLO dependency checks
Current COLO mode(independent disk mode) need replication module work
together. Suggested by Dr. David Alan Gilbert <dgilbert@redhat.com>.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20181114190912.7242-1-chen.zhang@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-11-21 11:38:12 +00:00
Zhang Chen 3ebb9c4f52 migration/colo.c: Fix compilation issue when disable replication
This compilation issue will occur when user use --disable-replication
to config Qemu.

Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Message-Id: <20181101021226.6353-1-zhangckid@gmail.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-11-21 11:20:14 +00:00
Jia Lina 3d63da16fb migration: avoid segmentfault when take a snapshot of a VM which being migrated
During an active background migration, snapshot will trigger a
segmentfault. As snapshot clears the "current_migration" struct
and updates "to_dst_file" before it finds out that there is a
migration task, Migration accesses the null pointer in
"current_migration" struct and qemu crashes eventually.

Signed-off-by: Jia Lina <jialina01@baidu.com>
Signed-off-by: Chai Wen <chaiwen@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Message-Id: <20181026083620.10172-1-jialina01@baidu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-10-31 09:38:59 +00:00
Vladimir Sementsov-Ogievskiy 9c98f145df dirty-bitmaps: clean-up bitmaps loading and migration logic
This patch aims to bring the following behavior:

1. We don't load bitmaps, when started in inactive mode. It's the case
of incoming migration. In this case we wait for bitmaps migration
through migration channel (if 'dirty-bitmaps' capability is enabled) or
for invalidation (to load bitmaps from the image).

2. We don't remove persistent bitmaps on inactivation. Instead, we only
remove bitmaps after storing. This is the only way to restore bitmaps,
if we decided to resume source after [failed] migration with
'dirty-bitmaps' capability enabled (which means, that bitmaps were not
stored).

3. We load bitmaps on open and any invalidation, it's ok for all cases:
  - normal open
  - migration target invalidation with dirty-bitmaps capability
    (bitmaps are migrating through migration channel, the are not
     stored, so they should have IN_USE flag set and will be skipped
     when loading. However, it would fail if bitmaps are read-only[1])
  - migration target invalidation without dirty-bitmaps capability
    (normal load of the bitmaps, if migrated with shared storage)
  - source invalidation with dirty-bitmaps capability
    (skip because IN_USE)
  - source invalidation without dirty-bitmaps capability
    (bitmaps were dropped, reload them)

[1]: to accurately handle this, migration of read-only bitmaps is
     explicitly forbidden in this patch.

New mechanism for not storing bitmaps when migrate with dirty-bitmaps
capability is introduced: migration filed in BdrvDirtyBitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:17 -04:00
John Snow 993edc0ce0 block/dirty-bitmaps: add user_locked status checker
Instead of both frozen and qmp_locked checks, wrap it into one check.
frozen implies the bitmap is split in two (for backup), and shouldn't
be modified. qmp_locked implies it's being used by another operation,
like being exported over NBD. In both cases it means we shouldn't allow
the user to modify it in any meaningful way.

Replace any usages where we check both frozen and qmp_locked with the
new check.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20181002230218.13949-2-jsnow@redhat.com
[w/edits Suggested-By: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>]
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:16 -04:00
Peter Maydell 13399aad4f Error reporting patches for 2018-10-22
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbzcCHAAoJEDhwtADrkYZT3YsP/2qE4HNY/htj3IP6vNJuSaqw
 CLPRTz7zWmUBTE6FqSkvLsq3X2BMFFLeaIPA9EFcbyn2km6qPqBYgg9ElXXvPZBm
 6hDeRIoC8FdRD0Apozd5MGC94/lE47PheDRV8V+4KrGLaaMXEPxMZ0wP4AfdS5pS
 6Pt2xuF7nPu1+OWVxMk0fXadGjGLEuOQQmTh3B21J5RaynQ3gtd6h7XFC/LJyOGG
 LC/6GyPc0h7KU83VnvrRjH/EOpu1wENgrsvWsS0sem8op35Z+i9jU5BfCp4qFkDy
 gCHHUEyEeyexS+W+Tj87eBtK2gfrqQx9ovo8CIsWcUwpKbdD6AMK4FKGsDNMNHab
 Kg5u/M+O8nHCB7DuursF+3mqEbZHb05cfKe6JEtiq49EuORMV5hp4Ap966noSwTw
 UEU0NJNA1p8EdmXVudyyyYR7wpoSSmZpoenA+bJ3nthK8K0KcU4RUGk6ZEbxfJy+
 7ENl+3R2IxmxzgXv/x0tz0uFisaVW1rltTXtMte+ElQsO0qy74iHdfR7JHsmLxj9
 CO/ABMVoYsWq2OJv8pWLrdKpT4v3HQLJdHhknyu0ZcJGDyICqX29ULLEhPrNEZvW
 rxVxAkiemlaqxlUjbrM46CDQQm+w03OCnk7aCYcV4oK+u5+o3mCag705gMPErapZ
 6uOE3fAjiWw43sA31mek
 =kPZX
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2018-10-22' into staging

Error reporting patches for 2018-10-22

# gpg: Signature made Mon 22 Oct 2018 13:20:23 BST
# gpg:                using RSA key 3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-error-2018-10-22: (40 commits)
  error: Drop bogus "use error_setg() instead" admonitions
  vpc: Fail open on bad header checksum
  block: Clean up bdrv_img_create()'s error reporting
  vl: Simplify call of parse_name()
  vl: Fix exit status for -drive format=help
  blockdev: Convert drive_new() to Error
  vl: Assert drive_new() does not fail in default_drive()
  fsdev: Clean up error reporting in qemu_fsdev_add()
  spice: Clean up error reporting in add_channel()
  tpm: Clean up error reporting in tpm_init_tpmdev()
  numa: Clean up error reporting in parse_numa()
  vnc: Clean up error reporting in vnc_init_func()
  ui: Convert vnc_display_init(), init_keyboard_layout() to Error
  ui/keymaps: Fix handling of erroneous include files
  vl: Clean up error reporting in device_init_func()
  vl: Clean up error reporting in parse_fw_cfg()
  vl: Clean up error reporting in mon_init_func()
  vl: Clean up error reporting in machine_set_property()
  vl: Clean up error reporting in chardev_init_func()
  qom: Clean up error reporting in user_creatable_add_opts_foreach()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-10-23 17:20:23 +01:00
Markus Armbruster 4dd32b3dda migration: Fix !replay_can_snapshot() error handling
Calling error_report() in a function that takes an Error ** argument
is suspicious.  save_snapshot() and load_snapshot() do that, and then
fail without setting an error.  Wrong.  The HMP commands survive this
unscathed, since hmp_handle_error() does nothing when no error has
been set.  Callers main() (on behalf of -loadvm) and
replay_vmstate_init() crash, but I'm not sure the error is possible
there.

Screwed up when commit 377b21ccea (v2.12.0) added incorrect error
handling right next to correct examples.  Fix by calling error_setg()
instead of error_report().

Fixes: 377b21ccea
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20181017082702.5581-13-armbru@redhat.com>
2018-10-19 14:51:34 +02:00
Markus Armbruster 4b5766488f error: Fix use of error_prepend() with &error_fatal, &error_abort
From include/qapi/error.h:

  * Pass an existing error to the caller with the message modified:
  *     error_propagate(errp, err);
  *     error_prepend(errp, "Could not frobnicate '%s': ", name);

Fei Li pointed out that doing error_propagate() first doesn't work
well when @errp is &error_fatal or &error_abort: the error_prepend()
is never reached.

Since I doubt fixing the documentation will stop people from getting
it wrong, introduce error_propagate_prepend(), in the hope that it
lures people away from using its constituents in the wrong order.
Update the instructions in error.h accordingly.

Convert existing error_prepend() next to error_propagate to
error_propagate_prepend().  If any of these get reached with
&error_fatal or &error_abort, the error messages improve.  I didn't
check whether that's the case anywhere.

Cc: Fei Li <fli@suse.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-10-19 14:51:34 +02:00
zhanghailiang 2518aec192 COLO: quick failover process by kick COLO thread
COLO thread may sleep at qemu_sem_wait(&s->colo_checkpoint_sem),
while failover works begin, It's better to wakeup it to quick
the process.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 11:15:03 +08:00