COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
We leave users to choose whatever heartbeat solution they want,
if the heartbeat is lost, or other errors they detect, they can use
experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
COLO will do operations accordingly.
For example, if the command is sent to the Primary side,
the Primary side will exit COLO mode, does cleanup work,
and then, PVM will take over the service work. If sent to the Secondary side,
the Secondary side will run failover work, then takes over PVM's service work.
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-27 08:43:03 +02:00
|
|
|
/*
|
|
|
|
* COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
|
|
|
|
* (a.k.a. Fault Tolerance or Continuous Replication)
|
|
|
|
*
|
|
|
|
* Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
|
|
|
|
* Copyright (c) 2016 FUJITSU LIMITED
|
|
|
|
* Copyright (c) 2016 Intel Corporation
|
|
|
|
*
|
|
|
|
* This work is licensed under the terms of the GNU GPL, version 2 or
|
|
|
|
* later. See the COPYING file in the top-level directory.
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include "qemu/osdep.h"
|
|
|
|
#include "migration/colo.h"
|
|
|
|
#include "migration/failover.h"
|
2017-04-24 16:51:10 +02:00
|
|
|
#include "qemu/main-loop.h"
|
|
|
|
#include "migration.h"
|
2018-02-01 12:18:31 +01:00
|
|
|
#include "qapi/error.h"
|
2018-02-11 10:36:01 +01:00
|
|
|
#include "qapi/qapi-commands-migration.h"
|
2016-10-27 08:43:04 +02:00
|
|
|
#include "qemu/error-report.h"
|
|
|
|
#include "trace.h"
|
COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
We leave users to choose whatever heartbeat solution they want,
if the heartbeat is lost, or other errors they detect, they can use
experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
COLO will do operations accordingly.
For example, if the command is sent to the Primary side,
the Primary side will exit COLO mode, does cleanup work,
and then, PVM will take over the service work. If sent to the Secondary side,
the Secondary side will run failover work, then takes over PVM's service work.
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-27 08:43:03 +02:00
|
|
|
|
|
|
|
static QEMUBH *failover_bh;
|
2016-10-27 08:43:04 +02:00
|
|
|
static FailoverStatus failover_state;
|
COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
We leave users to choose whatever heartbeat solution they want,
if the heartbeat is lost, or other errors they detect, they can use
experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
COLO will do operations accordingly.
For example, if the command is sent to the Primary side,
the Primary side will exit COLO mode, does cleanup work,
and then, PVM will take over the service work. If sent to the Secondary side,
the Secondary side will run failover work, then takes over PVM's service work.
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-27 08:43:03 +02:00
|
|
|
|
|
|
|
static void colo_failover_bh(void *opaque)
|
|
|
|
{
|
2016-10-27 08:43:04 +02:00
|
|
|
int old_state;
|
|
|
|
|
COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
We leave users to choose whatever heartbeat solution they want,
if the heartbeat is lost, or other errors they detect, they can use
experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
COLO will do operations accordingly.
For example, if the command is sent to the Primary side,
the Primary side will exit COLO mode, does cleanup work,
and then, PVM will take over the service work. If sent to the Secondary side,
the Secondary side will run failover work, then takes over PVM's service work.
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-27 08:43:03 +02:00
|
|
|
qemu_bh_delete(failover_bh);
|
|
|
|
failover_bh = NULL;
|
2016-10-27 08:43:04 +02:00
|
|
|
|
|
|
|
old_state = failover_set_state(FAILOVER_STATUS_REQUIRE,
|
|
|
|
FAILOVER_STATUS_ACTIVE);
|
|
|
|
if (old_state != FAILOVER_STATUS_REQUIRE) {
|
|
|
|
error_report("Unknown error for failover, old_state = %s",
|
2017-08-24 10:46:08 +02:00
|
|
|
FailoverStatus_str(old_state));
|
2016-10-27 08:43:04 +02:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2019-04-26 11:07:28 +02:00
|
|
|
colo_do_failover();
|
COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
We leave users to choose whatever heartbeat solution they want,
if the heartbeat is lost, or other errors they detect, they can use
experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
COLO will do operations accordingly.
For example, if the command is sent to the Primary side,
the Primary side will exit COLO mode, does cleanup work,
and then, PVM will take over the service work. If sent to the Secondary side,
the Secondary side will run failover work, then takes over PVM's service work.
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-27 08:43:03 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
void failover_request_active(Error **errp)
|
|
|
|
{
|
2016-10-27 08:43:04 +02:00
|
|
|
if (failover_set_state(FAILOVER_STATUS_NONE,
|
|
|
|
FAILOVER_STATUS_REQUIRE) != FAILOVER_STATUS_NONE) {
|
2020-09-17 09:50:21 +02:00
|
|
|
error_setg(errp, "COLO failover is already activated");
|
2016-10-27 08:43:04 +02:00
|
|
|
return;
|
|
|
|
}
|
COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
We leave users to choose whatever heartbeat solution they want,
if the heartbeat is lost, or other errors they detect, they can use
experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
COLO will do operations accordingly.
For example, if the command is sent to the Primary side,
the Primary side will exit COLO mode, does cleanup work,
and then, PVM will take over the service work. If sent to the Secondary side,
the Secondary side will run failover work, then takes over PVM's service work.
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-27 08:43:03 +02:00
|
|
|
failover_bh = qemu_bh_new(colo_failover_bh, NULL);
|
|
|
|
qemu_bh_schedule(failover_bh);
|
|
|
|
}
|
|
|
|
|
2016-10-27 08:43:04 +02:00
|
|
|
void failover_init_state(void)
|
|
|
|
{
|
|
|
|
failover_state = FAILOVER_STATUS_NONE;
|
|
|
|
}
|
|
|
|
|
|
|
|
FailoverStatus failover_set_state(FailoverStatus old_state,
|
|
|
|
FailoverStatus new_state)
|
|
|
|
{
|
|
|
|
FailoverStatus old;
|
|
|
|
|
2020-09-23 12:56:46 +02:00
|
|
|
old = qatomic_cmpxchg(&failover_state, old_state, new_state);
|
2016-10-27 08:43:04 +02:00
|
|
|
if (old == old_state) {
|
2017-08-24 10:46:08 +02:00
|
|
|
trace_colo_failover_set_state(FailoverStatus_str(new_state));
|
2016-10-27 08:43:04 +02:00
|
|
|
}
|
|
|
|
return old;
|
|
|
|
}
|
|
|
|
|
|
|
|
FailoverStatus failover_get_state(void)
|
|
|
|
{
|
2020-09-23 12:56:46 +02:00
|
|
|
return qatomic_read(&failover_state);
|
2016-10-27 08:43:04 +02:00
|
|
|
}
|
|
|
|
|
COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
We leave users to choose whatever heartbeat solution they want,
if the heartbeat is lost, or other errors they detect, they can use
experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
COLO will do operations accordingly.
For example, if the command is sent to the Primary side,
the Primary side will exit COLO mode, does cleanup work,
and then, PVM will take over the service work. If sent to the Secondary side,
the Secondary side will run failover work, then takes over PVM's service work.
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-27 08:43:03 +02:00
|
|
|
void qmp_x_colo_lost_heartbeat(Error **errp)
|
|
|
|
{
|
2018-09-03 06:38:52 +02:00
|
|
|
if (get_colo_mode() == COLO_MODE_NONE) {
|
2023-02-07 08:51:14 +01:00
|
|
|
error_setg(errp, "VM is not in COLO mode");
|
COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
We leave users to choose whatever heartbeat solution they want,
if the heartbeat is lost, or other errors they detect, they can use
experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
COLO will do operations accordingly.
For example, if the command is sent to the Primary side,
the Primary side will exit COLO mode, does cleanup work,
and then, PVM will take over the service work. If sent to the Secondary side,
the Secondary side will run failover work, then takes over PVM's service work.
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-27 08:43:03 +02:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
failover_request_active(errp);
|
|
|
|
}
|