9c2037d0a4
This patch implements initial vmstate creation or loading at the start of record/replay. It is needed for rewinding the execution in the replay mode. v4 changes: - snapshots are not created by default anymore v3 changes: - added rrsnapshot option Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20170124071746.4572.61449.stgit@PASHA-ISP> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
228 lines
11 KiB
Plaintext
228 lines
11 KiB
Plaintext
Copyright (c) 2010-2015 Institute for System Programming
|
|
of the Russian Academy of Sciences.
|
|
|
|
This work is licensed under the terms of the GNU GPL, version 2 or later.
|
|
See the COPYING file in the top-level directory.
|
|
|
|
Record/replay
|
|
-------------
|
|
|
|
Record/replay functions are used for the reverse execution and deterministic
|
|
replay of qemu execution. This implementation of deterministic replay can
|
|
be used for deterministic debugging of guest code through a gdb remote
|
|
interface.
|
|
|
|
Execution recording writes a non-deterministic events log, which can be later
|
|
used for replaying the execution anywhere and for unlimited number of times.
|
|
It also supports checkpointing for faster rewinding during reverse debugging.
|
|
Execution replaying reads the log and replays all non-deterministic events
|
|
including external input, hardware clocks, and interrupts.
|
|
|
|
Deterministic replay has the following features:
|
|
* Deterministically replays whole system execution and all contents of
|
|
the memory, state of the hardware devices, clocks, and screen of the VM.
|
|
* Writes execution log into the file for later replaying for multiple times
|
|
on different machines.
|
|
* Supports i386, x86_64, and ARM hardware platforms.
|
|
* Performs deterministic replay of all operations with keyboard and mouse
|
|
input devices.
|
|
|
|
Usage of the record/replay:
|
|
* First, record the execution, by adding the following arguments to the command line:
|
|
'-icount shift=7,rr=record,rrfile=replay.bin -net none'.
|
|
Block devices' images are not actually changed in the recording mode,
|
|
because all of the changes are written to the temporary overlay file.
|
|
* Then you can replay it by using another command
|
|
line option: '-icount shift=7,rr=replay,rrfile=replay.bin -net none'
|
|
* '-net none' option should also be specified if network replay patches
|
|
are not applied.
|
|
|
|
Papers with description of deterministic replay implementation:
|
|
http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html
|
|
http://dl.acm.org/citation.cfm?id=2786805.2803179
|
|
|
|
Modifications of qemu include:
|
|
* wrappers for clock and time functions to save their return values in the log
|
|
* saving different asynchronous events (e.g. system shutdown) into the log
|
|
* synchronization of the bottom halves execution
|
|
* synchronization of the threads from thread pool
|
|
* recording/replaying user input (mouse and keyboard)
|
|
* adding internal checkpoints for cpu and io synchronization
|
|
|
|
Non-deterministic events
|
|
------------------------
|
|
|
|
Our record/replay system is based on saving and replaying non-deterministic
|
|
events (e.g. keyboard input) and simulating deterministic ones (e.g. reading
|
|
from HDD or memory of the VM). Saving only non-deterministic events makes
|
|
log file smaller, simulation faster, and allows using reverse debugging even
|
|
for realtime applications.
|
|
|
|
The following non-deterministic data from peripheral devices is saved into
|
|
the log: mouse and keyboard input, network packets, audio controller input,
|
|
USB packets, serial port input, and hardware clocks (they are non-deterministic
|
|
too, because their values are taken from the host machine). Inputs from
|
|
simulated hardware, memory of VM, software interrupts, and execution of
|
|
instructions are not saved into the log, because they are deterministic and
|
|
can be replayed by simulating the behavior of virtual machine starting from
|
|
initial state.
|
|
|
|
We had to solve three tasks to implement deterministic replay: recording
|
|
non-deterministic events, replaying non-deterministic events, and checking
|
|
that there is no divergence between record and replay modes.
|
|
|
|
We changed several parts of QEMU to make event log recording and replaying.
|
|
Devices' models that have non-deterministic input from external devices were
|
|
changed to write every external event into the execution log immediately.
|
|
E.g. network packets are written into the log when they arrive into the virtual
|
|
network adapter.
|
|
|
|
All non-deterministic events are coming from these devices. But to
|
|
replay them we need to know at which moments they occur. We specify
|
|
these moments by counting the number of instructions executed between
|
|
every pair of consecutive events.
|
|
|
|
Instruction counting
|
|
--------------------
|
|
|
|
QEMU should work in icount mode to use record/replay feature. icount was
|
|
designed to allow deterministic execution in absence of external inputs
|
|
of the virtual machine. We also use icount to control the occurrence of the
|
|
non-deterministic events. The number of instructions elapsed from the last event
|
|
is written to the log while recording the execution. In replay mode we
|
|
can predict when to inject that event using the instruction counter.
|
|
|
|
Timers
|
|
------
|
|
|
|
Timers are used to execute callbacks from different subsystems of QEMU
|
|
at the specified moments of time. There are several kinds of timers:
|
|
* Real time clock. Based on host time and used only for callbacks that
|
|
do not change the virtual machine state. For this reason real time
|
|
clock and timers does not affect deterministic replay at all.
|
|
* Virtual clock. These timers run only during the emulation. In icount
|
|
mode virtual clock value is calculated using executed instructions counter.
|
|
That is why it is completely deterministic and does not have to be recorded.
|
|
* Host clock. This clock is used by device models that simulate real time
|
|
sources (e.g. real time clock chip). Host clock is the one of the sources
|
|
of non-determinism. Host clock read operations should be logged to
|
|
make the execution deterministic.
|
|
* Virtual real time clock. This clock is similar to real time clock but
|
|
it is used only for increasing virtual clock while virtual machine is
|
|
sleeping. Due to its nature it is also non-deterministic as the host clock
|
|
and has to be logged too.
|
|
|
|
Checkpoints
|
|
-----------
|
|
|
|
Replaying of the execution of virtual machine is bound by sources of
|
|
non-determinism. These are inputs from clock and peripheral devices,
|
|
and QEMU thread scheduling. Thread scheduling affect on processing events
|
|
from timers, asynchronous input-output, and bottom halves.
|
|
|
|
Invocations of timers are coupled with clock reads and changing the state
|
|
of the virtual machine. Reads produce non-deterministic data taken from
|
|
host clock. And VM state changes should preserve their order. Their relative
|
|
order in replay mode must replicate the order of callbacks in record mode.
|
|
To preserve this order we use checkpoints. When a specific clock is processed
|
|
in record mode we save to the log special "checkpoint" event.
|
|
Checkpoints here do not refer to virtual machine snapshots. They are just
|
|
record/replay events used for synchronization.
|
|
|
|
QEMU in replay mode will try to invoke timers processing in random moment
|
|
of time. That's why we do not process a group of timers until the checkpoint
|
|
event will be read from the log. Such an event allows synchronizing CPU
|
|
execution and timer events.
|
|
|
|
Two other checkpoints govern the "warping" of the virtual clock.
|
|
While the virtual machine is idle, the virtual clock increments at
|
|
1 ns per *real time* nanosecond. This is done by setting up a timer
|
|
(called the warp timer) on the virtual real time clock, so that the
|
|
timer fires at the next deadline of the virtual clock; the virtual clock
|
|
is then incremented (which is called "warping" the virtual clock) as
|
|
soon as the timer fires or the CPUs need to go out of the idle state.
|
|
Two functions are used for this purpose; because these actions change
|
|
virtual machine state and must be deterministic, each of them creates a
|
|
checkpoint. qemu_start_warp_timer checks if the CPUs are idle and if so
|
|
starts accounting real time to virtual clock. qemu_account_warp_timer
|
|
is called when the CPUs get an interrupt or when the warp timer fires,
|
|
and it warps the virtual clock by the amount of real time that has passed
|
|
since qemu_start_warp_timer.
|
|
|
|
Bottom halves
|
|
-------------
|
|
|
|
Disk I/O events are completely deterministic in our model, because
|
|
in both record and replay modes we start virtual machine from the same
|
|
disk state. But callbacks that virtual disk controller uses for reading and
|
|
writing the disk may occur at different moments of time in record and replay
|
|
modes.
|
|
|
|
Reading and writing requests are created by CPU thread of QEMU. Later these
|
|
requests proceed to block layer which creates "bottom halves". Bottom
|
|
halves consist of callback and its parameters. They are processed when
|
|
main loop locks the global mutex. These locks are not synchronized with
|
|
replaying process because main loop also processes the events that do not
|
|
affect the virtual machine state (like user interaction with monitor).
|
|
|
|
That is why we had to implement saving and replaying bottom halves callbacks
|
|
synchronously to the CPU execution. When the callback is about to execute
|
|
it is added to the queue in the replay module. This queue is written to the
|
|
log when its callbacks are executed. In replay mode callbacks are not processed
|
|
until the corresponding event is read from the events log file.
|
|
|
|
Sometimes the block layer uses asynchronous callbacks for its internal purposes
|
|
(like reading or writing VM snapshots or disk image cluster tables). In this
|
|
case bottom halves are not marked as "replayable" and do not saved
|
|
into the log.
|
|
|
|
Block devices
|
|
-------------
|
|
|
|
Block devices record/replay module intercepts calls of
|
|
bdrv coroutine functions at the top of block drivers stack.
|
|
To record and replay block operations the drive must be configured
|
|
as following:
|
|
-drive file=disk.qcow,if=none,id=img-direct
|
|
-drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
|
|
-device ide-hd,drive=img-blkreplay
|
|
|
|
blkreplay driver should be inserted between disk image and virtual driver
|
|
controller. Therefore all disk requests may be recorded and replayed.
|
|
|
|
All block completion operations are added to the queue in the coroutines.
|
|
Queue is flushed at checkpoints and information about processed requests
|
|
is recorded to the log. In replay phase the queue is matched with
|
|
events read from the log. Therefore block devices requests are processed
|
|
deterministically.
|
|
|
|
Snapshotting
|
|
------------
|
|
|
|
New VM snapshots may be created in replay mode. They can be used later
|
|
to recover the desired VM state. All VM states created in replay mode
|
|
are associated with the moment of time in the replay scenario.
|
|
After recovering the VM state replay will start from that position.
|
|
|
|
Default starting snapshot name may be specified with icount field
|
|
rrsnapshot as follows:
|
|
-icount shift=7,rr=record,rrfile=replay.bin,rrsnapshot=snapshot_name
|
|
|
|
This snapshot is created at start of recording and restored at start
|
|
of replaying. It also can be loaded while replaying to roll back
|
|
the execution.
|
|
|
|
Network devices
|
|
---------------
|
|
|
|
Record and replay for network interactions is performed with the network filter.
|
|
Each backend must have its own instance of the replay filter as follows:
|
|
-netdev user,id=net1 -device rtl8139,netdev=net1
|
|
-object filter-replay,id=replay,netdev=net1
|
|
|
|
Replay network filter is used to record and replay network packets. While
|
|
recording the virtual machine this filter puts all packets coming from
|
|
the outer world into the log. In replay mode packets from the log are
|
|
injected into the network device. All interactions with network backend
|
|
in replay mode are disabled.
|