27af7d6ea5
Avoid hot pages being replaced by others to remarkably decrease cache misses

Sample results with the test program quoted from xbzrle.txt, run in a VM
(migrate bandwidth: 1GE, xbzrle cache size: 8MB).

The test program:

    #include <stdlib.h>
    #include <stdio.h>
    int main()
    {
        char *buf = (char *) calloc(4096, 4096);
        while (1) {
            int i;
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
        }
    }

Before this patch:

    virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
    {"return":{"expected-downtime":1020,"xbzrle-cache":{"bytes":1108284,
    "cache-size":8388608,"cache-miss-rate":0.987013,"pages":18297,"overflow":8,
    "cache-miss":1228737},"status":"active","setup-time":10,"total-time":52398,
    "ram":{"total":12466991104,"remaining":1695744,"mbps":935.559472,
    "transferred":5780760580,"dirty-sync-counter":271,"duplicate":2878530,
    "dirty-pages-rate":29130,"skipped":0,"normal-bytes":5748592640,
    "normal":1403465}},"id":"libvirt-706"}

18k pages sent compressed in 52 seconds.
The cache-miss-rate is 98.7%: almost every lookup misses.

After optimizing:

    virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
    {"return":{"expected-downtime":2054,"xbzrle-cache":{"bytes":5066763,
    "cache-size":8388608,"cache-miss-rate":0.485924,"pages":194823,"overflow":0,
    "cache-miss":210653},"status":"active","setup-time":11,"total-time":18729,
    "ram":{"total":12466991104,"remaining":3895296,"mbps":937.663549,
    "transferred":1615042219,"dirty-sync-counter":98,"duplicate":2869840,
    "dirty-pages-rate":58781,"skipped":0,"normal-bytes":1588404224,
    "normal":387794}},"id":"libvirt-266"}

194k pages sent compressed in 18 seconds.
The cache-miss-rate decreases to 48.59%.

Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
137 lines
4.8 KiB
Plaintext
XBZRLE (Xor Based Zero Run Length Encoding)
===========================================

Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction
of VM downtime and the total live-migration time of virtual machines.
It is particularly useful for virtual machines running memory-write-intensive
workloads that are typical of large enterprise applications such as SAP ERP
systems, and generally speaking for any application that uses a sparse memory
update pattern.

Instead of sending the changed guest memory page, this solution sends a
compressed version of the updates, thus reducing the amount of data sent during
live migration.
In order to be able to calculate the update, the previous memory pages need to
be stored on the source. Those pages are stored in a dedicated cache
(hash table) and are accessed by their address.
The larger the cache size, the better the chances are that the page has already
been stored in the cache.
A small cache size will result in a high cache miss rate.
Cache size can be changed before and during migration.

Format
======

The compression format performs a XOR between the previous and current content
of the page, where zero represents an unchanged value.
The page data delta is represented by zero and non-zero runs.
A zero run is represented by its length (in bytes).
A non-zero run is represented by its length (in bytes) and the new data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).

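To make the run-length encoding concrete, the following is a minimal sketch
of unsigned LEB128 in C. The function names uleb128_encode and uleb128_decode
are chosen for this illustration and are not taken from the QEMU sources; the
32-bit value limit is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode value as unsigned LEB128 into out.
 * Returns the number of bytes written; out needs room for up to 5 bytes,
 * which is enough for any 32-bit value. */
size_t uleb128_encode(uint8_t *out, uint32_t value)
{
    size_t n = 0;

    assert(out != NULL);
    do {
        uint8_t byte = value & 0x7f;    /* take the low 7 bits */
        value >>= 7;
        if (value != 0) {
            byte |= 0x80;               /* high bit set: more bytes follow */
        }
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Decode one unsigned LEB128 integer from at most in_len bytes of in.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or the integer does not end within 5 bytes. */
size_t uleb128_decode(const uint8_t *in, size_t in_len, uint32_t *value)
{
    uint32_t result = 0;
    unsigned shift = 0;
    size_t n = 0;

    assert(in != NULL && value != NULL);
    while (n < in_len && shift < 32) {
        uint8_t byte = in[n++];

        result |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {       /* high bit clear: last byte */
            *value = result;
            return n;
        }
        shift += 7;
    }
    return 0;
}
```

For instance, a run length of 1001 (0x3e9) is split into the 7-bit groups
0x69 and 0x07; the first group gets its high bit set, giving the two bytes
e9 07. Any length below 128 fits in a single byte.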
There can be more than one valid encoding; the sender may send a longer encoding
for the benefit of reducing computation cost.

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer

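The grammar above can be produced by a simple byte-wise scan of the old and
new page. The following is a minimal sketch, not the optimized encoder used by
QEMU; the names xbzrle_encode_sketch and put_uleb128 are hypothetical. It
assumes that a trailing run of unchanged bytes is simply not emitted, and it
reports an overflow when the encoding does not fit into the destination
buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write len as ULEB128 into dst (room bytes available).
 * Returns the number of bytes written, or 0 if dst is too small. */
static size_t put_uleb128(uint8_t *dst, size_t room, size_t len)
{
    size_t n = 0;

    do {
        uint8_t byte = len & 0x7f;

        len >>= 7;
        if (len != 0) {
            byte |= 0x80;               /* more bytes follow */
        }
        if (n == room) {
            return 0;
        }
        dst[n++] = byte;
    } while (len != 0);
    return n;
}

/* Encode the difference between old_buf and new_buf (page_len bytes each).
 * Returns the encoded length, 0 if the pages are identical, or -1 if the
 * encoding does not fit into dst_len bytes (overflow). */
long xbzrle_encode_sketch(const uint8_t *old_buf, const uint8_t *new_buf,
                          size_t page_len, uint8_t *dst, size_t dst_len)
{
    size_t i = 0, d = 0;

    assert(old_buf != NULL && new_buf != NULL && dst != NULL);
    while (i < page_len) {
        size_t start = i, run, n;

        /* zrun: count bytes whose XOR is zero (unchanged bytes) */
        while (i < page_len && old_buf[i] == new_buf[i]) {
            i++;
        }
        if (i == page_len) {
            break;                      /* unchanged tail is not sent */
        }
        n = put_uleb128(dst + d, dst_len - d, i - start);
        if (n == 0) {
            return -1;
        }
        d += n;

        /* nzrun: count changed bytes, then emit length and new data */
        start = i;
        while (i < page_len && old_buf[i] != new_buf[i]) {
            i++;
        }
        run = i - start;
        n = put_uleb128(dst + d, dst_len - d, run);
        if (n == 0 || run > dst_len - d - n) {
            return -1;
        }
        d += n;
        memcpy(dst + d, new_buf + start, run);
        d += run;
    }
    return (long)d;
}
```

With a destination buffer of one page, a page in which every second byte
changed needs three encoded bytes per two page bytes (a one-byte zrun, a
one-byte nzrun length and one data byte), so the sketch returns -1 for it.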
On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from the cache (default size of 64 MB). The
receiving side uses the existing page's content and XBZRLE to decode the new
page's content.

This work was originally based on research results published in
VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
Additionally, the delta encoder XBRLE was improved further by using XBZRLE
instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making it
ideal for in-line, real-time encoding such as is needed for live migration.

Example
old buffer:
1001 zeros
05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
3074 zeros

new buffer:
1001 zeros
01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
3074 zeros

encoded buffer:

encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69

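The encoded buffer of this example can be read run by run, and applying it to
the old buffer yields the new buffer. The following is a minimal sketch of the
receiving side under stated assumptions: the names xbzrle_decode_sketch,
get_uleb128 and example_encoded are hypothetical, and treating an empty
non-zero run as malformed is a choice of this sketch rather than something
the text above specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 24 encoded bytes of the example, annotated run by run. */
const uint8_t example_encoded[24] = {
    0xe9, 0x07,                         /* zrun:  1001 unchanged bytes */
    0x0f,                               /* nzrun: 15 new bytes follow  */
    0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
    0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
    0x03,                               /* zrun:  3 unchanged (68 00 00) */
    0x01, 0x67,                         /* nzrun: 1 new byte, 67 */
    0x01,                               /* zrun:  1 unchanged (00) */
    0x01, 0x69                          /* nzrun: 1 new byte, 69 */
};

/* Read one ULEB128 length from src (avail bytes available).
 * Returns the number of bytes consumed, or 0 on truncated input. */
static size_t get_uleb128(const uint8_t *src, size_t avail, size_t *value)
{
    size_t result = 0, n = 0;
    unsigned shift = 0;

    while (n < avail && shift < 28) {
        uint8_t byte = src[n++];

        result |= (size_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            *value = result;
            return n;
        }
        shift += 7;
    }
    return 0;
}

/* Apply the encoded delta in src to page, which holds the old content.
 * Returns the number of page bytes covered by the runs, or -1 if the
 * input is malformed. Bytes after the last run are left unchanged. */
long xbzrle_decode_sketch(const uint8_t *src, size_t src_len,
                          uint8_t *page, size_t page_len)
{
    size_t s = 0, p = 0;

    assert(src != NULL && page != NULL);
    while (s < src_len) {
        size_t len, n;

        /* zrun: skip over bytes that did not change */
        n = get_uleb128(src + s, src_len - s, &len);
        if (n == 0 || len > page_len - p) {
            return -1;
        }
        s += n;
        p += len;

        /* nzrun: overwrite the changed bytes with the new data */
        n = get_uleb128(src + s, src_len - s, &len);
        if (n == 0) {
            return -1;
        }
        s += n;
        if (len == 0 || len > page_len - p || len > src_len - s) {
            return -1;
        }
        memcpy(page + p, src + s, len);
        s += len;
        p += len;
    }
    return (long)p;
}
```

The runs of the example cover 1001 + 15 + 3 + 1 + 1 + 1 = 1022 bytes of the
page; the remaining 3074 zeros are unchanged and therefore never appear in
the encoded buffer.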
Cache update strategy
=====================
Keeping the hot pages in the cache is effective for decreasing cache
misses. XBZRLE uses a counter as the age of each page. The counter
increases after each RAM dirty bitmap sync. When a cache conflict is
detected, XBZRLE will only evict pages in the cache that are older than
a threshold.

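The strategy above can be sketched with a small direct-mapped cache. This is
an illustration under stated assumptions, not the QEMU page cache: all names
(sketch_cache, sketch_cache_insert, ...) are hypothetical, the slot count is
tiny to keep the example readable, and the threshold SKETCH_MIN_AGE of two
sync periods is an arbitrary value picked for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_SLOTS     4u     /* number of slots, a power of two */
#define SKETCH_MIN_AGE   2u     /* hypothetical eviction threshold */

struct sketch_slot {
    bool     used;
    uint64_t addr;              /* guest address of the cached page */
    uint64_t age;               /* sync counter value at last access */
    uint8_t  data[SKETCH_PAGE_SIZE];
};

struct sketch_cache {
    struct sketch_slot slots[SKETCH_SLOTS];
};

/* Map a page address to its slot; different pages can share a slot. */
static size_t sketch_slot_index(uint64_t addr)
{
    return (size_t)((addr / SKETCH_PAGE_SIZE) & (SKETCH_SLOTS - 1));
}

/* Return the cached copy of the page at addr, or NULL on a cache miss.
 * A hit refreshes the age of the page. */
const uint8_t *sketch_cache_lookup(struct sketch_cache *c, uint64_t addr,
                                   uint64_t sync_count)
{
    struct sketch_slot *s = &c->slots[sketch_slot_index(addr)];

    if (s->used && s->addr == addr) {
        s->age = sync_count;
        return s->data;
    }
    return NULL;
}

/* Store a page. On a conflict, the resident page is only evicted if it has
 * not been accessed for at least SKETCH_MIN_AGE sync periods.
 * Returns true if the page was stored, false if the resident page was kept. */
bool sketch_cache_insert(struct sketch_cache *c, uint64_t addr,
                         const uint8_t *page, uint64_t sync_count)
{
    struct sketch_slot *s = &c->slots[sketch_slot_index(addr)];

    assert(page != NULL);
    if (s->used && s->addr != addr &&
        sync_count - s->age < SKETCH_MIN_AGE) {
        return false;           /* resident page is still hot: keep it */
    }
    s->used = true;
    s->addr = addr;
    s->age = sync_count;
    memcpy(s->data, page, SKETCH_PAGE_SIZE);
    return true;
}
```

Without the age check, two pages that map to the same slot and are both
dirtied in every iteration would evict each other each time, so neither
would ever be found in the cache.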
Usage
=====
1. Verify the destination QEMU version is able to decode the new format.
 {qemu} info migrate_capabilities
 {qemu} xbzrle: off , ...

2. Activate xbzrle on both source and destination:
 {qemu} migrate_set_capability xbzrle on

3. Set the XBZRLE cache size (on source only). The cache size is in MBytes
and should be a power of 2. The default cache size is 64 MBytes.
 {qemu} migrate_set_cache_size 256m

4. Start outgoing migration
 {qemu} migrate -d tcp:destination.host:4444
 {qemu} info migrate
 capabilities: xbzrle: on
 Migration status: active
 transferred ram: A kbytes
 remaining ram: B kbytes
 total ram: C kbytes
 total time: D milliseconds
 duplicate: E pages
 normal: F pages
 normal bytes: G kbytes
 cache size: H bytes
 xbzrle transferred: I kbytes
 xbzrle pages: J pages
 xbzrle cache miss: K
 xbzrle overflow : L

xbzrle cache-miss: the number of cache misses to date - a high cache-miss rate
indicates that the cache size is set too low.
xbzrle overflow: the number of overflows in the encoding, where the delta
could not be compressed. This can happen if the changes in the pages are too
large or there are many short changes; for example, changing every second byte
(half a page).

Testing: Testing indicated that live migration with XBZRLE was completed in 110
seconds, whereas without it the migration would not be able to complete.

A simple synthetic memory r/w load generator:

    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
        /* 4096 pages of 4096 bytes each (16 MiB), zero-initialized */
        char *buf = (char *) calloc(4096, 4096);

        if (buf == NULL) {
            return 1;
        }
        while (1) {
            int i;
            /* dirty one byte every 1024 bytes: four writes per page */
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
        }
    }