Allow "unlocked" reads of the ram_list by using an RCU-enabled QLIST.
The ramlist mutex is kept. call_rcu callbacks are run with the iothread
lock taken, but that may change in the future. Writers still take the
ramlist mutex, but they no longer need to assume that the iothread lock
is taken.
Readers of the list, instead, no longer require either the iothread
or ramlist mutex, but they need to use rcu_read_lock() and
rcu_read_unlock().
One place in arch_init.c was downgrading from write side to read side
like this:
qemu_mutex_lock_iothread()
qemu_mutex_lock_ramlist()
...
qemu_mutex_unlock_iothread()
...
qemu_mutex_unlock_ramlist()
and the equivalent idiom is:
qemu_mutex_lock_ramlist()
rcu_read_lock()
...
qemu_mutex_unlock_ramlist()
...
rcu_read_unlock()
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QLIST has RCU-friendly primitives, so switch to it.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Avoid hot pages being replaced by others to remarkably decrease cache
misses
Sample results with the test program which quote from xbzrle.txt ran in
vm:(migrate bandwidth:1GE and xbzrle cache size 8MB)
the test program:
include <stdlib.h>
include <stdio.h>
int main()
{
char *buf = (char *) calloc(4096, 4096);
while (1) {
int i;
for (i = 0; i < 4096 * 4; i++) {
buf[i * 4096 / 4]++;
}
printf(".");
}
}
before this patch:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":1020,"xbzrle-cache":{"bytes":1108284,
"cache-size":8388608,"cache-miss-rate":0.987013,"pages":18297,"overflow":8,
"cache-miss":1228737},"status":"active","setup-time":10,"total-time":52398,
"ram":{"total":12466991104,"remaining":1695744,"mbps":935.559472,
"transferred":5780760580,"dirty-sync-counter":271,"duplicate":2878530,
"dirty-pages-rate":29130,"skipped":0,"normal-bytes":5748592640,
"normal":1403465}},"id":"libvirt-706"}
18k pages sent compressed in 52 seconds.
cache-miss-rate is 98.7%, totally miss.
after optimizing:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":2054,"xbzrle-cache":{"bytes":5066763,
"cache-size":8388608,"cache-miss-rate":0.485924,"pages":194823,"overflow":0,
"cache-miss":210653},"status":"active","setup-time":11,"total-time":18729,
"ram":{"total":12466991104,"remaining":3895296,"mbps":937.663549,
"transferred":1615042219,"dirty-sync-counter":98,"duplicate":2869840,
"dirty-pages-rate":58781,"skipped":0,"normal-bytes":1588404224,
"normal":387794}},"id":"libvirt-266"}
194k pages sent compressed in 18 seconds.
The value of cache-miss-rate decrease to 48.59%.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
If block used_length does not match, try to resize it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
This patch allows us to distinguish between two
length values for each block:
max_length - length of memory block that was allocated
used_length - length of block used by QEMU/guest
Currently, we set used_length - max_length, unconditionally.
Follow-up patches allow used_length <= max_length.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
The static variables in migration_bitmap_sync will not be reset in
the case of a second attempted migration.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
During migration, the values read from migration stream during ram load
are not validated. Especially offset in host_from_stream_offset() and
also the length of the writes in the callers of said function.
To fix this, we need to make sure that the [offset, offset + length]
range fits into one of the allocated memory regions.
Validating addr < len should be sufficient since data seems to always be
managed in TARGET_PAGE_SIZE chunks.
Fixes: CVE-2014-7840
Note: follow-up patches add extra checks on each block->host access.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
this patch extends commit db80fac by not only checking
for unknown flags, but also filtering out unknown flag
combinations.
Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
As the function always return 1, it is not needed anymore.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When live migrate fails due to a section length mismatch we currently
see an error message like:
Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 10000 in != 20000
The section lengths are in fact in hex, so this should read
Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x10000 in != 0x20000
Correct the error string to reflect this.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
if a saved vm has unknown flags in the memory data qemu
currently simply ignores this flag and continues which
yields in an unpredictable result.
This patch catches all unknown flags and aborts the
loading of the vm. Additionally error reports are thrown
if the migration aborts abnormally.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We call g_free() after cache_fini() in migration_end(), but we don't
call it after cache_fini() in xbzrle_cache_resize(), leaking the
memory.
cache_init() and cache_fini() are a pair. Since cache_init()
allocates the cache, let cache_fini() free it. This plugs the leak.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Replace fprintf(stderr,...) with error_report() in the file
arch_init.c. The trailing "\n"s of the @fmt argument have been removed
because @fmt of error_report() should not contain newline.
Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
ram_save_block is getting a bit too complicated, and does two separate
things:
1) Finds a page to send
2) Sends the page (dealing with compression etc)
Split into 'ram_save_page' to send the page and deal with compression (2)
Rename remaining function to 'ram_find_and_save_block'
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
For xbzrle_decode_buffer(), when decoding contents will exceed writing
buffer, it will return -1, so need not check the return value whether
large than writing buffer.
And when failure occurs within load_xbzrle(), it always return -1
without any resources which need release.
So can remove the related checking statements, and also can remove 'rc'
and 'ret' local variables,
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
When DPRINTF() has effect, the original author wants to print all
ram_load() calling results. So need use 'goto' instead of 'return'
within ram_load(), just like other areas have done.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
expose the count that logs the times of updating the dirty bitmap to
end user.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Add counts to log the times of updating the dirty bitmap.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The page may not be inserted into cache after executing save_xbzrle_page.
In case of failure to insert, the original page should be sent rather
than the page in the cache.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
version_id is checked twice in the ram_load.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Initialising the XBZRLE.lock earlier simplifies the lock use.
Based on Markus's patch in:
http://lists.gnu.org/archive/html/qemu-devel/2014-03/msg03879.html
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Provide ram_mig_init (like blk_mig_init) for vl.c to initialise stuff
to do with ram migration (currently in arch_init.c).
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This is a fix for a bug* triggered by a migration after hot unplugging
a few virtio-net NICs, that caused migration never to converge, because
'migration_dirty_pages' is incorrectly initialised.
'migration_dirty_pages' is used as a tally of the number of outstanding
dirty pages, to give the migration code an idea of how much more data
will need to be transferred, and thus whether it can end the iterative
phase.
It was initialised to the total size of the RAMBlock address space,
however hotunplug can leave this space sparse, and hence
migration_dirty_pages ended up too large.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(* https://bugzilla.redhat.com/show_bug.cgi?id=1074913 )
Signed-off-by: Juan Quintela <quintela@redhat.com>
Resizing the xbzrle cache during migration causes qemu-crash,
because the main-thread and migration-thread modify the xbzrle
cache size concurrently without lock-protection.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Push zero'd pages into the XBZRLE cache
A page that was cached by XBZRLE, zero'd and then XBZRLE'd again
was being compared against a stale cache value
Don't use 'qemu_put_buffer_async' to put pages from the XBZRLE cache
Since the cache might change before the data hits the wire
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It is better to fail migration in case of failure to
allocate new cache item
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
When qemu do live migration with xbzrle, qemu malloc decoded_buf
at destination end but free it at source end. It will crash qemu
by double free error in some scenarios. Splitting the XBZRLE structure
for clear logic distinguishing src/dst side.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: GongLei <arei.gonglei@huawei.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We use the old code if the bitmaps are not aligned
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
This function is the only bit where we care about speed.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
The madvise for zeroed out pages was introduced when every transferred
zero page was memset to zero and thus allocated. Since commit
211ea740 we check for zeroness of a target page before we memset
it to zero. Additionally we memmap target memory so it is essentially
zero initialized (except for e.g. option roms and bios which are loaded
into target memory although they shouldn't).
It was reported recently that this madvise causes a performance degradation
in some situations. As the madvise should only be called rarely and if it's called
it is likely on a busy page (it was non-zero and changed to zero during migration)
drop it completely.
Reported-By: Zhang Haoyu <haoyu.zhang@huawei.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This includes pc and pci cleanups and enhancements,
and a virtio-net bugfix related to softmac programming.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSR83kAAoJECgfDbjSjVRpX08H/jKgYBNJaChev1TROIVHEGbu
IzvkjfocvKO+6wmhOf5x+xwFmzrijUMa1CPvOkCp8c2A3Iek7rmnedknlhXYh7dM
z5mXcvFGjnu7ST38ydF/Emk9+Z6rRg5Y/hkmlDyr+9lNcoiCDLXXcUrKjeIHNoWl
e8w3yiPCJ528QyrLwQ890XetJphv67pMlsjMgLQ2betMk++Ac/ctUf1D2p1X4NeQ
Q2drbo5Z4yDk0i6QMA3iLq1Bh/AhE10bCDq9rCzfZGIKVyncL6ne2pSi/xDvpLrF
dmxoiJ5QrK6xLnagCcn5T6SB9DkwbEPdL7qCqlxZ8USr7cVyPdzYtHtGSBWdeXY=
=xF01
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'mst/tags/for_anthony' into staging
pc,pci,virtio fixes and cleanups
This includes pc and pci cleanups and enhancements,
and a virtio-net bugfix related to softmac programming.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sun 29 Sep 2013 01:51:16 AM CDT using RSA key ID D28D5469
# gpg: Can't check signature: public key not found
# By Michael S. Tsirkin (8) and others
# Via Michael S. Tsirkin
* mst/tags/for_anthony:
smbios: Factor out smbios_maybe_add_str()
smbios: Make multiple -smbios type= accumulate sanely
smbios: Improve diagnostics for conflicting entries
smbios: Convert to QemuOpts
smbios: Normalize smbios_entry_add()'s error handling to exit(1)
virtio-net: fix up HMP NIC info string on reset
pci: remove explicit check to 64K ioport size
piix4: disable io on reset
piix: use 64 bit window programmed by guest
q35: use 64 bit window programmed by guest
pci: add helper to retrieve the 64-bit range
range: add min/max operations on ranges
range: add Range to typedefs
q35: make pci window address/size match guest cfg
Message-id: 1380437951-21788-1-git-send-email-mst@redhat.com
Currently, -smbios type=T,NAME=VAL,... adds one field (T,NAME) with
value VAL to fw_cfg for each unique NAME. If NAME occurs multiple
times, the last one's VAL is used (before the QemuOpts conversion, the
first one was used).
Multiple -smbios can add multiple fields with the same (T, NAME).
SeaBIOS reads all of them from fw_cfg, but uses only the first field
(T, NAME). The others are ignored.
"First one wins, subsequent ones get ignored silently" isn't nice. We
commonly let the last option win. Useful, because it lets you
-readconfig first, then selectively override with command line
options.
Clean up -smbios to work the common way. Accumulate the settings,
with later ones overwriting earlier ones. Put the result into fw_cfg
(no more useless duplicates).
Bonus cleanup: qemu_uuid_parse() no longer sets SMBIOS system uuid by
side effect.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
So that it can be set in config file for -readconfig.
This tightens parsing of -smbios, and makes it more consistent with
other options: unknown parameters are rejected, numbers with trailing
junk are rejected, when a parameter is given multiple times, last
rather than first wins, ...
MST: drop one chunk to fix build errors
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It exits on all error conditions but one, where it returns -1.
Normalize, and return void.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
ram_handle_compressed() should be aware of size > TARGET_PAGE_SIZE.
migration-rdma can call it with larger size.
Signed-off-by: Isaku Yamahata <yamahata@private.email.ne.jp>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Later is_zero_page will be used for non TARGET_PAGE_SIZE
range.
And rename it to is_zero_range as it isn't page size any more.
Signed-off-by: Isaku Yamahata <yamahata@private.email.ne.jp>
Signed-off-by: Juan Quintela <quintela@redhat.com>
qemu_file_rate_limit() never return negative value since the refactor
by Commit 1964a39, this patch gets rid of the negative check for it,
adjust bytes_transferred and return value correspondingly in
ram_save_iterate().
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It was introduced to loop over CPUs from target-independent code, but
since commit 182735efaf target-independent
CPUState is used.
A loop can be considered more efficient than function calls in a loop,
and CPU_FOREACH() hides implementation details just as well, so use that
instead.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This includes pc and pci cleanups, future-proofing of ROM files,
and a virtio bugfix correcting splice on virtio console.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSGvbsAAoJECgfDbjSjVRp0qsH/2N/b/5bNhL6gx5Kjigzv7Lt
bmCUsSsUm/Rnhke2SpDPVdIOU9W5WIX2zte8qiXuBJi3j/RS9procbgUZrCetpUI
GHQOn+n8e2TX/P9CIcHaN/bA1b+Kx+bvaVbDL/6P2PJvodbDGwDLp/9y+Tisv5Ye
kLow8Y4ZJcaTlmPf/Mh1AnhNa3gel231A3qwugVUNDSyITM6pG/0M07xk8YBj+JJ
6DYrlK0bKMNDxPu5St+YP94D1ODv2zM1aio/TdMUaNfqTZM1iqGTj3zKkBS2PjT0
RhuU2x4N91lY/uCdudtWSj5dYtfRT/xw7qwNAz9IHNz6RlGeX0n++ClMC8z9Y8A=
=Pc09
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'mst/tags/for_anthony' into stable-1.5
pc,pci,virtio fixes and cleanups
This includes pc and pci cleanups, future-proofing of ROM files,
and a virtio bugfix correcting splice on virtio console.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 26 Aug 2013 01:34:20 AM CDT using RSA key ID D28D5469
# gpg: Can't check signature: public key not found
# By Markus Armbruster (5) and others
# Via Michael S. Tsirkin
* mst/tags/for_anthony:
virtio: virtqueue_get_avail_bytes: fix desc_pa when loop over the indirect descriptor table
pc_piix: Kill pc_init1() memory region args
pc: pc_compat_1_4() now can call pc_compat_1_5()
pc: Create pc_compat_*() functions
pc: Kill pc_init_pci_1_0()
pc: Don't explode QEMUMachineInitArgs into local variables needlessly
pc: Don't prematurely explode QEMUMachineInitArgs
ppc: Don't duplicate QEMUMachineInitArgs in PPCE500Params
ppc: Don't explode QEMUMachineInitArgs into local variables needlessly
sun4: Don't prematurely explode QEMUMachineInitArgs
q35: Add PCIe switch to example q35 configuration
loader: store FW CFG ROM files in RAM
arch_init: align MR size to target page size
pc: cleanup 1.4 compat support
Message-id: 1377535318-30491-1-git-send-email-mst@redhat.com
# By Alex Bligh (32) and others
# Via Stefan Hajnoczi
* stefanha/block: (42 commits)
win32-aio: drop win32_aio_flush_cb()
aio-win32: replace incorrect AioHandler->opaque usage with ->e
aio / timers: remove dummy_io_handler_flush from tests/test-aio.c
aio / timers: Remove legacy interface
aio / timers: Switch entire codebase to the new timer API
aio / timers: Add scripts/switch-timer-api
aio / timers: Add test harness for AioContext timers
aio / timers: convert block_job_sleep_ns and co_sleep_ns to new API
aio / timers: Convert rtc_clock to be a QEMUClockType
aio / timers: Remove main_loop_timerlist
aio / timers: Rearrange timer.h & make legacy functions call non-legacy
aio / timers: Add qemu_clock_get_ms and qemu_clock_get_ms
aio / timers: Remove legacy qemu_clock_deadline & qemu_timerlist_deadline
aio / timers: Remove alarm timers
aio / timers: Add documentation and new format calls
aio / timers: Use all timerlists in icount warp calculations
aio / timers: Introduce new API timer_new and friends
aio / timers: On timer modification, qemu_notify or aio_notify
aio / timers: Convert mainloop to use timeout
aio / timers: Convert aio_poll to use AioContext timers' deadline
...
Message-id: 1377202298-22896-1-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
This is an autogenerated patch using scripts/switch-timer-api.
Switch the entire code base to using the new timer API.
Note this patch may introduce some line length issues.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>