qemu-e2k

History

David Hildenbrand 177f9b1ee4 virtio-mem: Expose device memory dynamically via multiple memslots if enabled

Having large virtio-mem devices that only expose little memory to a VM is currently a problem: we map the whole sparse memory region into the guest using a single memslot, resulting in one gigantic memslot in KVM. KVM allocates metadata for the whole memslot, which can result in quite some memory waste.

Assuming we have a 1 TiB virtio-mem device and only expose little (e.g., 1 GiB) memory, we would create a single 1 TiB memslot and KVM has to allocate metadata for that 1 TiB memslot. On x86, this implies allocating a significant amount of memory for metadata:

(1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB
    -> For 1 TiB: 2147483648 + 4194304 + 8192 = ~2 GiB (0.2 %)

    With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets allocated lazily when required for nested VMs
(2) gfn_track: 2 bytes per 4 KiB
    -> For 1 TiB: 536870912 = ~512 MiB (0.05 %)
(3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB
    -> For 1 TiB: 2097152 + 4096 = ~2 MiB (0.0002 %)
(4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page
    -> For 1 TiB: 536870912 bits = 64 MiB (0.006 %)

So we primarily care about (1) and (2). The bad thing is that the memory consumption doubles once SMM is enabled, because we create the memslot once for !SMM and once for SMM.

Having a 1 TiB memslot without the TDP MMU consumes around:
* With SMM: 5 GiB
* Without SMM: 2.5 GiB

Having a 1 TiB memslot with the TDP MMU consumes around:
* With SMM: 1 GiB
* Without SMM: 512 MiB

... and that's really something we want to optimize, to be able to just start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device that can grow very large (e.g., 1 TiB).

Consequently, using multiple memslots and only mapping the memslots we really need can significantly reduce memory waste and speed up memslot-related operations.

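The per-memslot metadata figures above can be reproduced with a short calculation. The following is only a sketch of that arithmetic, not KVM code: the function names are made up for illustration, the per-page costs are the ones quoted in the commit message, and it assumes that with the TDP MMU no RMAP is allocated (i.e., no nested VMs) and that both dirty bitmaps are always counted.

```python
KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30
TIB = 1 << 40


def rmap_bytes(size):
    # (1) RMAP: 8 bytes per 4 KiB, per 2 MiB and per 1 GiB page.
    return 8 * (size // (4 * KIB)) + 8 * (size // (2 * MIB)) + 8 * (size // GIB)


def gfn_track_bytes(size):
    # (2) gfn_track: 2 bytes per 4 KiB page.
    return 2 * (size // (4 * KIB))


def lpage_info_bytes(size):
    # (3) lpage_info: 4 bytes per 2 MiB and per 1 GiB page.
    return 4 * (size // (2 * MIB)) + 4 * (size // GIB)


def dirty_bitmap_bytes(size):
    # (4) Two dirty bitmaps, each with 1 bit per 4 KiB page.
    return 2 * (size // (4 * KIB)) // 8


def memslot_metadata_bytes(size, tdp_mmu=False, smm=False):
    # Hypothetical helper: sum the four items for a single memslot of `size`
    # bytes. Assumption: with the TDP MMU the RMAP is never needed.
    total = gfn_track_bytes(size) + lpage_info_bytes(size) + dirty_bitmap_bytes(size)
    if not tdp_mmu:
        total += rmap_bytes(size)
    # The memslot exists once for !SMM and once for SMM.
    return total * 2 if smm else total


if __name__ == "__main__":
    for tdp_mmu in (False, True):
        for smm in (True, False):
            mib = memslot_metadata_bytes(TIB, tdp_mmu=tdp_mmu, smm=smm) / MIB
            print(f"tdp_mmu={tdp_mmu} smm={smm}: {mib:.0f} MiB")
```

Because the sketch sums all four items, its totals come out slightly above the rounded "around" values listed above, which are dominated by (1) and (2).
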
Let's expose the sparse RAM memory region using multiple memslots, mapping only the memslots we currently need into our device memory region container.

The feature can be enabled using "dynamic-memslots=on" and requires "unplugged-inaccessible=on", which is nowadays the default.

Once enabled, we'll auto-detect the number of memslots to use based on the memslot limit provided by the core. We'll use at most 1 memslot per gigabyte. Note that our global limit of memslots across all memory devices is currently set to 256: even with multiple large virtio-mem devices, we'd still have a sane limit on the number of memslots used.

The default is to not dynamically map memslots for now ("dynamic-memslots=off"). The optimization must be enabled manually, because some vhost setups (e.g., hotplug of vhost-user devices) might be problematic until we support more memslots, especially in vhost-user backends.

Note that "dynamic-memslots=on" is just a hint that multiple memslots may be used for internal optimizations, not that multiple memslots must be used. The actual number of memslots that are used is an internal detail: for example, once memslot metadata is no longer an issue, we could simply stop optimizing for that. Migration source and destination can differ on the setting of "dynamic-memslots".

Message-ID: <20230926185738.277351-17-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12 14:15:22 +02:00
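
The two constraints stated in the commit message (stay within the memslot limit provided by the core, and use at most 1 memslot per gigabyte) can be illustrated with a small sketch. This is not the logic QEMU implements; `pick_memslot_count` and its parameters are hypothetical names, and the real code may apply further rules (e.g., alignment of memslot sizes) that are not described here.

```python
GIB = 1 << 30


def pick_memslot_count(region_size, memslot_limit):
    # Hypothetical helper, not a QEMU function.
    # region_size: size of the virtio-mem memory region in bytes.
    # memslot_limit: number of memslots the core allows this device to use.
    if memslot_limit <= 1:
        # Nothing to split: fall back to a single memslot.
        return 1
    # "At most 1 memslot per gigabyte": round the size up to whole GiB.
    per_gib_cap = -(-region_size // GIB)
    return max(1, min(memslot_limit, per_gib_cap))


if __name__ == "__main__":
    # A 1 TiB device capped by a limit of 256 memslots.
    print(pick_memslot_count(1024 * GIB, 256))
    # An 8 GiB device: the per-gigabyte cap applies before the limit does.
    print(pick_memslot_count(8 * GIB, 256))
```

Under these assumptions, a 1 TiB device with a limit of 256 would end up with 256 memslots of 4 GiB each, so exposing 1 GiB of memory would require mapping only one 4 GiB memslot instead of the whole 1 TiB region.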
..
authz
block	nbd/server: Prepare for per-request filtering of BLOCK_STATUS	2023-10-05 11:02:08 -05:00
chardev	include/: spelling fixes	2023-09-08 13:08:52 +03:00
crypto	crypto: Add generic 64-bit carry-less multiply routine	2023-09-15 13:57:00 +00:00
disas	disas: Change type of disassemble_info.target_info to pointer	2023-06-13 17:25:47 +10:00
exec	memory: Clarify mapping requirements for RamDiscardManager	2023-10-12 14:15:22 +02:00
fpu	fpu: Add conversions between bfloat16 and [u]int8	2023-09-16 14:57:15 +00:00
gdbstub	gdbstub: Remove gdb_do_syscallv	2023-03-07 20:44:09 +00:00
hw	virtio-mem: Expose device memory dynamically via multiple memslots if enabled	2023-10-12 14:15:22 +02:00
io	io: follow coroutine AioContext in qio_channel_yield()	2023-09-07 20:32:11 -05:00
libdecnumber
migration	migration/vmstate: Introduce vmstate_save_state_with_err	2023-10-04 10:54:40 +02:00
monitor	monitor: add more *_locked() functions	2023-05-25 10:18:33 +02:00
net	net/net: Clean up global variable shadowing	2023-10-06 13:27:43 +02:00
qapi	qobject atomics osdep: Make a few macros more hygienic	2023-09-29 08:13:57 +02:00
qemu	* util/log: re-allow switching away from stderr log file	2023-10-09 10:11:18 -04:00
qom	qom/object_interfaces: Clean up global variable shadowing	2023-10-06 13:27:48 +02:00
scsi	hw/ufs: Support for UFS logical unit	2023-09-07 14:01:29 -04:00
semihosting	* util/log: re-allow switching away from stderr log file	2023-10-09 10:11:18 -04:00
standard-headers	linux-headers: Update to Linux v6.6-rc1	2023-09-12 11:34:56 +02:00
sysemu	kvm: Add stub for kvm_get_max_memslots()	2023-10-12 14:15:22 +02:00
tcg	tcg: Correct invalid mentions of 'softmmu' by 'system-mode'	2023-10-07 19:02:33 +02:00
ui	ui: add XBGR8888 and ABGR8888 in drm_format_pixman_map	2023-10-03 15:04:56 +04:00
user	bulk: Do not declare function prototypes using 'extern' keyword	2023-08-31 19:47:43 +02:00
elf.h	util: spelling fixes	2023-08-31 19:47:43 +02:00
glib-compat.h
qemu-io.h
qemu-main.h