qemu-e2k

Author	SHA1	Message	Date
Peter Lieven	48672ac058	block/rbd: bump librbd requirement to luminous release Ceph Luminous (version 12.2.z) is almost 4 years old at this point. Bump the requirement to get rid of the ifdef'ry in the code. Qemu 6.1 dropped the support for RHEL-7 which was the last supported OS that required an older librbd. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-2-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Or Ozeri	42e4ac9ef5	block/rbd: Add support for rbd image encryption Starting from ceph Pacific, RBD has built-in support for image-level encryption. Currently supported formats are LUKS version 1 and 2. There are 2 new relevant librbd APIs for controlling encryption, both expect an open image context: rbd_encryption_format: formats an image (i.e. writes the LUKS header) rbd_encryption_load: loads encryptor/decryptor to the image IO stack This commit extends the qemu rbd driver API to support the above. Signed-off-by: Or Ozeri <oro@il.ibm.com> Message-Id: <20210627114635.39326-1-oro@il.ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Ilya Dryomov	0725570b2d	MAINTAINERS: update block/rbd.c maintainer Jason has moved on from working on RBD and Ceph. I'm taking over his role upstream. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210519112513.19694-1-idryomov@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Bharata B Rao	82123b756a	target/ppc: Support for H_RPT_INVALIDATE hcall If KVM_CAP_RPT_INVALIDATE KVM capability is enabled, then - indicate the availability of H_RPT_INVALIDATE hcall to the guest via ibm,hypertas-functions property. - Enable the hcall Both the above are done only if the new sPAPR machine capability cap-rpt-invalidate is set. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Message-Id: <20210706112440.1449562-3-bharata@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 11:01:06 +10:00
Bharata B Rao	327d4b7f3f	linux-headers: Update Update to mainline commit: 79160a603bdb ("Merge tag 'usb-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb" Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Message-Id: <20210706112440.1449562-2-bharata@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 11:01:06 +10:00
Alexey Kardashevskiy	21bde1ecb6	spapr: Fix implementation of Open Firmware client interface This addresses the comments from v22. The functional changes are (the VOF ones need retesting with Pegasos2): (VOF) setprop will start failing if the machine class callback did not handle it; (VOF) unit addresses are lowered in path_offset(); (SPAPR) /chosen/bootargs is initialized from kernel_cmdline if the client did not change it. Fixes: 5c991e5d4378 ("spapr: Implement Open Firmware client interface") Cc: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20210708065625.548396-1-aik@ozlabs.ru> Tested-by: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:55:11 +10:00
Lucas Mateus Castro (alqotel)	89bb5a4dfd	target/ppc: Don't compile ppc_tlb_invalid_all without TCG The function ppc_tlb_invalid_all is not compiled anymore in a TCG-less environment, and the call to that function has been disabled in this situation Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Message-Id: <20210708164957.28096-2-lucas.araujo@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:44:25 +10:00
BALATON Zoltan	5f2eb04961	ppc/pegasos2: Implement some RTAS functions with VOF Linux uses RTAS functions to access PCI devices so we need to provide these with VOF. Implement some of the most important functions to allow booting Linux with VOF. With this the board is now usable without a binary ROM image and we can enable it by default as other boards. Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu> Message-Id: <20210708215113.B3F747456E3@zero.eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:40:43 +10:00
David Gibson	e7dfb29e5a	ppc/pegasos2: Fix use of && instead of & This is obviously intended to be a mask, not a logical operation. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
BALATON Zoltan	a6c9808a68	ppc/pegasos2: Use Virtual Open Firmware as firmware replacement The pegasos2 board comes with an Open Firmware compliant ROM based on SmartFirmware but it has some changes that are not open source therefore the ROM binary cannot be included in QEMU. Guests running on the board however depend on services provided by the firmware. The Virtual Open Firmware recently added to QEMU implements a minimal set of these services to allow some guests to boot without the original firmware. This patch adds VOF as the default firmware for pegasos2 which allows booting Linux and MorphOS via -kernel option while a ROM image can still be used with -bios for guests that don't run with VOF. Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu> Message-Id: <1d6ed6f290c5c1f0b5a1e1c51cf1151452d70d9a.1624811233.git.balaton@eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Nicholas Piggin	17fd09c021	target/ppc/spapr: Update H_GET_CPU_CHARACTERISTICS L1D cache flush bits There are several new L1D cache flush bits added to the hcall which reflect hardware security features for speculative cache access issues. These behaviours are now being specified as negative in order to simplify patched kernel compatibility with older firmware (a new problem found in existing systems would automatically be vulnerable). [dwg: Technically this changes behaviour for existing machine types. After discussion with Nick, we've determined this is safe, because the worst that will happen if a guest gets the wrong information due to a migration is that it will perform some unnecessary workarounds, but will remain correct and secure (well, as secure as it was going to be anyway). In addition the change only affects cap-cfpc=safe which is not enabled by default, and in fact is not possible to set on any current hardware (though it's expected it will be possible on POWER10)] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20210615044107.1481608-1-npiggin@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
BALATON Zoltan	5e994fc019	target/ppc: Allow virtual hypervisor on CPU without HV Change the assert in ppc_store_sdr1() to allow vhyp to be set on CPUs without HV bit. This allows using the vhyp interface for firmware emulation on pegasos2. Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu> Message-Id: <21c7745aabbb68fcc50bb2ffaf16b939ba21261c.1624811233.git.balaton@eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
BALATON Zoltan	a8eda5ed3d	ppc/pegasos2: Introduce Pegasos2MachineState structure Add own machine state structure which will be used to store state needed for firmware emulation. Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <7f6d5fbf4f70c64dba001483174a2921dd616ecd.1624811233.git.balaton@eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Nicholas Piggin	caf590ddc9	target/ppc: mtmsrd is an illegal instruction on BookE MSR is a 32-bit register in BookE and there is no mtmsrd instruction. Cc: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20210706051321.609046-1-npiggin@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Alexey Kardashevskiy	fc8c745d50	spapr: Implement Open Firmware client interface The PAPR platform describes an OS environment that's presented by a combination of a hypervisor and firmware. The features it specifies require collaboration between the firmware and the hypervisor. Since the beginning, the runtime component of the firmware (RTAS) has been implemented as a 20 byte shim which simply forwards it to a hypercall implemented in qemu. The boot time firmware component is SLOF - but a build that's specific to qemu, and has always needed to be updated in sync with it. Even though we've managed to limit the amount of runtime communication we need between qemu and SLOF, there's some, and it has become increasingly awkward to handle as we've implemented new features. This implements a boot time OF client interface (CI) which is enabled by a new "x-vof" pseries machine option (stands for "Virtual Open Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall which implements Open Firmware Client Interface (OF CI). This allows using a smaller stateless firmware which does not have to manage the device tree. The new "vof.bin" firmware image is included with source code under pc-bios/. It also includes RTAS blob. This implements a handful of CI methods just to get -kernel/-initrd working. In particular, this implements the device tree fetching and simple memory allocator - "claim" (an OF CI memory allocator) and updates "/memory@0/available" to report the client about available memory. This implements changing some device tree properties which we know how to deal with, the rest is ignored. To allow changes, this skips fdt_pack() when x-vof=on as not packing the blob leaves some room for appending. In absence of SLOF, this assigns phandles to device tree nodes to make device tree traversing work. When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree. This adds basic instances support which are managed by a hash map ihandle -> [phandle]. Before the guest started, the used memory is: 0..e60 - the initial firmware 8000..10000 - stack 400000.. - kernel 3ea0000.. - initramdisk This OF CI does not implement "interpret". Unlike SLOF, this does not format uninitialized nvram. Instead, this includes a disk image with pre-formatted nvram. With this basic support, this can only boot into kernel directly. However this is just enough for the petitboot kernel and initradmdisk to boot from any possible source. Note this requires reasonably recent guest kernel with: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735 The immediate benefit is much faster booting time which especially crucial with fully emulated early CPU bring up environments. Also this may come handy when/if GRUB-in-the-userspace sees light of the day. This separates VOF and sPAPR in a hope that VOF bits may be reused by other POWERPC boards which do not support pSeries. This assumes potential support for booting from QEMU backends such as blockdev or netdev without devices/drivers used. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20210625055155.2252896-1-aik@ozlabs.ru> Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu> [dwg: Adjusted some includes which broke compile in some more obscure compilation setups] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Bin Meng	ea41397055	docs/system: ppc: Update ppce500 documentation with eTSEC support This adds eTSEC support to the PowerPC `ppce500` machine documentation. Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Bin Meng	a0c3747e14	roms/u-boot: Bump ppce500 u-boot to v2021.07 to add eTSEC support Update the QEMU shipped u-boot.e500 image built from U-Boot mainline v2021.07 release, which added eTSEC support to the QEMU ppce500 target, via the following U-Boot series: http://patchwork.ozlabs.org/project/uboot/list/?series=233875&state=* The cross-compilation toolchain used to build the U-Boot image is: https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/10.1.0/x86_64-gcc-10.1.0-nolibc-powerpc-linux.tar.xz Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Bruno Larsen (billionai)	d423baf9b4	target/ppc: change ppc_hash32_xlate to use mmu_idx Changed hash32 address translation to use the supplied mmu_idx, instead of using what was stored in the msr, for parity purposes (radix64 already uses that) and for conceptual correctness, all the relevant functions should always use the supplied mmu_idx, as there are no guarantees that the mmu_idx stored in the CPU variable will not desync. Signed-off-by: Bruno Larsen (billionai) <bruno.larsen@eldorado.org.br> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20210706150316.21005-3-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Bruno Larsen (billionai)	a97c4d3c1e	target/ppc: introduce mmu-books.h Intrudoce a header common to all BookS MMUs, that can hold code that is common to hash32 and book3s-v3 MMUs. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Bruno Larsen (billionai) <bruno.larsen@eldorado.org.br> Message-Id: <20210706150316.21005-2-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Bruno Larsen (billionai)	03695a9870	target/ppc: changed ppc_hash64_xlate to use mmu_idx Changed hash64 address translation to use the supplied mmu_idx instead of using the one stored in the msr, for parity purposes (other book3s MMUs already use it). Signed-off-by: Bruno Larsen (billionai) <bruno.larsen@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210628133610.1143-4-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Bruno Larsen (billionai)	3f9f76d5bb	target/ppc: fix address translation bug for radix mmus This commit attempts to fix a technical hiccup first mentioned by Richard Henderson in https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06247.html To sumarize the hiccup here, when radix-style mmus are translating an address, they might need to call a second level of translation, with hypervisor privileges. However, the way it was being done up until this point meant that the second level translation had the same privileges as the first level. It could lead to a bug in address translation when running KVM inside a TCG guest, but this bug was never experienced by users, so this isn't as much a bug fix as it is a correctness cleanup. This patch attempts that cleanup by making radix64_*_xlate functions receive the mmu_idx, and passing one with the correct permission for the second level translation. The mmuidx macros added by this patch are only correct for non-bookE mmus, because BookE style set the IS and DS bits inverted and there might be other subtle differences. However, there doesn't seem to be BookE cpus that have radix-style mmus, so we left a comment there to document the issue, in case a machine does have that and was missed. As part of this cleanup, we now need to send the correct mmmu_idx when calling get_phys_page_debug, otherwise we might not be able to see the memory that the CPU could Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Bruno Larsen (billionai) <bruno.larsen@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210628133610.1143-2-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Fabiano Rosas	ba1b5df070	target/ppc: Fix compilation with DEBUG_BATS debug option ../target/ppc/mmu-hash32.c: In function 'ppc_hash32_bat_lookup': ../target/ppc/mmu-hash32.c:204:13: error: 'BATu' undeclared (first use in this function); 204 \| BATu = &BATut[i]; \| ^~~~ \| BATut ../target/ppc/mmu-hash32.c:205:13: error: 'BATl' undeclared (first use in this function); 205 \| BATl = &BATlt[i]; \| ^~~~ \| BATlt ../target/ppc/mmu-hash32.c:206:13: error: 'BEPIu' undeclared (first use in this function) 206 \| BEPIu = BATu & BATU32_BEPIU; \| ^~~~~ ../target/ppc/mmu-hash32.c:206:29: error: 'BATU32_BEPIU' undeclared (first use in this function); 206 \| BEPIu = BATu & BATU32_BEPIU; \| ^~~~~~~~~~~~ \| BATU32_BEPI ../target/ppc/mmu-hash32.c:207:13: error: 'BEPIl' undeclared (first use in this function) 207 \| BEPIl = BATu & BATU32_BEPIL; \| ^~~~~ ../target/ppc/mmu-hash32.c:207:29: error: 'BATU32_BEPIL' undeclared (first use in this function); 207 \| BEPIl = BATu & BATU32_BEPIL; \| ^~~~~~~~~~~~ \| BATU32_BEPI ../target/ppc/mmu-hash32.c:208:13: error: 'bl' undeclared (first use in this function) 208 \| bl = (*BATu & 0x00001FFC) << 15; \| ^~ Fixes: `9813279664` ("target-ppc: Disentangle BAT code for 32-bit hash MMUs") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Message-Id: <20210702215235.1941771-4-farosas@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Fabiano Rosas	d3841fce0d	target/ppc: Fix compilation with FLUSH_ALL_TLBS debug option ../target/ppc/mmu_helper.c: In function 'helper_store_ibatu': ../target/ppc/mmu_helper.c:1802:17: error: unused variable 'cpu' [-Werror=unused-variable] 1802 \| PowerPCCPU cpu = env_archcpu(env); \| ^~~ ../target/ppc/mmu_helper.c: In function 'helper_store_dbatu': ../target/ppc/mmu_helper.c:1838:17: error: unused variable 'cpu' [-Werror=unused-variable] 1838 \| PowerPCCPU cpu = env_archcpu(env); \| ^~~ ../target/ppc/mmu_helper.c: In function 'helper_store_601_batu': ../target/ppc/mmu_helper.c:1874:17: error: unused variable 'cpu' [-Werror=unused-variable] 1874 \| PowerPCCPU cpu = env_archcpu(env); \| ^~~ ../target/ppc/mmu_helper.c: In function 'helper_store_601_batl': ../target/ppc/mmu_helper.c:1919:17: error: unused variable 'cpu' [-Werror=unused-variable] 1919 \| PowerPCCPU cpu = env_archcpu(env); Fixes: `db70b31144` ("target/ppc: Use env_cpu, env_archcpu") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Message-Id: <20210702215235.1941771-3-farosas@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Fabiano Rosas	26ba91db6c	target/ppc: Fix compilation with DUMP_PAGE_TABLES debug option ../target/ppc/mmu_helper.c: In function 'get_segment_6xx_tlb': ../target/ppc/mmu_helper.c:514:46: error: passing argument 1 of 'ppc_hash32_hpt_mask' from incompatible pointer type [-Werror=incompatible-pointer-types] 514 \| ppc_hash32_hpt_mask(env) + 0x80); \| ^~~ \| \| \| CPUPPCState * Fixes: `36778660d7` ("target/ppc: Eliminate htab_base and htab_mask variables") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Message-Id: <20210702215235.1941771-2-farosas@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Richard Henderson	cbf35bac39	target/ppc: Restrict ppc_cpu_tlb_fill to TCG This function is used by TCGCPUOps, and is thus TCG specific. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210621125115.67717-10-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Richard Henderson	51806b5458	target/ppc: Introduce ppc_xlate Create one common dispatch for all of the ppc__xlate functions. Use ppc64_v3_radix to directly dispatch between ppc_radix64_xlate and ppc_hash64_xlate. Remove the separate _handle_mmu_fault and *_get_phys_page_debug functions, using common code for ppc_cpu_tlb_fill and ppc_cpu_get_phys_page_debug. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210621125115.67717-9-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Richard Henderson	af44a14236	target/ppc: Split out ppc_jumbo_xlate Mirror the interface of ppc_radix64_xlate (mostly), putting all of the logic for older mmu translation into a single entry point. For booke, we need to add mmu_idx to the xlate-style interface. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210621125115.67717-8-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Richard Henderson	6c3c873c63	target/ppc: Split out ppc_hash32_xlate Mirror the interface of ppc_radix64_xlate, putting all of the logic for hash32 translation into a single entry point. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210621125115.67717-7-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Richard Henderson	1a8c647bbd	target/ppc: Split out ppc_hash64_xlate Mirror the interface of ppc_radix64_xlate, putting all of the logic for hash64 translation into a single function. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210621125115.67717-6-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Richard Henderson	077a370499	target/ppc: Use bool success for ppc_radix64_xlate Instead of returning non-zero for failure, return true for success. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210621125115.67717-5-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Richard Henderson	42a611240e	target/ppc: Push real-mode handling into ppc_radix64_xlate This removes some incomplete duplication between ppc_radix64_handle_mmu_fault and ppc_radix64_get_phys_page_debug. The former was correct wrt SPR_HRMOR and the latter was not. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210621125115.67717-4-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Richard Henderson	1b4d1cb31a	target/ppc: Use MMUAccessType with *_handle_mmu_fault These changes were waiting until we didn't need to match the function type of PowerPCCPUClass.handle_mmu_fault. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210621125115.67717-3-bruno.larsen@eldorado.org.br> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:18 +10:00
Richard Henderson	db20cc2c56	target/ppc: Remove PowerPCCPUClass.handle_mmu_fault Instead, use a switch on env->mmu_model. This avoids some replicated information in cpu setup. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210621125115.67717-2-bruno.larsen@eldorado.org.br> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:18 +10:00
Alexey Kardashevskiy	7381c5d11f	spapr: tune rtas-size QEMU reserves space for RTAS via /rtas/rtas-size which tells the client how much space the RTAS requires to work which includes the RTAS binary blob implementing RTAS runtime. Because pseries supports FWNMI which requires plenty of space, QEMU reserves more than 2KB which is enough for the RTAS blob as it is just 20 bytes (under QEMU). Since FWNMI reset delivery was added, RTAS_SIZE macro is not used anymore. This replaces RTAS_SIZE with RTAS_MIN_SIZE and uses it in the /rtas/rtas-size calculation to account for the RTAS blob. Fixes: `0e236d3477` ("ppc/spapr: Implement FWNMI System Reset delivery") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20210622070336.1463250-1-aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:18 +10:00
Greg Kurz	642f6f59cd	target/ppc: Drop PowerPCCPUClass::interrupts_big_endian() This isn't used anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210622140926.677618-3-groug@kaod.org> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:18 +10:00
Greg Kurz	c11dc15d3a	target/ppc: Introduce ppc_interrupts_little_endian() PowerPC CPUs use big endian by default but starting with POWER7, server grade CPUs use the ILE bit of the LPCR special purpose register to decide on the endianness to use when handling interrupts. This gives a clue to QEMU on the endianness the guest kernel is running, which is needed when generating an ELF dump of the guest or when delivering an FWNMI machine check interrupt. Commit `382d2db62b` ("target-ppc: Introduce callback for interrupt endianness") added a class method to PowerPCCPUClass to modelize this : default implementation returns a fixed "big endian" value, while POWER7 and newer do the LPCR_ILE check. This is suboptimal as it forces to implement the method for every new CPU family, and it is very unlikely that this will ever be different than what we have today. We basically only have three cases to consider: a) CPU doesn't have an LPCR => big endian b) CPU has an LPCR but doesn't support the ILE bit => big endian c) CPU has an LPCR and supports the ILE bit => little or big endian Instead of class methods, introduce an inline helper that checks the ILE bit in the LPCR_MASK to decide on the outcome. The new helper words little endian instead of big endian. This allows to drop a ! operator in ppc_cpu_do_fwnmi_machine_check(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210622140926.677618-2-groug@kaod.org> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:18 +10:00
Peter Maydell	53c0123118	Pull request -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmDm+YkACgkQnKSrs4Gr c8jkfQf+PuiSrU9t5MjU//jKPBpx/Jl9FqsKIlWC+6NOYQFhzD7A7rftuOSG1HOW Cil9rWO/57u/i7DcrABRlMkZ17zv6tP0876LRXRybVnjX4pQ4ayEtMio4WdjAna+ 1hL9P+c6tuo6wlGkM+R2yX4QIOSw+74Qbkh8sLrVwGYzA94erXbuJi/iix8t11iR joiNZ/k6yZPyMbzgT11A0On5qdaNNVfQeA59pyQEnucia02285VkCPLMaO+Trkgl 7VQ1V3cNAdWVaPU0kXDCx595K1BlfrGHNimxfq2UeUZd2AvqHCohlG/egrEHfOu2 Ek4dAqXn6p2eufGi/BzQi8V4J0nPRA== =SDmL -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha-gitlab/tags/block-pull-request' into staging Pull request # gpg: Signature made Thu 08 Jul 2021 14:11:37 BST # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha-gitlab/tags/block-pull-request: block/io: Merge discard request alignments block: Add backend_defaults property block/file-posix: Optimize for macOS util/async: print leaked BH name when AioContext finalizes util/async: add a human-readable name to BHs for debugging Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-07-08 22:17:28 +01:00
David Hildenbrand	53d1b5fcfb	vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus We support coordinated discarding of RAM using the RamDiscardManager for the VFIO_TYPE1 iommus. Let's unlock support for coordinated discards, keeping uncoordinated discards (e.g., via virtio-balloon) disabled if possible. This unlocks virtio-mem + vfio on x86-64. Note that vfio used via "nvme://" by the block layer has to be implemented/unlocked separately. For now, virtio-mem only supports x86-64; we don't restrict RamDiscardManager to x86-64, though: arm64 and s390x are supposed to work as well, and we'll test once unlocking virtio-mem support. The spapr IOMMUs will need special care, to be tackled later, e.g.., once supporting virtio-mem. Note: The block size of a virtio-mem device has to be set to sane sizes, depending on the maximum hotplug size - to not run out of vfio mappings. The default virtio-mem block size is usually in the range of a couple of MBs. The maximum number of mapping is 64k, shared with other users. Assume you want to hotplug 256GB using virtio-mem - the block size would have to be set to at least 8 MiB (resulting in 32768 separate mappings). Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-14-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	bc072ed403	virtio-mem: Require only coordinated discards We implement the RamDiscardManager interface and only require coordinated discarding of RAM to work. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-13-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	7e6d32ebf7	softmmu/physmem: Extend ram_block_discard_(require\|disable) by two discard types We want to separate the two cases whereby we discard ram - uncoordinated: e.g., virito-balloon - coordinated: e.g., virtio-mem coordinated via the RamDiscardManager Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-12-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	98da491dff	softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable\|require) We have users in migration context that don't hold the BQL (when finishing migration). To prepare for further changes, use a dedicated mutex instead of atomic operations. Keep using qatomic_read ("READ_ONCE") for the functions that only extract the current state (e.g., used by virtio-balloon), locking isn't necessary. While at it, split up the counter into two variables to make it easier to understand. Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-11-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	0fd7616e0f	vfio: Support for RamDiscardManager in the vIOMMU case vIOMMU support works already with RamDiscardManager as long as guests only map populated memory. Both, populated and discarded memory is mapped into &address_space_memory, where vfio_get_xlat_addr() will find that memory, to create the vfio mapping. Sane guests will never map discarded memory (e.g., unplugged memory blocks in virtio-mem) into an IOMMU - or keep it mapped into an IOMMU while memory is getting discarded. However, there are two cases where a malicious guests could trigger pinning of more memory than intended. One case is easy to handle: the guest trying to map discarded memory into an IOMMU. The other case is harder to handle: the guest keeping memory mapped in the IOMMU while it is getting discarded. We would have to walk over all mappings when discarding memory and identify if any mapping would be a violation. Let's keep it simple for now and print a warning, indicating that setting RLIMIT_MEMLOCK can mitigate such attacks. We have to take care of incoming migration: at the point the IOMMUs get restored and start creating mappings in vfio, RamDiscardManager implementations might not be back up and running yet: let's add runstate priorities to enforce the order when restoring. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-10-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	a74317f636	vfio: Sanity check maximum number of DMA mappings with RamDiscardManager Although RamDiscardManager can handle running into the maximum number of DMA mappings by propagating errors when creating a DMA mapping, we want to sanity check and warn the user early that there is a theoretical setup issue and that virtio-mem might not be able to provide as much memory towards a VM as desired. As suggested by Alex, let's use the number of KVM memory slots to guess how many other mappings we might see over time. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-9-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	3eed155caf	vfio: Query and store the maximum number of possible DMA mappings Let's query the maximum number of possible DMA mappings by querying the available mappings when creating the container (before any mappings are created). We'll use this informaton soon to perform some sanity checks and warn the user. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-8-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	5e3b981c33	vfio: Support for RamDiscardManager in the !vIOMMU case Implement support for RamDiscardManager, to prepare for virtio-mem support. Instead of mapping the whole memory section, we only map "populated" parts and update the mapping when notified about discarding/population of memory via the RamDiscardListener. Similarly, when syncing the dirty bitmaps, sync only the actually mapped (populated) parts by replaying via the notifier. Using virtio-mem with vfio is still blocked via ram_block_discard_disable()/ram_block_discard_require() after this patch. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-7-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	2044969f0b	virtio-mem: Implement RamDiscardManager interface Let's properly notify when (un)plugging blocks, after discarding memory and before allowing the guest to consume memory. Handle errors from notifiers gracefully (e.g., no remaining VFIO mappings) when plugging, rolling back the change and telling the guest that the VM is busy. One special case to take care of is replaying all notifications after restoring the vmstate. The device starts out with all memory discarded, so after loading the vmstate, we have to notify about all plugged blocks. Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-6-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	3aca6380fd	virtio-mem: Don't report errors when ram_block_discard_range() fails Any errors are unexpected and ram_block_discard_range() already properly prints errors. Let's stop manually reporting errors. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-5-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	7a9d5d0282	virtio-mem: Factor out traversing unplugged ranges Let's factor out the core logic, no need to replicate. Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-4-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	228438384e	memory: Helpers to copy/free a MemoryRegionSection In case one wants to create a permanent copy of a MemoryRegionSections, one needs access to flatview_ref()/flatview_unref(). Instead of exposing these, let's just add helpers to copy/free a MemoryRegionSection and properly adjust references. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-3-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	8947d7fc4e	memory: Introduce RamDiscardManager for RAM memory regions We have some special RAM memory regions (managed by virtio-mem), whereby the guest agreed to only use selected memory ranges. "unused" parts are discarded so they won't consume memory - to logically unplug these memory ranges. Before the VM is allowed to use such logically unplugged memory again, coordination with the hypervisor is required. This results in "sparse" mmaps/RAMBlocks/memory regions, whereby only coordinated parts are valid to be used/accessed by the VM. In most cases, we don't care about that - e.g., in KVM, we simply have a single KVM memory slot. However, in case of vfio, registering the whole region with the kernel results in all pages getting pinned, and therefore an unexpected high memory consumption - discarding of RAM in that context is broken. Let's introduce a way to coordinate discarding/populating memory within a RAM memory region with such special consumers of RAM memory regions: they can register as listeners and get updates on memory getting discarded and populated. Using this machinery, vfio will be able to map only the currently populated parts, resulting in discarded parts not getting pinned and not consuming memory. A RamDiscardManager has to be set for a memory region before it is getting mapped, and cannot change while the memory region is mapped. Note: At some point, we might want to let RAMBlock users (esp. vfio used for nvme://) consume this interface as well. We'll need RAMBlock notifier calls when a RAMBlock is getting mapped/unmapped (via the corresponding memory region), so we can properly register a listener there as well. Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-2-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:37 -04:00

... 2 3 4 5 6 ...

88868 Commits