8504f12945
It is assumed that the whole GPA space is available to be DMA addressable, within a given address space limit, except for a tiny region before the 4G. Since Linux v5.4, VFIO validates whether the selected GPA is indeed valid i.e. not reserved by IOMMU on behalf of some specific devices or platform-defined restrictions, and thus failing the ioctl(VFIO_DMA_MAP) with -EINVAL. AMD systems with an IOMMU are examples of such platforms and particularly may only have these ranges as allowed: 0000000000000000 - 00000000fedfffff (0 .. 3.982G) 00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G) 0000010000000000 - ffffffffffffffff (1Tb .. 16Pb[*]) We already account for the 4G hole, albeit if the guest is big enough we will fail to allocate a guest with >1010G due to the ~12G hole at the 1Tb boundary, reserved for HyperTransport (HT). [*] there is another reserved region unrelated to HT that exists in the 256T boundary in Fam 17h according to Errata #1286, documeted also in "Open-Source Register Reference for AMD Family 17h Processors (PUB)" When creating the region above 4G, take into account that on AMD platforms the HyperTransport range is reserved and hence it cannot be used either as GPAs. On those cases rather than establishing the start of ram-above-4g to be 4G, relocate instead to 1Tb. See AMD IOMMU spec, section 2.1.2 "IOMMU Logical Topology", for more information on the underlying restriction of IOVAs. After accounting for the 1Tb hole on AMD hosts, mtree should look like: 0000000000000000-000000007fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000007fffffff 0000010000000000-000001ff7fffffff (prio 0, i/o): alias ram-above-4g @pc.ram 0000000080000000-000000ffffffffff If the relocation is done or the address space covers it, we also add the the reserved HT e820 range as reserved. Default phys-bits on Qemu is TCG_PHYS_ADDR_BITS (40) which is enough to address 1Tb (0xff ffff ffff). On AMD platforms, if a ram-above-4g relocation is attempted and the CPU wasn't configured with a big enough phys-bits, an error message will be printed due to the maxphysaddr vs maxusedaddr check previously added. Suggested-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20220719170014.27028-11-joao.m.martins@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> |
||
---|---|---|
.. | ||
kvm | ||
xen | ||
acpi-build.c | ||
acpi-build.h | ||
acpi-common.c | ||
acpi-common.h | ||
acpi-microvm.c | ||
acpi-microvm.h | ||
amd_iommu.c | ||
amd_iommu.h | ||
e820_memory_layout.c | ||
e820_memory_layout.h | ||
fw_cfg.c | ||
fw_cfg.h | ||
generic_event_device_x86.c | ||
intel_iommu_internal.h | ||
intel_iommu.c | ||
Kconfig | ||
kvmvapic.c | ||
meson.build | ||
microvm-dt.c | ||
microvm-dt.h | ||
microvm.c | ||
multiboot.c | ||
multiboot.h | ||
pc_piix.c | ||
pc_q35.c | ||
pc_sysfw_ovmf-stubs.c | ||
pc_sysfw_ovmf.c | ||
pc_sysfw.c | ||
pc.c | ||
port92.c | ||
sgx-epc.c | ||
sgx-stub.c | ||
sgx.c | ||
trace-events | ||
trace.h | ||
vmmouse.c | ||
vmport.c | ||
x86-iommu-stub.c | ||
x86-iommu.c | ||
x86.c |