qemu-e2k/include/hw
Alexey Kardashevskiy ec132efaa8 spapr: Support NVIDIA V100 GPU with NVLink2
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory
space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver
implements special regions for such GPUs and emulates an NVLink bridge.
NVLink2-enabled POWER9 CPUs also provide address translation services
which includes an ATS shootdown (ATSD) register exported via the NVLink
bridge device.

This adds a quirk to VFIO to map the GPU memory and create an MR;
the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses
this to get the MR and map it to the system address space.
Another quirk does the same for ATSD.

This adds additional steps to sPAPR PHB setup:

1. Search for specific GPUs and NPUs, collect findings in
sPAPRPHBState::nvgpus, manage system address space mappings;

2. Add device-specific properties such as "ibm,npu", "ibm,gpu",
"memory-block", "link-speed" to advertise the NVLink2 function to
the guest;

3. Add "mmio-atsd" to vPHB to advertise the ATSD capability;

4. Add new memory blocks (with extra "linux,memory-usable" to prevent
the guest OS from accessing the new memory until it is onlined) and
npuphb# nodes representing an NPU unit for every vPHB as the GPU driver
uses it for link discovery.

This allocates space for GPU RAM and ATSD like we do for MMIOs by
adding 2 new parameters to the phb_placement() hook. Older machine types
set these to zero.

This puts new memory nodes in a separate NUMA node to as the GPU RAM
needs to be configured equally distant from any other node in the system.
Unlike the host setup which assigns numa ids from 255 downwards, this
adds new NUMA nodes after the user configures nodes or from 1 if none
were configured.

This adds requirement similar to EEH - one IOMMU group per vPHB.
The reason for this is that ATSD registers belong to a physical NPU
so they cannot invalidate translations on GPUs attached to another NPU.
It is guaranteed by the host platform as it does not mix NVLink bridges
or GPUs from different NPU in the same IOMMU group. If more than one
IOMMU group is detected on a vPHB, this disables ATSD support for that
vPHB and prints a warning.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for vfio portions]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190312082103.130561-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 10:41:23 +10:00
..
acpi i386, acpi: check acpi_memory_hotplug capacity in pre_plug 2019-03-12 22:31:21 -04:00
adc
arm hw/arm/virt: Dynamic memory map depending on RAM requirements 2019-03-05 15:55:09 +00:00
audio
block pflash: Require backend size to match device, improve errors 2019-03-26 08:16:24 +01:00
char hw/char/pl011: Support all interrupt lines 2019-02-21 18:17:46 +00:00
core
cpu qom/cpu: Add cluster_index to CPUState 2019-01-29 11:46:05 +00:00
cris
display hw/display/milkymist-tmu2: Move inlined code from header to source 2019-02-01 11:58:50 +01:00
dma
firmware hw/smbios: fix offset of type 3 sku field 2019-02-22 10:51:31 -05:00
gpio
hyperv
i2c i2c:smbus_slave: Add an SMBus vmstate structure 2019-02-27 21:06:08 -06:00
i386 intel_iommu: Drop extended root field 2019-04-02 11:49:14 -04:00
ide hw/ide: drop iov field from IDEDMA 2019-02-22 09:42:13 +00:00
input hw/input/ps2: Remove PS2State from "qemu/typedefs.h" 2019-01-22 05:14:32 +01:00
intc hw/intc/bcm2836_control: Implement local timer 2019-03-15 11:12:28 +00:00
ipack
ipmi
isa
kvm
lm32
m68k
mem nvdimm: Rename AcpiNVDIMMState into NVDIMMState 2019-03-11 10:44:21 -03:00
mips
misc hw/arm/armsse: Unify init-svtor and cpuwait handling 2019-02-28 11:03:04 +00:00
net
nvram hw/nvram/nrf51_nvm: Add nRF51 non-volatile memories 2019-02-01 15:31:26 +00:00
pci pci: Allow PCI bus subtypes to support extended config space accesses 2019-04-09 09:14:47 +10:00
pci-bridge
pci-host spapr: Support NVIDIA V100 GPU with NVLink2 2019-04-26 10:41:23 +10:00
ppc spapr: Support NVIDIA V100 GPU with NVLink2 2019-04-26 10:41:23 +10:00
rdma {hmp, hw/pvrdma}: Expose device internals via monitor interface 2019-03-16 15:52:44 +02:00
riscv riscv: plic: Fix incorrect irq calculation 2019-04-04 16:36:19 -07:00
s390x target/s390x: Split out s390-tod.h 2019-02-18 11:25:43 +01:00
scsi scsi: esp: Defer command completion until previous interrupts have been handled 2019-01-11 13:57:24 +01:00
sd
sh4 avoid TABs in files that only contain a few 2019-01-11 15:46:56 +01:00
sparc
ssi aspeed/smc: snoop SPI transfers to fake dummy cycles 2019-01-29 11:46:05 +00:00
timer hw/timer/pl031: Allow use as an embedded-struct device 2019-02-21 18:17:46 +00:00
tricore
unicore32
usb
vfio VFIO updates 2019-03-11 2019-03-12 13:37:29 +00:00
virtio virtio-gpu: delay virglrenderer reset when blocked. 2019-03-18 13:10:57 +01:00
watchdog hw/arm/stellaris: Implement watchdog timer 2019-03-05 15:55:09 +00:00
xen pvh: Add x86/HVM direct boot ABI header file 2019-02-05 16:50:16 +01:00
xtensa target/xtensa: add MX interrupt controller 2019-01-28 11:55:20 -08:00
boards.h machine: Move nvdimms state into struct MachineState 2019-03-11 10:44:25 -03:00
bt.h
devices.h hw/devices: Remove unused TC6393XB_RAM definition 2019-03-07 22:16:22 +01:00
elf_ops.h elf-ops.h: Add get_elf_note_type() 2019-02-05 16:50:16 +01:00
empty_slot.h
fw-path-provider.h
hotplug.h
hw.h
ide.h ide/via: Rename functions to match device name 2019-01-25 14:52:12 -05:00
irq.h
loader-fit.h
loader.h elf: Add optional function ptr to load_elf() to parse ELF notes 2019-02-05 16:50:16 +01:00
nmi.h
or-irq.h
pcmcia.h hw/pcmcia: Remove PCMCIACardState from "qemu/typedefs.h" 2019-01-22 05:14:32 +01:00
platform-bus.h
ptimer.h
qdev-core.h qom: Move compat_props machinery from qdev to QOM 2019-03-11 22:53:44 +01:00
qdev-dma.h
qdev-properties.h
qdev.h
register.h
registerfields.h
stream.h
sysbus.h
usb.h qemu/queue.h: simplify reverse access to QTAILQ 2019-01-11 15:46:55 +01:00