qemu-e2k/hw/acpi
Ankit Agrawal b64b7ed8bb qom: new object to associate device to NUMA node
NVIDIA GPU's support MIG (Mult-Instance GPUs) feature [1], which allows
partitioning of the GPU device resources (including device memory) into
several (upto 8) isolated instances. Each of the partitioned memory needs
a dedicated NUMA node to operate. The partitions are not fixed and they
can be created/deleted at runtime.

Unfortunately Linux OS does not provide a means to dynamically create/destroy
NUMA nodes and such feature implementation is not expected to be trivial. The
nodes that OS discovers at the boot time while parsing SRAT remains fixed. So
we utilize the Generic Initiator (GI) Affinity structures that allows
association between nodes and devices. Multiple GI structures per BDF is
possible, allowing creation of multiple nodes by exposing unique PXM in each
of these structures.

Implement the mechanism to build the GI affinity structures as Qemu currently
does not. Introduce a new acpi-generic-initiator object to allow host admin
link a device with an associated NUMA node. Qemu maintains this association
and use this object to build the requisite GI Affinity Structure.

When multiple NUMA nodes are associated with a device, it is required to
create those many number of acpi-generic-initiator objects, each representing
a unique device:node association.

Following is one of a decoded GI affinity structure in VM ACPI SRAT.
[0C8h 0200   1]                Subtable Type : 05 [Generic Initiator Affinity]
[0C9h 0201   1]                       Length : 20

[0CAh 0202   1]                    Reserved1 : 00
[0CBh 0203   1]           Device Handle Type : 01
[0CCh 0204   4]             Proximity Domain : 00000007
[0D0h 0208  16]                Device Handle : 00 00 20 00 00 00 00 00 00 00 00
00 00 00 00 00
[0E0h 0224   4]        Flags (decoded below) : 00000001
                                     Enabled : 1
[0E4h 0228   4]                    Reserved2 : 00000000

[0E8h 0232   1]                Subtable Type : 05 [Generic Initiator Affinity]
[0E9h 0233   1]                       Length : 20

An admin can provide a range of acpi-generic-initiator objects, each
associating a device (by providing the id through pci-dev argument)
to the desired NUMA node (using the node argument). Currently, only PCI
device is supported.

For the grace hopper system, create a range of 8 nodes and associate that
with the device using the acpi-generic-initiator object. While a configuration
of less than 8 nodes per device is allowed, such configuration will prevent
utilization of the feature to the fullest. The following sample creates 8
nodes per PCI device for a VM with 2 PCI devices and link them to the
respecitve PCI device using acpi-generic-initiator objects:

-numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
-numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
-numa node,nodeid=8 -numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
-object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
-object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
-object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
-object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
-object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
-object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
-object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \

-numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \
-numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \
-numa node,nodeid=16 -numa node,nodeid=17 \
-device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \
-object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \
-object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \
-object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \
-object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \
-object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \
-object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \
-object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
-object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \

Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1]
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-2-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 17:56:55 -04:00
..
acpi_generic_initiator.c qom: new object to associate device to NUMA node 2024-03-12 17:56:55 -04:00
acpi_interface.c hw/isa/isa-bus: Turn isa_build_aml() into qbus_build_aml() 2023-01-27 11:47:02 -05:00
acpi-cpu-hotplug-stub.c hw/acpi: refactor acpi hp modules so that targets can just use what they need 2021-09-04 09:07:46 -04:00
acpi-mem-hotplug-stub.c hw/acpi: refactor acpi hp modules so that targets can just use what they need 2021-09-04 09:07:46 -04:00
acpi-nvdimm-stub.c hw/acpi: refactor acpi hp modules so that targets can just use what they need 2021-09-04 09:07:46 -04:00
acpi-pci-hotplug-stub.c pcihp: add ACPI PCI hotplug specific is_hotpluggable_bus() callback 2023-03-07 12:39:00 -05:00
acpi-qmp-cmds.c acpi: Move the QMP command from monitor/ to hw/acpi/ 2023-02-04 07:56:54 +01:00
acpi-stub.c hw/acpi: Dumb down acpi_table_add() stub 2023-02-23 14:10:17 +01:00
acpi-x86-stub.c hw/acpi/acpi_dev_interface: Remove now unused madt_cpu virtual method 2023-10-04 18:15:05 -04:00
aml-build-stub.c acpi: pc: vga: use AcpiDevAmlIf interface to build VGA device descriptors 2022-11-07 14:00:29 -05:00
aml-build.c hw/arm/virt-acpi-build.c: Migrate SPCR creation to common location 2024-03-08 15:38:46 +10:00
bios-linker-loader.c
core.c hw/acpi/core: Trace enable and status registers of GPE separately 2023-10-04 18:15:05 -04:00
cpu_hotplug.c hw/acpi/cpu_hotplug: Include 'x86.h' instead of 'pc.h' 2024-02-20 20:34:21 +03:00
cpu.c hw/acpi: Constify VMState 2023-12-29 11:17:30 +11:00
cxl-stub.c acpi/cxl: Add _OSC implementation (9.14.2) 2022-05-13 06:13:36 -04:00
cxl.c hw/cxl: Add QTG _DSM support for ACPI0017 device 2023-10-22 05:18:17 -04:00
erst.c hw/acpi: Constify VMState 2023-12-29 11:17:30 +11:00
generic_event_device.c hw/acpi: Constify VMState 2023-12-29 11:17:30 +11:00
ghes-stub.c
ghes.c Fix 'writeable' typos 2022-06-08 19:38:47 +01:00
hmat.c hw/acpi/acpi_dev_interface: Remove now unused #include "hw/boards.h" 2023-10-04 18:15:05 -04:00
hmat.h hw/acpi/acpi_dev_interface: Remove now unused #include "hw/boards.h" 2023-10-04 18:15:05 -04:00
ich9_tco.c hw/acpi: Constify VMState 2023-12-29 11:17:30 +11:00
ich9.c hw/acpi: Constify VMState 2023-12-29 11:17:30 +11:00
ipmi-stub.c acpi: ipmi: use AcpiDevAmlIf interface to build IPMI device descriptors 2022-06-09 19:32:49 -04:00
ipmi.c acpi: ipmi: use AcpiDevAmlIf interface to build IPMI device descriptors 2022-06-09 19:32:49 -04:00
Kconfig pcihp: make bridge describe itself using AcpiDevAmlIfClass:build_dev_aml 2023-01-28 06:21:29 -05:00
memory_hotplug.c hw/acpi: Constify VMState 2023-12-29 11:17:30 +11:00
meson.build qom: new object to associate device to NUMA node 2024-03-12 17:56:55 -04:00
nvdimm.c hw/other: spelling fixes 2023-09-21 11:31:16 +03:00
pci-bridge-stub.c pcihp: make bridge describe itself using AcpiDevAmlIfClass:build_dev_aml 2023-01-28 06:21:29 -05:00
pci-bridge.c acpi: pci: move out ACPI PCI hotplug generator from generic slot generator build_append_pci_bus_devices() 2023-03-07 12:39:00 -05:00
pci.c acpi: build_mcfg: use acpi_table_begin()/acpi_table_end() instead of build_header() 2021-10-05 17:30:57 -04:00
pcihp.c hw/acpi: Constify VMState 2023-12-29 11:17:30 +11:00
piix4.c hw/acpi: Constify VMState 2023-12-29 11:17:30 +11:00
tpm.c
trace-events hw/acpi/core: Trace enable and status registers of GPE separately 2023-10-04 18:15:05 -04:00
trace.h
utils.c
viot.c hw/acpi/viot: sort VIOT ACPI table entries by PCI host bridge min_bus 2022-06-09 19:32:49 -04:00
viot.h hw/acpi: Add VIOT table 2021-11-01 18:49:10 -04:00
vmgenid.c hw/acpi: Constify VMState 2023-12-29 11:17:30 +11:00