qemu-e2k/qemu-options.hx

5490 lines
231 KiB
Haxe
Raw Permalink Normal View History

HXCOMM Use DEFHEADING() to define headings in both help text and rST.
HXCOMM Text between SRST and ERST is copied to the rST version and
HXCOMM discarded from C version.
HXCOMM DEF(option, HAS_ARG/0, opt_enum, opt_help, arch_mask) is used to
HXCOMM construct option structures, enums and help message for specified
HXCOMM architectures.
HXCOMM HXCOMM can be used for comments, discarded from both rST and C.
DEFHEADING(Standard options:)
DEF("help", 0, QEMU_OPTION_h,
"-h or -help display this help and exit\n", QEMU_ARCH_ALL)
SRST
``-h``
Display help and exit
ERST
DEF("version", 0, QEMU_OPTION_version,
"-version display version information and exit\n", QEMU_ARCH_ALL)
SRST
``-version``
Display version information and exit
ERST
DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
"-machine [type=]name[,prop[=value][,...]]\n"
" selects emulated machine ('-machine help' for list)\n"
" property accel=accel1[:accel2[:...]] selects accelerator\n"
" supported accelerators are kvm, xen, hax, hvf, nvmm, whpx or tcg (default: tcg)\n"
" vmport=on|off|auto controls emulation of vmport (default: auto)\n"
" dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
" mem-merge=on|off controls memory merge support (default: on)\n"
" aes-key-wrap=on|off controls support for AES key wrapping (default=on)\n"
" dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
" suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
" nvdimm=on|off controls NVDIMM support (default=off)\n"
" memory-encryption=@var{} memory encryption object to use (default=none)\n"
" hmat=on|off controls ACPI HMAT support (default=off)\n"
" memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n",
QEMU_ARCH_ALL)
SRST
``-machine [type=]name[,prop=value[,...]]``
Select the emulated machine by name. Use ``-machine help`` to list
available machines.
For architectures which aim to support live migration compatibility
across releases, each release will introduce a new versioned machine
type. For example, the 2.8.0 release introduced machine types
"pc-i440fx-2.8" and "pc-q35-2.8" for the x86\_64/i686 architectures.
To allow live migration of guests from QEMU version 2.8.0, to QEMU
version 2.9.0, the 2.9.0 version must support the "pc-i440fx-2.8"
and "pc-q35-2.8" machines too. To allow users live migrating VMs to
skip multiple intermediate releases when upgrading, new releases of
QEMU will support machine types from many previous versions.
Supported machine properties are:
``accel=accels1[:accels2[:...]]``
This is used to enable an accelerator. Depending on the target
architecture, kvm, xen, hax, hvf, nvmm, whpx or tcg can be available.
By default, tcg is used. If there is more than one accelerator
specified, the next one is used if the previous one fails to
initialize.
``vmport=on|off|auto``
Enables emulation of VMWare IO port, for vmmouse etc. auto says
to select the value based on accel. For accel=xen the default is
off otherwise the default is on.
``dump-guest-core=on|off``
Include guest memory in a core dump. The default is on.
``mem-merge=on|off``
Enables or disables memory merge support. This feature, when
supported by the host, de-duplicates identical memory pages
among VMs instances (enabled by default).
``aes-key-wrap=on|off``
Enables or disables AES key wrapping support on s390-ccw hosts.
This feature controls whether AES wrapping keys will be created
to allow execution of AES cryptographic functions. The default
is on.
``dea-key-wrap=on|off``
Enables or disables DEA key wrapping support on s390-ccw hosts.
This feature controls whether DEA wrapping keys will be created
to allow execution of DEA cryptographic functions. The default
is on.
``nvdimm=on|off``
Enables or disables NVDIMM support. The default is off.
``memory-encryption=``
Memory encryption object to use. The default is none.
``hmat=on|off``
Enables or disables ACPI Heterogeneous Memory Attribute Table
(HMAT) support. The default is off.
``memory-backend='id'``
An alternative to legacy ``-mem-path`` and ``mem-prealloc`` options.
Allows to use a memory backend as main RAM.
For example:
::
-object memory-backend-file,id=pc.ram,size=512M,mem-path=/hugetlbfs,prealloc=on,share=on
-machine memory-backend=pc.ram
-m 512M
Migration compatibility note:
* as backend id one shall use value of 'default-ram-id', advertised by
machine type (available via ``query-machines`` QMP command), if migration
to/from old QEMU (<5.0) is expected.
* for machine types 4.0 and older, user shall
use ``x-use-canonical-path-for-ramblock-id=off`` backend option
if migration to/from old QEMU (<5.0) is expected.
For example:
::
-object memory-backend-ram,id=pc.ram,size=512M,x-use-canonical-path-for-ramblock-id=off
-machine memory-backend=pc.ram
-m 512M
ERST
vl: Add sgx compound properties to expose SGX EPC sections to guest Because SGX EPC is enumerated through CPUID, EPC "devices" need to be realized prior to realizing the vCPUs themselves, i.e. long before generic devices are parsed and realized. From a virtualization perspective, the CPUID aspect also means that EPC sections cannot be hotplugged without paravirtualizing the guest kernel (hardware does not support hotplugging as EPC sections must be locked down during pre-boot to provide EPC's security properties). So even though EPC sections could be realized through the generic -devices command, they need to be created much earlier for them to actually be usable by the guest. Place all EPC sections in a contiguous block, somewhat arbitrarily starting after RAM above 4g. Ensuring EPC is in a contiguous region simplifies calculations, e.g. device memory base, PCI hole, etc..., allows dynamic calculation of the total EPC size, e.g. exposing EPC to guests does not require -maxmem, and last but not least allows all of EPC to be enumerated in a single ACPI entry, which is expected by some kernels, e.g. Windows 7 and 8. The new compound properties command for sgx like below: ...... -object memory-backend-epc,id=mem1,size=28M,prealloc=on \ -object memory-backend-epc,id=mem2,size=10M \ -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2 Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-6-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-28 10:40:58 +02:00
DEF("M", HAS_ARG, QEMU_OPTION_M,
" sgx-epc.0.memdev=memid,sgx-epc.0.node=numaid\n",
vl: Add sgx compound properties to expose SGX EPC sections to guest Because SGX EPC is enumerated through CPUID, EPC "devices" need to be realized prior to realizing the vCPUs themselves, i.e. long before generic devices are parsed and realized. From a virtualization perspective, the CPUID aspect also means that EPC sections cannot be hotplugged without paravirtualizing the guest kernel (hardware does not support hotplugging as EPC sections must be locked down during pre-boot to provide EPC's security properties). So even though EPC sections could be realized through the generic -devices command, they need to be created much earlier for them to actually be usable by the guest. Place all EPC sections in a contiguous block, somewhat arbitrarily starting after RAM above 4g. Ensuring EPC is in a contiguous region simplifies calculations, e.g. device memory base, PCI hole, etc..., allows dynamic calculation of the total EPC size, e.g. exposing EPC to guests does not require -maxmem, and last but not least allows all of EPC to be enumerated in a single ACPI entry, which is expected by some kernels, e.g. Windows 7 and 8. The new compound properties command for sgx like below: ...... -object memory-backend-epc,id=mem1,size=28M,prealloc=on \ -object memory-backend-epc,id=mem2,size=10M \ -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2 Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-6-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-28 10:40:58 +02:00
QEMU_ARCH_ALL)
SRST
``sgx-epc.0.memdev=@var{memid},sgx-epc.0.node=@var{numaid}``
vl: Add sgx compound properties to expose SGX EPC sections to guest Because SGX EPC is enumerated through CPUID, EPC "devices" need to be realized prior to realizing the vCPUs themselves, i.e. long before generic devices are parsed and realized. From a virtualization perspective, the CPUID aspect also means that EPC sections cannot be hotplugged without paravirtualizing the guest kernel (hardware does not support hotplugging as EPC sections must be locked down during pre-boot to provide EPC's security properties). So even though EPC sections could be realized through the generic -devices command, they need to be created much earlier for them to actually be usable by the guest. Place all EPC sections in a contiguous block, somewhat arbitrarily starting after RAM above 4g. Ensuring EPC is in a contiguous region simplifies calculations, e.g. device memory base, PCI hole, etc..., allows dynamic calculation of the total EPC size, e.g. exposing EPC to guests does not require -maxmem, and last but not least allows all of EPC to be enumerated in a single ACPI entry, which is expected by some kernels, e.g. Windows 7 and 8. The new compound properties command for sgx like below: ...... -object memory-backend-epc,id=mem1,size=28M,prealloc=on \ -object memory-backend-epc,id=mem2,size=10M \ -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2 Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-6-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-28 10:40:58 +02:00
Define an SGX EPC section.
ERST
DEF("cpu", HAS_ARG, QEMU_OPTION_cpu,
"-cpu cpu select CPU ('-cpu help' for list)\n", QEMU_ARCH_ALL)
SRST
``-cpu model``
Select CPU model (``-cpu help`` for list and additional feature
selection)
ERST
DEF("accel", HAS_ARG, QEMU_OPTION_accel,
"-accel [accel=]accelerator[,prop[=value][,...]]\n"
" select accelerator (kvm, xen, hax, hvf, nvmm, whpx or tcg; use 'help' for a list)\n"
" igd-passthru=on|off (enable Xen integrated Intel graphics passthrough, default=off)\n"
" kernel-irqchip=on|off|split controls accelerated irqchip support (default=on)\n"
" kvm-shadow-mem=size of KVM shadow MMU in bytes\n"
" split-wx=on|off (enable TCG split w^x mapping)\n"
" tb-size=n (TCG translation block cache size)\n"
" dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n"
" thread=single|multi (enable multi-threaded TCG)\n", QEMU_ARCH_ALL)
SRST
``-accel name[,prop=value[,...]]``
This is used to enable an accelerator. Depending on the target
architecture, kvm, xen, hax, hvf, nvmm, whpx or tcg can be available. By
default, tcg is used. If there is more than one accelerator
specified, the next one is used if the previous one fails to
initialize.
``igd-passthru=on|off``
When Xen is in use, this option controls whether Intel
integrated graphics devices can be passed through to the guest
(default=off)
``kernel-irqchip=on|off|split``
Controls KVM in-kernel irqchip support. The default is full
acceleration of the interrupt controllers. On x86, split irqchip
reduces the kernel attack surface, at a performance cost for
non-MSI interrupts. Disabling the in-kernel irqchip completely
is not recommended except for debugging purposes.
``kvm-shadow-mem=size``
Defines the size of the KVM shadow MMU.
``split-wx=on|off``
Controls the use of split w^x mapping for the TCG code generation
buffer. Some operating systems require this to be enabled, and in
such a case this will default on. On other operating systems, this
will default off, but one may enable this for testing or debugging.
``tb-size=n``
Controls the size (in MiB) of the TCG translation block cache.
``thread=single|multi``
Controls number of TCG threads. When the TCG is multi-threaded
there will be one thread per vCPU therefore taking advantage of
additional host cores. The default is to enable multi-threading
where both the back-end and front-ends support it and no
incompatible TCG features have been enabled (e.g.
icount/replay).
``dirty-ring-size=n``
When the KVM accelerator is used, it controls the size of the per-vCPU
dirty page ring buffer (number of entries for each vCPU). It should
be a value that is power of two, and it should be 1024 or bigger (but
still less than the maximum value that the kernel supports). 4096
could be a good initial value if you have no idea which is the best.
Set this value to 0 to disable the feature. By default, this feature
is disabled (dirty-ring-size=0). When enabled, KVM will instead
record dirty pages in a bitmap.
ERST
DEF("smp", HAS_ARG, QEMU_OPTION_smp,
hw/core/machine: Introduce CPU cluster topology support The new Cluster-Aware Scheduling support has landed in Linux 5.16, which has been proved to benefit the scheduling performance (e.g. load balance and wake_affine strategy) on both x86_64 and AArch64. So now in Linux 5.16 we have four-level arch-neutral CPU topology definition like below and a new scheduler level for clusters. struct cpu_topology { int thread_id; int core_id; int cluster_id; int package_id; int llc_id; cpumask_t thread_sibling; cpumask_t core_sibling; cpumask_t cluster_sibling; cpumask_t llc_sibling; } A cluster generally means a group of CPU cores which share L2 cache or other mid-level resources, and it is the shared resources that is used to improve scheduler's behavior. From the point of view of the size range, it's between CPU die and CPU core. For example, on some ARM64 Kunpeng servers, we have 6 clusters in each NUMA node, and 4 CPU cores in each cluster. The 4 CPU cores share a separate L2 cache and a L3 cache tag, which brings cache affinity advantage. In virtualization, on the Hosts which have pClusters (physical clusters), if we can design a vCPU topology with cluster level for guest kernel and have a dedicated vCPU pinning. A Cluster-Aware Guest kernel can also make use of the cache affinity of CPU clusters to gain similar scheduling performance. This patch adds infrastructure for CPU cluster level topology configuration and parsing, so that the user can specify cluster parameter if their machines support it. Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Message-Id: <20211228092221.21068-3-wangyanan55@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> [PMD: Added '(since 7.0)' to @clusters in qapi/machine.json] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2021-12-28 10:22:09 +01:00
"-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]\n"
" set the number of initial CPUs to 'n' [default=1]\n"
" maxcpus= maximum number of total CPUs, including\n"
" offline CPUs for hotplug, etc\n"
" sockets= number of sockets on the machine board\n"
" dies= number of dies in one socket\n"
hw/core/machine: Introduce CPU cluster topology support The new Cluster-Aware Scheduling support has landed in Linux 5.16, which has been proved to benefit the scheduling performance (e.g. load balance and wake_affine strategy) on both x86_64 and AArch64. So now in Linux 5.16 we have four-level arch-neutral CPU topology definition like below and a new scheduler level for clusters. struct cpu_topology { int thread_id; int core_id; int cluster_id; int package_id; int llc_id; cpumask_t thread_sibling; cpumask_t core_sibling; cpumask_t cluster_sibling; cpumask_t llc_sibling; } A cluster generally means a group of CPU cores which share L2 cache or other mid-level resources, and it is the shared resources that is used to improve scheduler's behavior. From the point of view of the size range, it's between CPU die and CPU core. For example, on some ARM64 Kunpeng servers, we have 6 clusters in each NUMA node, and 4 CPU cores in each cluster. The 4 CPU cores share a separate L2 cache and a L3 cache tag, which brings cache affinity advantage. In virtualization, on the Hosts which have pClusters (physical clusters), if we can design a vCPU topology with cluster level for guest kernel and have a dedicated vCPU pinning. A Cluster-Aware Guest kernel can also make use of the cache affinity of CPU clusters to gain similar scheduling performance. This patch adds infrastructure for CPU cluster level topology configuration and parsing, so that the user can specify cluster parameter if their machines support it. Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Message-Id: <20211228092221.21068-3-wangyanan55@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> [PMD: Added '(since 7.0)' to @clusters in qapi/machine.json] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2021-12-28 10:22:09 +01:00
" clusters= number of clusters in one die\n"
" cores= number of cores in one cluster\n"
" threads= number of threads in one core\n"
"Note: Different machines may have different subsets of the CPU topology\n"
" parameters supported, so the actual meaning of the supported parameters\n"
" will vary accordingly. For example, for a machine type that supports a\n"
" three-level CPU hierarchy of sockets/cores/threads, the parameters will\n"
" sequentially mean as below:\n"
" sockets means the number of sockets on the machine board\n"
" cores means the number of cores in one socket\n"
" threads means the number of threads in one core\n"
" For a particular machine type board, an expected CPU topology hierarchy\n"
" can be defined through the supported sub-option. Unsupported parameters\n"
" can also be provided in addition to the sub-option, but their values\n"
" must be set as 1 in the purpose of correct parsing.\n",
QEMU_ARCH_ALL)
SRST
hw/core/machine: Introduce CPU cluster topology support The new Cluster-Aware Scheduling support has landed in Linux 5.16, which has been proved to benefit the scheduling performance (e.g. load balance and wake_affine strategy) on both x86_64 and AArch64. So now in Linux 5.16 we have four-level arch-neutral CPU topology definition like below and a new scheduler level for clusters. struct cpu_topology { int thread_id; int core_id; int cluster_id; int package_id; int llc_id; cpumask_t thread_sibling; cpumask_t core_sibling; cpumask_t cluster_sibling; cpumask_t llc_sibling; } A cluster generally means a group of CPU cores which share L2 cache or other mid-level resources, and it is the shared resources that is used to improve scheduler's behavior. From the point of view of the size range, it's between CPU die and CPU core. For example, on some ARM64 Kunpeng servers, we have 6 clusters in each NUMA node, and 4 CPU cores in each cluster. The 4 CPU cores share a separate L2 cache and a L3 cache tag, which brings cache affinity advantage. In virtualization, on the Hosts which have pClusters (physical clusters), if we can design a vCPU topology with cluster level for guest kernel and have a dedicated vCPU pinning. A Cluster-Aware Guest kernel can also make use of the cache affinity of CPU clusters to gain similar scheduling performance. This patch adds infrastructure for CPU cluster level topology configuration and parsing, so that the user can specify cluster parameter if their machines support it. Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Message-Id: <20211228092221.21068-3-wangyanan55@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> [PMD: Added '(since 7.0)' to @clusters in qapi/machine.json] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2021-12-28 10:22:09 +01:00
``-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]``
Simulate a SMP system with '\ ``n``\ ' CPUs initially present on
the machine type board. On boards supporting CPU hotplug, the optional
'\ ``maxcpus``\ ' parameter can be set to enable further CPUs to be
added at runtime. When both parameters are omitted, the maximum number
of CPUs will be calculated from the provided topology members and the
initial CPU count will match the maximum number. When only one of them
is given then the omitted one will be set to its counterpart's value.
Both parameters may be specified, but the maximum number of CPUs must
be equal to or greater than the initial CPU count. Product of the
CPU topology hierarchy must be equal to the maximum number of CPUs.
Both parameters are subject to an upper limit that is determined by
the specific machine type chosen.
To control reporting of CPU topology information, values of the topology
parameters can be specified. Machines may only support a subset of the
parameters and different machines may have different subsets supported
which vary depending on capacity of the corresponding CPU targets. So
for a particular machine type board, an expected topology hierarchy can
be defined through the supported sub-option. Unsupported parameters can
also be provided in addition to the sub-option, but their values must be
set as 1 in the purpose of correct parsing.
Either the initial CPU count, or at least one of the topology parameters
must be specified. The specified parameters must be greater than zero,
explicit configuration like "cpus=0" is not allowed. Values for any
omitted parameters will be computed from those which are given.
For example, the following sub-option defines a CPU topology hierarchy
(2 sockets totally on the machine, 2 cores per socket, 2 threads per
core) for a machine that only supports sockets/cores/threads.
Some members of the option can be omitted but their values will be
automatically computed:
::
-smp 8,sockets=2,cores=2,threads=2,maxcpus=8
The following sub-option defines a CPU topology hierarchy (2 sockets
totally on the machine, 2 dies per socket, 2 cores per die, 2 threads
per core) for PC machines which support sockets/dies/cores/threads.
Some members of the option can be omitted but their values will be
automatically computed:
::
-smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16
The following sub-option defines a CPU topology hierarchy (2 sockets
totally on the machine, 2 clusters per socket, 2 cores per cluster,
2 threads per core) for ARM virt machines which support sockets/clusters
/cores/threads. Some members of the option can be omitted but their values
will be automatically computed:
::
-smp 16,sockets=2,clusters=2,cores=2,threads=2,maxcpus=16
Historically preference was given to the coarsest topology parameters
when computing missing values (ie sockets preferred over cores, which
were preferred over threads), however, this behaviour is considered
liable to change. Prior to 6.2 the preference was sockets over cores
over threads. Since 6.2 the preference is cores over sockets over threads.
For example, the following option defines a machine board with 2 sockets
of 1 core before 6.2 and 1 socket of 2 cores after 6.2:
::
-smp 2
ERST
DEF("numa", HAS_ARG, QEMU_OPTION_numa,
"-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
"-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
"-numa dist,src=source,dst=destination,val=distance\n"
"-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
"-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
"-numa hmat-cache,node-id=node,size=size,level=level[,associativity=none|direct|complex][,policy=none|write-back|write-through][,line=size]\n",
QEMU_ARCH_ALL)
SRST
``-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
\
``-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
\
``-numa dist,src=source,dst=destination,val=distance``
\
``-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]``
\
``-numa hmat-lb,initiator=node,target=node,hierarchy=hierarchy,data-type=tpye[,latency=lat][,bandwidth=bw]``
\
``-numa hmat-cache,node-id=node,size=size,level=level[,associativity=str][,policy=str][,line=size]``
Define a NUMA node and assign RAM and VCPUs to it. Set the NUMA
distance from a source node to a destination node. Set the ACPI
Heterogeneous Memory Attributes for the given nodes.
Legacy VCPU assignment uses '\ ``cpus``\ ' option where firstcpu and
lastcpu are CPU indexes. Each '\ ``cpus``\ ' option represent a
contiguous range of CPU indexes (or a single VCPU if lastcpu is
omitted). A non-contiguous set of VCPUs can be represented by
providing multiple '\ ``cpus``\ ' options. If '\ ``cpus``\ ' is
omitted on all nodes, VCPUs are automatically split between them.
For example, the following option assigns VCPUs 0, 1, 2 and 5 to a
NUMA node:
::
-numa node,cpus=0-2,cpus=5
'\ ``cpu``\ ' option is a new alternative to '\ ``cpus``\ ' option
which uses '\ ``socket-id|core-id|thread-id``\ ' properties to
assign CPU objects to a node using topology layout properties of
CPU. The set of properties is machine specific, and depends on used
machine type/'\ ``smp``\ ' options. It could be queried with
'\ ``hotpluggable-cpus``\ ' monitor command. '\ ``node-id``\ '
property specifies node to which CPU object will be assigned, it's
required for node to be declared with '\ ``node``\ ' option before
it's used with '\ ``cpu``\ ' option.
For example:
::
-M pc \
-smp 1,sockets=2,maxcpus=2 \
-numa node,nodeid=0 -numa node,nodeid=1 \
-numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1
Legacy '\ ``mem``\ ' assigns a given RAM amount to a node (not supported
for 5.1 and newer machine types). '\ ``memdev``\ ' assigns RAM from
a given memory backend device to a node. If '\ ``mem``\ ' and
'\ ``memdev``\ ' are omitted in all nodes, RAM is split equally between them.
'\ ``mem``\ ' and '\ ``memdev``\ ' are mutually exclusive.
Furthermore, if one node uses '\ ``memdev``\ ', all of them have to
use it.
'\ ``initiator``\ ' is an additional option that points to an
initiator NUMA node that has best performance (the lowest latency or
largest bandwidth) to this NUMA node. Note that this option can be
set only when the machine property 'hmat' is set to 'on'.
Following example creates a machine with 2 NUMA nodes, node 0 has
CPU. node 1 has only memory, and its initiator is node 0. Note that
because node 0 has CPU, by default the initiator of node 0 is itself
and must be itself.
::
-machine hmat=on \
-m 2G,slots=2,maxmem=4G \
-object memory-backend-ram,size=1G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1,initiator=0 \
-smp 2,sockets=2,maxcpus=2 \
-numa cpu,node-id=0,socket-id=0 \
-numa cpu,node-id=0,socket-id=1
source and destination are NUMA node IDs. distance is the NUMA
distance from source to destination. The distance from a node to
itself is always 10. If any pair of nodes is given a distance, then
all pairs must be given distances. Although, when distances are only
given in one direction for each pair of nodes, then the distances in
the opposite directions are assumed to be the same. If, however, an
asymmetrical pair of distances is given for even one node pair, then
all node pairs must be provided distance values for both directions,
even when they are symmetrical. When a node is unreachable from
another node, set the pair's distance to 255.
Note that the -``numa`` option doesn't allocate any of the specified
resources, it just assigns existing resources to NUMA nodes. This
means that one still has to use the ``-m``, ``-smp`` options to
allocate RAM and VCPUs respectively.
Use '\ ``hmat-lb``\ ' to set System Locality Latency and Bandwidth
Information between initiator and target NUMA nodes in ACPI
Heterogeneous Attribute Memory Table (HMAT). Initiator NUMA node can
create memory requests, usually it has one or more processors.
Target NUMA node contains addressable memory.
In '\ ``hmat-lb``\ ' option, node are NUMA node IDs. hierarchy is
the memory hierarchy of the target NUMA node: if hierarchy is
'memory', the structure represents the memory performance; if
hierarchy is 'first-level\|second-level\|third-level', this
structure represents aggregated performance of memory side caches
for each domain. type of 'data-type' is type of data represented by
this structure instance: if 'hierarchy' is 'memory', 'data-type' is
'access\|read\|write' latency or 'access\|read\|write' bandwidth of
the target memory; if 'hierarchy' is
'first-level\|second-level\|third-level', 'data-type' is
'access\|read\|write' hit latency or 'access\|read\|write' hit
bandwidth of the target memory side cache.
lat is latency value in nanoseconds. bw is bandwidth value, the
possible value and units are NUM[M\|G\|T], mean that the bandwidth
value are NUM byte per second (or MB/s, GB/s or TB/s depending on
used suffix). Note that if latency or bandwidth value is 0, means
the corresponding latency or bandwidth information is not provided.
In '\ ``hmat-cache``\ ' option, node-id is the NUMA-id of the memory
belongs. size is the size of memory side cache in bytes. level is
the cache level described in this structure, note that the cache
level 0 should not be used with '\ ``hmat-cache``\ ' option.
associativity is the cache associativity, the possible value is
'none/direct(direct-mapped)/complex(complex cache indexing)'. policy
is the write policy. line is the cache Line size in bytes.
For example, the following options describe 2 NUMA nodes. Node 0 has
2 cpus and a ram, node 1 has only a ram. The processors in node 0
access memory in node 0 with access-latency 5 nanoseconds,
access-bandwidth is 200 MB/s; The processors in NUMA node 0 access
memory in NUMA node 1 with access-latency 10 nanoseconds,
access-bandwidth is 100 MB/s. And for memory side cache information,
NUMA node 0 and 1 both have 1 level memory cache, size is 10KB,
policy is write-back, the cache Line size is 8 bytes:
::
-machine hmat=on \
-m 2G \
-object memory-backend-ram,size=1G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-smp 2,sockets=2,maxcpus=2 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1,initiator=0 \
-numa cpu,node-id=0,socket-id=0 \
-numa cpu,node-id=0,socket-id=1 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10 \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
-numa hmat-cache,node-id=0,size=10K,level=1,associativity=direct,policy=write-back,line=8 \
-numa hmat-cache,node-id=1,size=10K,level=1,associativity=direct,policy=write-back,line=8
ERST
hw/cxl/host: Add support for CXL Fixed Memory Windows. The concept of these is introduced in [1] in terms of the description the CEDT ACPI table. The principal is more general. Unlike once traffic hits the CXL root bridges, the host system memory address routing is implementation defined and effectively static once observable by standard / generic system software. Each CXL Fixed Memory Windows (CFMW) is a region of PA space which has fixed system dependent routing configured so that accesses can be routed to the CXL devices below a set of target root bridges. The accesses may be interleaved across multiple root bridges. For QEMU we could have fully specified these regions in terms of a base PA + size, but as the absolute address does not matter it is simpler to let individual platforms place the memory regions. ExampleS: -cxl-fixed-memory-window targets.0=cxl.0,size=128G -cxl-fixed-memory-window targets.0=cxl.1,size=128G -cxl-fixed-memory-window targets.0=cxl0,targets.1=cxl.1,size=256G,interleave-granularity=2k Specifies * 2x 128G regions not interleaved across root bridges, one for each of the root bridges with ids cxl.0 and cxl.1 * 256G region interleaved across root bridges with ids cxl.0 and cxl.1 with a 2k interleave granularity. When system software enumerates the devices below a given root bridge it can then decide which CFMW to use. If non interleave is desired (or possible) it can use the appropriate CFMW for the root bridge in question. If there are suitable devices to interleave across the two root bridges then it may use the 3rd CFMS. A number of other designs were considered but the following constraints made it hard to adapt existing QEMU approaches to this particular problem. 1) The size must be known before a specific architecture / board brings up it's PA memory map. We need to set up an appropriate region. 2) Using links to the host bridges provides a clean command line interface but these links cannot be established until command line devices have been added. Hence the two step process used here of first establishing the size, interleave-ways and granularity + caching the ids of the host bridges and then, once available finding the actual host bridges so they can be used later to support interleave decoding. [1] CXL 2.0 ECN: CEDT CFMWS & QTG DSM (computeexpresslink.org / specifications) Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Markus Armbruster <armbru@redhat.com> # QAPI Schema Message-Id: <20220429144110.25167-28-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-04-29 16:40:52 +02:00
DEF("cxl-fixed-memory-window", HAS_ARG, QEMU_OPTION_cxl_fixed_memory_window,
"-cxl-fixed-memory-window targets.0=firsttarget,targets.1=secondtarget,size=size[,interleave-granularity=granularity]\n",
QEMU_ARCH_ALL)
SRST
``-cxl-fixed-memory-window targets.0=firsttarget,targets.1=secondtarget,size=size[,interleave-granularity=granularity]``
Define a CXL Fixed Memory Window (CFMW).
Described in the CXL 2.0 ECN: CEDT CFMWS & QTG _DSM.
They are regions of Host Physical Addresses (HPA) on a system which
may be interleaved across one or more CXL host bridges. The system
software will assign particular devices into these windows and
configure the downstream Host-managed Device Memory (HDM) decoders
in root ports, switch ports and devices appropriately to meet the
interleave requirements before enabling the memory devices.
``targets.X=firsttarget`` provides the mapping to CXL host bridges
which may be identified by the id provied in the -device entry.
Multiple entries are needed to specify all the targets when
the fixed memory window represents interleaved memory. X is the
target index from 0.
``size=size`` sets the size of the CFMW. This must be a multiple of
256MiB. The region will be aligned to 256MiB but the location is
platform and configuration dependent.
``interleave-granularity=granularity`` sets the granularity of
interleave. Default 256KiB. Only 256KiB, 512KiB, 1024KiB, 2048KiB
4096KiB, 8192KiB and 16384KiB granularities supported.
Example:
::
-cxl-fixed-memory-window targets.0=cxl.0,targets.1=cxl.1,size=128G,interleave-granularity=512k
ERST
DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
"-add-fd fd=fd,set=set[,opaque=opaque]\n"
" Add 'fd' to fd 'set'\n", QEMU_ARCH_ALL)
SRST
``-add-fd fd=fd,set=set[,opaque=opaque]``
Add a file descriptor to an fd set. Valid options are:
``fd=fd``
This option defines the file descriptor of which a duplicate is
added to fd set. The file descriptor cannot be stdin, stdout, or
stderr.
``set=set``
This option defines the ID of the fd set to add the file
descriptor to.
``opaque=opaque``
This option defines a free-form string that can be used to
describe fd.
You can open an image using pre-opened file descriptors from an fd
set:
.. parsed-literal::
|qemu_system| \\
-add-fd fd=3,set=2,opaque="rdwr:/path/to/file" \\
-add-fd fd=4,set=2,opaque="rdonly:/path/to/file" \\
-drive file=/dev/fdset/2,index=0,media=disk
ERST
DEF("set", HAS_ARG, QEMU_OPTION_set,
"-set group.id.arg=value\n"
" set <arg> parameter for item <id> of type <group>\n"
" i.e. -set drive.$id.file=/path/to/image\n", QEMU_ARCH_ALL)
SRST
``-set group.id.arg=value``
Set parameter arg for item id of type group
ERST
DEF("global", HAS_ARG, QEMU_OPTION_global,
"-global driver.property=value\n"
"-global driver=driver,property=property,value=value\n"
" set a global default for a driver property\n",
QEMU_ARCH_ALL)
SRST
``-global driver.prop=value``
\
``-global driver=driver,property=property,value=value``
Set default value of driver's property prop to value, e.g.:
.. parsed-literal::
|qemu_system_x86| -global ide-hd.physical_block_size=4096 disk-image.img
In particular, you can use this to set driver properties for devices
which are created automatically by the machine model. To create a
device which is not created automatically and set properties on it,
use -``device``.
-global driver.prop=value is shorthand for -global
driver=driver,property=prop,value=value. The longhand syntax works
even when driver contains a dot.
ERST
DEF("boot", HAS_ARG, QEMU_OPTION_boot,