qemu-e2k

Author	SHA1	Message	Date
Markus Armbruster	4f7ec696f4	numa: Clean up error reporting in parse_numa() Calling error_report() in a function that takes an Error ** argument is suspicious. parse_numa() does that, and then fails without setting an error. Its caller main(), via qemu_opts_foreach(), is fine with it, but clean it up anyway. While there, give parse_numa() internal linkage. Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20181017082702.5581-29-armbru@redhat.com>	2018-10-19 14:51:34 +02:00
Markus Armbruster	a22528b918	numa: Fix QMP command set-numa-node error handling Calling error_report() in a function that takes an Error ** argument is suspicious. parse_numa_node() does that, and then exit()s. It also passes &error_fatal to machine_set_cpu_numa_node(). Both wrong. Attempting to configure numa when the machine doesn't support it kills the VM: $ qemu-system-x86_64 -nodefaults -S -display none -M none -preconfig -qmp stdio {"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 3}, "package": "v3.0.0-837-gc5e4e49258"}, "capabilities": []}} {"execute": "qmp_capabilities"} {"return": {}} {"execute": "set-numa-node", "arguments": {"type": "node"}} NUMA is not supported by this machine-type $ echo $? 1 Messed up when commit `64c2a8f6d3` and `7c88e65d9e` (v2.10.0) added incorrect error handling right next to correct examples. Latent bug until commit `f3be67812c` (v3.0.0) made it accessible via QMP. Fairly harmless in practice, because it's limited to RUN_STATE_PRECONFIG. The fix is obvious: replace error_report(); exit() by error_setg(); return. This affects parse_numa_node()'s other caller numa_complete_configuration(): since it ignores errors, the "NUMA is not supported by this machine-type" is now ignored, too. But that error is as unexpected there as any other. Change it to abort on error instead. Fixes: `f3be67812c` Cc: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20181017082702.5581-16-armbru@redhat.com>	2018-10-19 14:51:34 +02:00
Junyan He	cbfc017103	memory, exec: switch file ram allocation functions to 'flags' parameters As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He <junyan.he@intel.com> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2018-08-10 13:29:39 +03:00
David Hildenbrand	178003ea49	numa: report all DIMM/NVDIMMs as plugged memory Right now, there is some inconsistency between hotplugged and coldplugged memory. DIMMs added via "-device" result in different stats than DIMMs added using "device_add". E.g. [...] -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 \ -m 4G,maxmem=20G,slots=2 \ -object memory-backend-ram,id=mem0,size=8G \ -device pc-dimm,id=dimm0,memdev=mem0 \ -object memory-backend-ram,id=mem1,size=8G \ -device nvdimm,id=dimm1,memdev=mem1,node=1 Results in NUMA info (qemu) info numa info numa 2 nodes node 0 cpus: 0 1 node 0 size: 10240 MB node 0 plugged: 0 MB node 1 cpus: 2 3 node 1 size: 10240 MB node 1 plugged: 0 MB But in memory size summary: (qemu) info memory_size_summary info memory_size_summary base memory: 4294967296 plugged memory: 17179869184 Make this consistent by reporting all hot and coldplugged memory a.k.a. DIMM and NVDIMM as "plugged". Fixes: `31959e82fb` ("hmp: extend "info numa" with hotplugged memory information") Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180622144045.737-1-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-06-28 19:05:34 +02:00
David Hildenbrand	7943e97b85	hostmem: drop error variable from host_memory_backend_get_memory() Unused, so let's remove it. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180619134141.29478-8-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-06-28 19:05:33 +02:00
Igor Mammedov	f3be67812c	qmp: add set-numa-node command Command is allowed to run only in preconfig stage and will allow to configure numa mapping for CPUs depending on possible CPUs layout (query-hotpluggable-cpus) for given machine instance. Example of configuration session: $QEMU -smp 2 --preconfig ... QMP: -> {'execute': 'query-hotpluggable-cpus' } <- {'return': [ {'props': {'core-id': 0, 'thread-id': 0, 'socket-id': 1}, ... }, {'props': {'core-id': 0, 'thread-id': 0, 'socket-id': 0}, ... } ]} -> {'execute': 'set-numa-node', 'arguments': { 'type': 'node', 'nodeid': 0 } } <- {'return': {}} -> {'execute': 'set-numa-node', 'arguments': { 'type': 'cpu', 'node-id': 0, 'core-id': 0, 'thread-id': 0, 'socket-id': 1, } } <- {'return': {}} -> {'execute': 'set-numa-node', 'arguments': { 'type': 'node', 'nodeid': 1 } } -> {'execute': 'set-numa-node', 'arguments': { 'type': 'cpu', 'node-id': 1, 'core-id': 0, 'thread-id': 0, 'socket-id': 0 } } <- {'return': {}} -> {'execute': 'query-hotpluggable-cpus' } <- {'return': [ {'props': {'core-id': 0, 'thread-id': 0, 'node-id': 0, 'socket-id': 1}, ... }, {'props': {'core-id': 0, 'thread-id': 0, 'node-id': 1, 'socket-id': 0}, ... } ]} Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1525423069-61903-11-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> [ehabkost: Changed "since 2.13" to "since 3.0"] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-30 13:19:14 -03:00
Igor Mammedov	3319b4efc2	numa: split out NumaOptions parsing into set_numa_options() it will allow to reuse set_numa_options() for parsing configuration commands received via QMP interface Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1525423069-61903-3-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-30 13:08:11 -03:00
Igor Mammedov	7a3099fc9c	numa: postpone options post-processing till machine_run_board_init() in preparation for numa options to being handled via QMP before machine_run_board_init(), move final numa configuration checks and processing to machine_run_board_init() so it could take into account both CLI (via parse_numa_opts()) and QMP input Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1525423069-61903-2-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-30 13:08:11 -03:00
Igor Mammedov	74f38e96b3	numa: clarify error message when node index is out of range in -numa dist, ... When using following CLI: -numa dist,src=128,dst=1,val=20 user gets a rather confusing error message: "Invalid node 128, max possible could be 128" Where 128 is number of nodes that QEMU supports (MAX_NODES), while src/dst is an index up to that limit, so it should be MAX_NODES - 1 in error message. Make error message to explicitly state valid range for node index to be more clear. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1526483174-169008-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-30 13:08:11 -03:00
Paolo Bonzini	29de4ec164	memdev: remove "id" property The "id" property is unnecessary and can be replaced simply with object_get_canonical_path_component. This patch mostly undoes commit `e1ff3c67e8` ("monitor: fix qmp/hmp query-memdev not reporting IDs of memory backends", 2017-01-12). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-05-09 00:13:37 +02:00
David Hildenbrand	2cc0e2e814	pc-dimm: factor out MemoryDevice interface On the qmp level, we already have the concept of memory devices: "query-memory-devices" Right now, we only support NVDIMM and PCDIMM. We want to map other devices later into the address space of the guest. Such device could e.g. be virtio devices. These devices will have a guest memory range assigned but won't be exposed via e.g. ACPI. We want to make them look like memory device, but not glued to pc-dimm. Especially, it will not always be possible to have TYPE_PC_DIMM as a parent class (e.g. virtio devices). Let's use an interface instead. As a first part, convert handling of - qmp_pc_dimm_device_list - get_plugged_memory_size to our new model. plug/unplug stuff etc. will follow later. A memory device will have to provide the following functions: - get_addr(): Necessary, as the property "addr" can e.g. not be used for virtio devices (already defined). - get_plugged_size(): The amount this device offers to the guest as of now. - get_region_size(): Because this can later on be bigger than the plugged size. - fill_device_info(): Fill MemoryDeviceInfo, e.g. for qmp. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
David Gibson	6233b679ca	Clear mem_path if we fall back to anonymous RAM allocation If the -mem-path option is set, we attempt to map the guest's RAM from a file in the given path; it's usually used to back guest RAM with hugepages. If we're unable to (e.g. not enough free hugepages) then we fall back to allocating normal anonymous pages. This behaviour can be surprising, but a comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour we can't change. What really isn't ok, though, is that in this case we leave mem_path set. That means functions which attempt to determine the pagesize of main RAM can erroneously think it is hugepage based on the requested path, even though it's not. This is particular bad for the pseries machine type. KVM HV limitations mean the guest can't use pagesizes larger than the host page size used to back RAM. That means that such a fallback, rather than merely giving poorer performance than expected will cause the guest to freeze up early in boot as it attempts to use large page mappings that can't work. This patch addresses the problem by clearing the mem_path variable when we fall back to anonymous pages, meaning that subsequent attempts to determine the RAM page size will get an accurate result. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-04-27 18:05:23 +10:00
Haozhong Zhang	6388e18de9	qmp: distinguish PC-DIMM and NVDIMM in MemoryDeviceInfoList It may need to treat PC-DIMM and NVDIMM differently, e.g., when deciding the necessity of non-volatile flag bit in SRAT memory affinity structures. A new field 'nvdimm' is added to the union type MemoryDeviceInfo for such purpose. Its type is currently PCDIMMDeviceInfo and will be updated when necessary in the future. It also fixes "info memory-devices"/query-memory-devices which currently show nvdimm devices as dimm devices since object_dynamic_cast(obj, TYPE_PC_DIMM) happily cast nvdimm to TYPE_PC_DIMM which it's been inherited from. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 03:34:52 +02:00
Haozhong Zhang	52c95cae4e	pc-dimm: make qmp_pc_dimm_device_list() sort devices by address Make qmp_pc_dimm_device_list() return sorted by start address list of devices so that it could be reused in places that would need sorted list. Reuse existing pc_dimm_built_list() to get sorted list. While at it hide recursive callbacks from callers, so that: qmp_pc_dimm_device_list(qdev_get_machine(), &list); could be replaced with simpler: list = qmp_pc_dimm_device_list(); follow up patch will use it in build_srat() Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> for ppc part Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 03:34:52 +02:00
David Hildenbrand	81ce6aa51a	numa: we don't implement NUMA for s390x Right now it is possible to crash QEMU for s390x by providing e.g. -numa node,nodeid=0,cpus=0-1 Problem is, that numa.c uses mc->cpu_index_to_instance_props as an indicator whether NUMA is supported by a machine type. We don't implement NUMA for s390x ("topology") yet. However we need mc->cpu_index_to_instance_props for query-cpus. So let's fix this case by also checking for mc->get_default_cpu_node_id, which will be needed by machine_set_cpu_numa_node(). qemu-system-s390x: -numa node,nodeid=0,cpus=0-1: NUMA is not supported by this machine-type While at it, make s390_cpu_index_to_props() look like on other architectures. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180227110255.20999-1-david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>	2018-03-08 15:49:23 +01:00
Markus Armbruster	112ed241f5	qapi: Empty out qapi-schema.json The previous commit improved compile time by including less of the generated QAPI headers. This is impossible for stuff defined directly in qapi-schema.json, because that ends up in headers that that pull in everything. Move everything but include directives from qapi-schema.json to new sub-module qapi/misc.json, then include just the "misc" shard where possible. It's possible everywhere, except: * monitor.c needs qmp-command.h to get qmp_init_marshal() * monitor.c, ui/vnc.c and the generated qapi-event-FOO.c need qapi-event.h to get enum QAPIEvent Perhaps we'll get rid of those some other day. Adding a type to qapi/migration.json now recompiles some 120 instead of 2300 out of 5100 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-25-armbru@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-02 13:45:50 -06:00
Markus Armbruster	e688df6bc4	Include qapi/error.h exactly where needed This cleanup makes the number of objects depending on qapi/error.h drop from 1910 (out of 4743) to 1612 in my "build everything" tree. While there, separate #include from file comment with a blank line, and drop a useless comment on why qemu/osdep.h is included first. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-5-armbru@redhat.com> [Semantic conflict with commit `34e304e975` resolved, OSX breakage fixed]	2018-02-09 13:50:17 +01:00
Marcelo Tosatti	e85687ffe2	qemu: improve hugepage allocation failure message Improve hugepage allocation failure message, indicating what is happening to the user. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20180115201700.GA4439@amt.cnet> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-02-05 13:54:38 +01:00
Haozhong Zhang	9837684316	hostmem-file: add "align" option When mmap(2) the backend files, QEMU uses the host page size (getpagesize(2)) by default as the alignment of mapping address. However, some backends may require alignments different than the page size. For example, mmap a device DAX (e.g., /dev/dax0.0) on Linux kernel 4.13 to an address, which is 4K-aligned but not 2M-aligned, fails with a kernel message like [617494.969768] dax dax0.0: qemu-system-x86: dax_mmap: fail, unaligned vma (0x7fa37c579000 - 0x7fa43c579000, 0x1fffff) Because there is no common approach to get such alignment requirement, we add the 'align' option to 'memory-backend-file', so that users or management utils, which have enough knowledge about the backend, can specify a proper alignment via this option. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Message-Id: <20171211072806.2812-2-haozhong.zhang@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [ehabkost: fixed typo, fixed error_setg() format string] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-01-19 11:18:51 -02:00
Philippe Mathieu-Daudé	1330f1e2b7	numa: remove unused #include Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Igor Mammedov	f47bd1c839	spapr: replace numa_get_node() with lookup in pc-dimm list SPAPR is the last user of numa_get_node() and a bunch of supporting code to maintain numa_info[x].addr list. Get LMB node id from pc-dimm list, which allows to remove ~80LOC maintaining dynamic address range lookup list. It also removes pc-dimm dependency on numa_[un]set_mem_node_id() and makes pc-dimms a sole source of information about which node it belongs to and removes duplicate data from global numa_info. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Dou Liyang	7b8be49d36	NUMA: Enable adding NUMA node implicitly Linux and Windows need ACPI SRAT table to make memory hotplug work properly, however currently QEMU doesn't create SRAT table if numa options aren't present on CLI. Which breaks both linux and windows guests in certain conditions: * Windows: won't enable memory hotplug without SRAT table at all * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers when memory is hotplugged and guest tries to use it with that drivers. Fix above issues by automatically creating a numa node when QEMU is started with memory hotplug enabled but without '-numa' options on CLI. (PS: auto-create numa node only for new machine types so not to break migration). Which would provide SRAT table to guests without explicit -numa options on CLI and would allow: * Windows: to enable memory hotplug * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated buffers that legacy drivers/hw can handle. [Rewritten by Igor] Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Thomas Huth <thuth@redhat.com> Cc: Alistair Francis <alistair23@gmail.com> Cc: Takao Indoh <indou.takao@jp.fujitsu.com> Cc: Izumi Taku <izumi.taku@jp.fujitsu.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-11-16 17:46:53 +02:00
Igor Mammedov	cc001888b7	numa: fixup parsed NumaNodeOptions earlier numa 'mem' option with suffix or without one is possible only on CLI/HMP. Instead of fixing up special suffix less CLI case deep in parse_numa_node() do it earlier right after option is parsed into NumaNodeOptions with OptVisistor so that the rest of the code would use valid values in NumaNodeOptions and won't have to reparse QemuOpts. It will help to isolate CLI/HMP parts in parse_numa() and split out parsed NumaNodeOptions processing into separate function that could be reused by QMP handler where we have only NumaNodeOptions and don't need any fixups. While at it reuse qemu_strtosz_MiB() instead of manually checking for suffixes. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1507801198-98182-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-10-27 16:04:28 +02:00
Dou Liyang	f51878ba86	NUMA: Replace MAX_NODES with nb_numa_nodes in for loop In QEMU, the number of the NUMA nodes is determined by parse_numa_opts(). Then, QEMU uses it for iteration, for example: for (i = 0; i < nb_numa_nodes; i++) However, in memory_region_allocate_system_memory(), it uses MAX_NODES not nb_numa_nodes. So, replace MAX_NODES with nb_numa_nodes to keep code consistency and reduce the loop times. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Message-Id: <1503387936-3483-1-git-send-email-douly.fnst@cn.fujitsu.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-09-19 16:51:33 -03:00
Vadim Galitsyn	31959e82fb	hmp: extend "info numa" with hotplugged memory information Report amount of hotplugged memory in addition to total amount per NUMA node. Signed-off-by: Vadim Galitsyn <vadim.galitsyn@profitbricks.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: qemu-devel@nongnu.org Message-Id: <20170829153022.27004-2-vadim.galitsyn@profitbricks.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-09-14 15:52:10 +01:00
Peter Maydell	1cfe48c1ce	memory: Rename memory_region_init_ram() to memory_region_init_ram_nomigrate() Rename memory_region_init_ram() to memory_region_init_ram_nomigrate(). This leaves the way clear for us to provide a memory_region_init_ram() which does handle migration. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1499438577-7674-4-git-send-email-peter.maydell@linaro.org	2017-07-14 17:59:42 +01:00
Marc-André Lureau	61d7c14437	numa: use get_uint() for "size" property "size" is a property of TYPE_MEMORY_BACKEND. host_memory_backend_get_size() and host_memory_backend_set_size() use visit_type_size(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170607163635.17635-39-marcandre.lureau@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-06-20 14:31:33 +02:00
Igor Mammedov	d41f3e750d	numa: make sure that all cpus have has_node_id set if numa is enabled It fixes/add missing _PXM object for non mapped CPU (x86) and missing fdt node (virt-arm). It ensures that possible_cpus contains complete mapping if numa is enabled by the time machine_init() is executed. As result non completely mapped CPUs: 1) appear in ACPI/fdt blobs 2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs 3) allows to drop checks for has_node_id in numa only code, reducing number of invariants incomplete mapping could produce 4) moves fixup/implicit node init from runtime numa_cpu_pre_plug() (when CPU object is created) to machine_numa_finish_init() which helps to fix [1, 2] and make possible_cpus complete source of numa mapping available even before CPUs are created. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-4-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-06-05 14:59:08 -03:00
Igor Mammedov	60bed6a30a	numa: move default mapping init to machine there is no need use cpu_index_to_instance_props() for setting default cpu -> node mapping. Generic machine code can do it without cpu_index by just enabling already preset defaults in possible_cpus. PS: as bonus it makes one less user of cpu_index_to_instance_props() Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-3-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-06-05 14:59:08 -03:00
Igor Mammedov	a0ceb640d0	numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1496161442-96665-2-git-send-email-imammedo@redhat.com> [ehabkost: Fix indentation] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-06-05 14:59:08 -03:00
Eduardo Habkost	f892291eee	numa: Fix format string for "Invalid node" message Some compilers complain about the PRIu16 format string with the MAX(src, dst) and MAX_NODES arguments. Example output from Apple LLVM version 7.3.0 (clang-703.0.31): numa.c:236:20: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] MAX(src, dst), MAX_NODES); ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~ include/qapi/error.h:163:35: note: expanded from macro 'error_setg' (fmt), ## __VA_ARGS__) ^~~~~~~~~~~ glib/2.52.2/include/glib-2.0/glib/gmacros.h:288:20: note: expanded from macro 'MAX' #define MAX(a, b) (((a) > (b)) ? (a) : (b)) ^~~~~~~~~~~~~~~~~~~~~~~~~ numa.c:236:35: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] MAX(src, dst), MAX_NODES); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~ include/qapi/error.h:163:35: note: expanded from macro 'error_setg' (fmt), ## __VA_ARGS__) ^~~~~~~~~~~ include/sysemu/sysemu.h:165:19: note: expanded from macro 'MAX_NODES' #define MAX_NODES 128 ^~~ MAX(src, dst) promotes the src and dst arguments to int, and MAX_NODES is an int. Use %d to silence those warnings. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170530184013.31044-1-ehabkost@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-30 16:09:58 -03:00
Stefan Hajnoczi	ba9915e1f8	x86 and machine queue, 2017-05-11 Highlights: * New "-numa cpu" option * NUMA distance configuration * migration/i386 vmstatification -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZFLh3AAoJECgHk2+YTcWmppQQAJe9Y5a3VwHXqvHbwBHX2ysn RZDAUPd9DWpbM+UUydyKOVIZ7u5RXbbVq4E0NeCD8VYYd+grZB5Wo1cAzy3b4U2j 2s+MDqaPMtZtGoqxTsyQOVoVxazT5Kf1zglK+iUEzik44J7LGdro+ty2Z7Ut2c11 q9rE/GNS78czBm7c4lxgkxXW4N95K/tEGlLtDQ7uct//3U/ZimF+mO6GcbVFlOWT 4iEbOz2sqvBVv22nLJRufiPgFNIW4hizAz5KBWxwGFCCKvT3N6yYNKKjzEpCw+jE lpjIRODU02yIZZZY841fLRtyrk7p4zORS8jRaHTdEJgb5bGc/YazxxVL8nzRQT1W VxFwAMd+UNrDkV24hpN++Ln2O+b3kwcGZ7uA/qu9d5WvSYUKXlHqcMJ35q6zuhAI /ecfYO7EZfVP86VjIt5IH04iV8RChA9Q6de+kQEFa6wHUxufeCOwCFqukGo8zj07 plX8NcjnzYmSXKnYjHOHao4rKT+DiJhRB60rFiMeKP/qvKbZPjtgsIeonhHm53qZ /QwkhowahHKkpAnetIl0QHm8KS4YudAofMi/Fl+he4gRkEbSQVAo6iQb2L4cjcLC LNSDDsIVWGem4gCR+vcsFqB3lggRDfltHXm15JKh92UMpOr6RI6s8pD55T7EdnPC CfdxWB5kYM6/lLbOHj94 =48wH -----END PGP SIGNATURE----- Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' into staging x86 and machine queue, 2017-05-11 Highlights: * New "-numa cpu" option * NUMA distance configuration * migration/i386 vmstatification # gpg: Signature made Thu 11 May 2017 08:16:07 PM BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # gpg: Note: This key has expired! # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * ehabkost/tags/x86-and-machine-pull-request: (29 commits) migration/i386: Remove support for pre-0.12 formats vmstatification: i386 FPReg migration/i386: Remove old non-softfloat 64bit FP support tests: check -numa node,cpu=props_list usecase numa: add '-numa cpu,...' option for property based node mapping numa: remove node_cpu bitmaps as they are no longer used numa: use possible_cpus for not mapped CPUs check machine: call machine init from wrapper numa: remove no longer need numa_post_machine_init() tests: numa: add case for QMP command query-cpus QMP: include CpuInstanceProperties into query_cpus output output virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() numa: do default mapping based on possible_cpus instead of node_cpu bitmaps numa: mirror cpu to node mapping in MachineState::possible_cpus numa: add check that board supports cpu_index to node mapping virt-arm: add node-id property to CPU pc: add node-id property to CPU spapr: add node-id property to sPAPR core ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-15 14:12:03 +01:00
Igor Mammedov	419fcdec3c	numa: add '-numa cpu,...' option for property based node mapping legacy cpu to node mapping is using cpu index values to map VCPU to node with help of '-numa node,nodeid=node,cpus=x[-y]' option. However cpu index is internal concept and QEMU users have to guess /reimplement qemu's logic/ to map it to a concrete cpu socket/core/thread to make sane CPUs placement across numa nodes. This patch allows to map cpu objects to numa nodes using the same properties as used for cpus with -device/device_add (socket-id/core-id/thread-id/node-id). At present valid properties/values to address CPUs could be fetched using hotpluggable-cpus monitor/qmp command, it will require user to start qemu twice when creating domain to fetch possible CPUs for a machine type/-smp layout first and then the second time with numa explicit mapping for actual usage. The first step results could be saved and reused to set/change mapping later as far as machine type/-smp stays the same. Proposed impl. supports exact and wildcard matching to simplify CLI and allow to set mapping for a specific cpu or group of cpu objects specified by matched properties. For example: # exact mapping x86 -numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n # exact mapping SPAPR -numa cpu,node-id=x,core-id=y # wildcard mapping, all cpu objects that match socket-id=y # are mapped to node-id=x -numa cpu,node-id=x,socket-id=y Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1494415802-227633-18-git-send-email-imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:50 -03:00
Igor Mammedov	1171ae9a5b	numa: remove node_cpu bitmaps as they are no longer used Postfactum "CPU(s) present in multiple NUMA nodes" check was the last user of node_cpu bitmaps, but it's not need as machine_set_cpu_numa_node() does the similar check at the time mapping is set for cpus (i.e. when -numa cpus= is parsed) and ensures that cpu can be mapped only to one node. Remove duplicate check based on node_cpu bitmaps and since the last user is gone remove node_cpu as well, which completes internal transition from legacy bitmap based mapping storage to possible_cpus storage. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <1494415802-227633-17-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:50 -03:00
Igor Mammedov	ec78f8114b	numa: use possible_cpus for not mapped CPUs check and remove corresponding part in numa.c that uses node_cpu bitmaps. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <1494415802-227633-16-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:50 -03:00
Igor Mammedov	3b8a8557f7	numa: remove no longer need numa_post_machine_init() CPUState::numa_node is still in use but now it's set by board when it creates CPU objects. So there isn't any need to set it again after all CPU's are created, since it's been already set. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <1494415802-227633-14-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:50 -03:00
Igor Mammedov	6accfb7823	tests: numa: add case for QMP command query-cpus Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1494415802-227633-13-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:50 -03:00
Igor Mammedov	af9b20e8d2	numa: do default mapping based on possible_cpus instead of node_cpu bitmaps Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <1494415802-227633-8-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:49 -03:00
Igor Mammedov	7c88e65d9e	numa: mirror cpu to node mapping in MachineState::possible_cpus Introduce machine_set_cpu_numa_node() helper that stores node mapping for CPU in MachineState::possible_cpus. CPU and node it belongs to is specified by 'props' argument. Patch doesn't remove old way of storing mapping in numa_info[X].node_cpu as removing it at the same time makes patch rather big. Instead it just mirrors mapping in possible_cpus and follow up per target patches will switch to possible_cpus and numa_info[X].node_cpu will be removed once there isn't any users left. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <1494415802-227633-7-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:49 -03:00
Igor Mammedov	64c2a8f6d3	numa: add check that board supports cpu_index to node mapping Default node mapping initialization already checks that board supports cpu_index to node mapping and refuses to start if it's not supported. Do the same for explicitly provided mapping "-numa node,cpus=..." Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1494415802-227633-6-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:49 -03:00
Igor Mammedov	ea089eebbd	numa: move source of default CPUs to NUMA node mapping into boards Originally CPU threads were by default assigned in round-robin fashion. However it was causing issues in guest since CPU threads from the same socket/core could be placed on different NUMA nodes. Commit `fb43b73b` (pc: fix default VCPU to NUMA node mapping) fixed it by grouping threads within a socket on the same node introducing cpu_index_to_socket_id() callback and commit `20bb648d` (spapr: Fix default NUMA node allocation for threads) reused callback to fix similar issues for SPAPR machine even though socket doesn't make much sense there. As result QEMU ended up having 3 default distribution rules used by 3 targets /virt-arm, spapr, pc/. In effort of moving NUMA mapping for CPUs into possible_cpus, generalize default mapping in numa.c by making boards decide on default mapping and let them explicitly tell generic numa code to which node a CPU thread belongs to by replacing cpu_index_to_socket_id() with @cpu_index_to_instance_props() which provides default node_id assigned by board to specified cpu_index. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:48 -03:00
Laurent Vivier	3bfe57165b	numa: equally distribute memory on nodes When there are more nodes than available memory to put the minimum allowed memory by node, all the memory is put on the last node. This is because we put (ram_size / nb_numa_nodes) & ~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this case the value is 0. This is particularly true with pseries, as the memory must be aligned to 256MB. To avoid this problem, this patch uses an error diffusion algorithm [1] to distribute equally the memory on nodes. We introduce numa_auto_assign_ram() function in MachineClass to keep compatibility between machine type versions. The legacy function is used with pseries-2.9, pc-q35-2.9 and pc-i440fx-2.9 (and previous), the new one with all others. Example: qemu-system-ppc64 -S -nographic -nodefaults -monitor stdio -m 1G -smp 8 \ -numa node -numa node -numa node \ -numa node -numa node -numa node Before: (qemu) info numa 6 nodes node 0 cpus: 0 6 node 0 size: 0 MB node 1 cpus: 1 7 node 1 size: 0 MB node 2 cpus: 2 node 2 size: 0 MB node 3 cpus: 3 node 3 size: 0 MB node 4 cpus: 4 node 4 size: 0 MB node 5 cpus: 5 node 5 size: 1024 MB After: (qemu) info numa 6 nodes node 0 cpus: 0 6 node 0 size: 0 MB node 1 cpus: 1 7 node 1 size: 256 MB node 2 cpus: 2 node 2 size: 0 MB node 3 cpus: 3 node 3 size: 256 MB node 4 cpus: 4 node 4 size: 256 MB node 5 cpus: 5 node 5 size: 256 MB [1] https://en.wikipedia.org/wiki/Error_diffusion Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170502162955.1610-2-lvivier@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> [ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:47 -03:00
He Chen	0f203430dd	numa: Allow setting NUMA distance for different NUMA nodes This patch is going to add SLIT table support in QEMU, and provides additional option `dist` for command `-numa` to allow user set vNUMA distance by QEMU command. With this patch, when a user wants to create a guest that contains several vNUMA nodes and also wants to set distance among those nodes, the QEMU command would like: ``` -numa node,nodeid=0,cpus=0 \ -numa node,nodeid=1,cpus=1 \ -numa node,nodeid=2,cpus=2 \ -numa node,nodeid=3,cpus=3 \ -numa dist,src=0,dst=1,val=21 \ -numa dist,src=0,dst=2,val=31 \ -numa dist,src=0,dst=3,val=41 \ -numa dist,src=1,dst=2,val=21 \ -numa dist,src=1,dst=3,val=31 \ -numa dist,src=2,dst=3,val=21 \ ``` Signed-off-by: He Chen <he.chen@linux.intel.com> Message-Id: <1493260558-20728-1-git-send-email-he.chen@linux.intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:37 -03:00
Ishani Chugh	d0e31a105e	Remove reduntant qemu: from error functions This patch removes redundant "qemu:" from error functions. The link to the bitesized task is: http://wiki.qemu-project.org/Contribute/BiteSizedTasks#Error_checking Signed-off-by: Ishani Chugh <chugh.ishani@research.iiit.ac.in> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-05-07 09:57:51 +03:00
Laurent Vivier	55641213fc	numa,spapr: align default numa node memory size to 256MB Since commit `224245b` ("spapr: Add LMB DR connectors"), NUMA node memory size must be aligned to 256MB (SPAPR_MEMORY_BLOCK_SIZE). But when "-numa" option is provided without "mem" parameter, the memory is equally divided between nodes, but 8MB aligned. This can be not valid for pseries. In that case we can have: $ ./ppc64-softmmu/qemu-system-ppc64 -m 4G -numa node -numa node -numa node qemu-system-ppc64: Node 0 memory size 0x55000000 is not aligned to 256 MiB With this patch, we have: (qemu) info numa 3 nodes node 0 cpus: 0 node 0 size: 1280 MB node 1 cpus: node 1 size: 1280 MB node 2 cpus: node 2 size: 1536 MB Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-22 11:32:42 +11:00
Markus Armbruster	d081a49af8	numa: Flatten simple union NumaOptions Simple unions are simpler than flat unions in the schema, but more complicated in C and on the QMP wire: there's extra indirection in C and extra nesting on the wire, both pointless. They're best avoided in new code. NumaOptions isn't new, but it's only used internally, not in QMP. Convert it to a flat union. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1487709988-14322-2-git-send-email-armbru@redhat.com>	2017-02-22 19:50:46 +01:00
Paolo Bonzini	0987d735a3	ramblock-notifier: new This adds a notify interface of ram block additions and removals. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-16 17:52:35 +01:00
Igor Mammedov	cdda2018e3	numa: make -numa parser dynamically allocate CPUs masks so it won't impose an additional limits on max_cpus limits supported by different targets. It removes global MAX_CPUMASK_BITS constant and need to bump it up whenever max_cpus is being increased for a target above MAX_CPUMASK_BITS value. Use runtime max_cpus value instead to allocate sufficiently sized node_cpu bitmasks in numa parser. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1479466974-249781-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> [ehabkost: Added asserts to ensure cpu_index < max_cpus] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-01-12 15:51:36 -02:00
Igor Mammedov	e1ff3c67e8	monitor: fix qmp/hmp query-memdev not reporting IDs of memory backends Considering 'id' is mandatory for user_creatable objects/backends and user_creatable_add_type() always has it as an argument regardless of where from it is called CLI/monitor or QMP, Fix issue by adding 'id' property to hostmem backends and set it in user_creatable_add_type() for every object that implements 'id' property. Then later at query-memdev time get 'id' from object directly. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1484052795-158195-4-git-send-email-imammedo@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-01-12 15:35:06 -02:00
Igor Mammedov	6bea1ddf8b	numa: reduce code duplication by adding helper numa_get_node_for_cpu() Replace repeated pattern for (i = 0; i < nb_numa_nodes; i++) { if (test_bit(idx, numa_info[i].node_cpu)) { ... break; with a helper function to lookup numa node index for cpu. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-10-10 01:16:57 +03:00

1 2 3

108 Commits