qemu-e2k

Author	SHA1	Message	Date
Eric Blake	a73049b2a1	numa: Check for qemu_strtosz_MiB error As shown in the previous commit, qemu_strtosz_MiB sometimes leaves the result value untouched (we have to audit further to learn that in that case, the QAPI generator says that visit_type_NumaOptions() will have zero-initialized it), and sometimes leaves it with the value of a partial parse before -EINVAL occurs because of trailing garbage. Rather than blindly treating any string the user may throw at us as valid, we should check for parse failures. Fixes: `cc001888` ("numa: fixup parsed NumaNodeOptions earlier", v2.11.0) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20230522190441.64278-14-eblake@redhat.com>	2023-06-02 12:29:27 -05:00
Markus Armbruster	fe8ac1fa49	qapi machine: Elide redundant has_FOO in generated C The has_FOO for pointer-valued FOO are redundant, except for arrays. They are also a nuisance to work with. Recent commit "qapi: Start to elide redundant has_FOO in generated C" provided the means to elide them step by step. This is the step for qapi/machine*.json. Said commit explains the transformation in more detail. The invariant violations mentioned there do not occur here. Cc: Eduardo Habkost <eduardo@habkost.net> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Philippe Mathieu-Daudé <f4bug@amsat.org> Cc: Yanan Wang <wangyanan55@huawei.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221104160712.3005652-16-armbru@redhat.com>	2022-12-14 20:04:47 +01:00
Stefan Hajnoczi	4fdd0a1a7e	numa: use QLIST_FOREACH_SAFE() for RAM block notifiers Make list traversal work when a callback removes a notifier mid-traversal. This is a cleanup to prevent bugs in the future. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-id: 20221013185908.1297568-9-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	1f0fea38f4	numa: call ->ram_block_removed() in ram_block_notifer_remove() When a RAMBlockNotifier is added, ->ram_block_added() is called with all existing RAMBlocks. There is no equivalent ->ram_block_removed() call when a RAMBlockNotifier is removed. The util/vfio-helpers.c code (the sole user of RAMBlockNotifier) is fine with this asymmetry because it does not rely on RAMBlockNotifier for cleanup. It walks its internal list of DMA mappings and unmaps them by itself. Future users of RAMBlockNotifier may not have an internal data structure that records added RAMBlocks so they will need ->ram_block_removed() callbacks. This patch makes ram_block_notifier_remove() symmetric with respect to callbacks. Now util/vfio-helpers.c needs to unmap remaining DMA mappings after ram_block_notifier_remove() has been called. This is necessary since users like block/nvme.c may create additional DMA mappings that do not originate from the RAMBlockNotifier. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Paolo Bonzini	26f88d84da	machine: make memory-backend a link property Handle HostMemoryBackend creation and setting of ms->ram entirely in machine_run_board_init. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414165300.555321-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 12:29:44 +02:00
Yang Zhong	1105812382	numa: Enable numa for SGX EPC sections The basic SGX did not enable numa for SGX EPC sections, which result in all EPC sections located in numa node 0. This patch enable SGX numa function in the guest and the EPC section can work with RAM as one numa node. The Guest kernel related log: [ 0.009981] ACPI: SRAT: Node 0 PXM 0 [mem 0x180000000-0x183ffffff] [ 0.009982] ACPI: SRAT: Node 1 PXM 1 [mem 0x184000000-0x185bfffff] The SRAT table can normally show SGX EPC sections menory info in different numa nodes. The SGX EPC numa related command: ...... -m 4G,maxmem=20G \ -smp sockets=2,cores=2 \ -cpu host,+sgx-provisionkey \ -object memory-backend-ram,size=2G,host-nodes=0,policy=bind,id=node0 \ -object memory-backend-epc,id=mem0,size=64M,prealloc=on,host-nodes=0,policy=bind \ -numa node,nodeid=0,cpus=0-1,memdev=node0 \ -object memory-backend-ram,size=2G,host-nodes=1,policy=bind,id=node1 \ -object memory-backend-epc,id=mem1,size=28M,prealloc=on,host-nodes=1,policy=bind \ -numa node,nodeid=1,cpus=2-3,memdev=node1 \ -M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1 \ ...... Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20211101162009.62161-2-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-10 09:47:18 +01:00
Paolo Bonzini	bd989ed44f	numa: avoid crash with SGX and "info numa" Add the MEMORY_DEVICE_INFO_KIND_SGX_EPC case, so that enclave memory is included in the output of "info numa" instead of crashing the monitor. Fixes: `a7c565a941` ("sgx-epc: Add the fill_device_info() callback support", 2021-09-30) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-11-10 13:37:36 +01:00
Michal Privoznik	294aa0437b	numa: Parse initiator= attribute before cpus= attribute When parsing cpus= attribute of -numa object couple of checks is performed, such as correct initiator setting (see the if() statement at the end of for() loop in machine_set_cpu_numa_node()). However, with the current code cpus= attribute is parsed before initiator= attribute and thus the check may fail even though it is not obvious why. But since parsing the initiator= attribute does not depend on the cpus= attribute we can swap the order of the two. It's fairly easy to reproduce with the following command line (snippet of an actual cmd line): -smp 4,sockets=4,cores=1,threads=1 \ -object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":2147483648}' \ -numa node,nodeid=0,cpus=0-1,initiator=0,memdev=ram-node0 \ -object '{"qom-type":"memory-backend-ram","id":"ram-node1","size":2147483648}' \ -numa node,nodeid=1,cpus=2-3,initiator=1,memdev=ram-node1 \ -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5 \ -numa hmat-lb,initiator=0,target=0,hierarchy=first-level,data-type=access-latency,latency=10 \ -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=5 \ -numa hmat-lb,initiator=1,target=1,hierarchy=first-level,data-type=access-latency,latency=10 \ -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=204800K \ -numa hmat-lb,initiator=0,target=0,hierarchy=first-level,data-type=access-bandwidth,bandwidth=208896K \ -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=204800K \ -numa hmat-lb,initiator=1,target=1,hierarchy=first-level,data-type=access-bandwidth,bandwidth=208896K \ -numa hmat-cache,node-id=0,size=10K,level=1,associativity=direct,policy=write-back,line=8 \ -numa hmat-cache,node-id=1,size=10K,level=1,associativity=direct,policy=write-back,line=8 \ Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <b27a6a88986d63e3f610a728c845e01ff8d92e2e.1625662776.git.mprivozn@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-13 09:21:01 -04:00
David Hildenbrand	e15c7d1e8c	numa: Make all callbacks of ram block notifiers optional Let's make add/remove optional. We want to introduce a RAM block notifier for RAM migration that is only interested in resize events. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-4-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
David Hildenbrand	8f44304c76	numa: Teach ram block notifiers about resizeable ram blocks Ram block notifiers are currently not aware of resizes. To properly handle resizes during migration, we want to teach ram block notifiers about resizeable ram. Introduce the basic infrastructure but keep using max_size in the existing notifiers. Supply the max_size when adding and removing ram blocks. Also, notify on resizes. Acked-by: Paul Durrant <paul@xen.org> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: xen-devel@lists.xenproject.org Cc: haxm-team@intel.com Cc: Paul Durrant <paul@xen.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Anthony Perard <anthony.perard@citrix.com> Cc: Wenchao Wang <wenchao.wang@intel.com> Cc: Colin Xu <colin.xu@intel.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-3-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
David Hildenbrand	082851a3af	util: vfio-helpers: Factor out and fix processing of existing ram blocks Factor it out into common code when a new notifier is registered, just as done with the memory region notifier. This keeps logic about how to process existing ram blocks at a central place. Just like when adding a new ram block, we have to register the max_length. Ram blocks are only "fake resized". All memory (max_length) is mapped. Print the warning from inside qemu_vfio_ram_block_added(). Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-2-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
Thomas Huth	4c386f8064	Do not include sysemu/sysemu.h if it's not really necessary Stop including sysemu/sysemu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-2-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:50 +02:00
Paolo Bonzini	b326b6ea79	make ram_size local to vl.c Use the machine properties for the leftovers too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-10 12:15:10 -05:00
Igor Mammedov	1b5e843ab6	numa: hmat: require parent cache description before the next level one Spec[1] defines 0 - 3 level memory side cache, however QEMU CLI allows to specify an intermediate cache level without specifying previous level. Such option(s) silently ignored when building HMAT table, which leads to incomplete cache information. Make sure that previous level exists and error out if it hasn't been provided. 1) ACPI 6.2A 5.2.27.5 Memory Side Cache Information Structure Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1842877 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20201006150002.1601845-1-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-10-06 11:09:41 -04:00
Igor Mammedov	270b33cc1c	numa: remove fixup numa_state->num_nodes to MAX_NODES current code permits only nodeids in [0..MAX_NODES) range due to nodeid check in parse_numa_node() if (nodenr >= MAX_NODES) { error_setg(errp, "Max number of NUMA nodes reached: %" so subj fixup is not reachable, drop it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200911084410.788171-4-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-30 19:09:20 +02:00
Igor Mammedov	b21aa7e01e	numa: drop support for '-numa node' (without memory specified) it was deprecated since 4.1 commit `4bb4a2732e` (numa: deprecate implict memory distribution between nodes) Users of existing VMs, wishing to preserve the same RAM distribution, should configure it explicitly using ``-numa node,memdev`` options. Current RAM distribution can be retrieved using HMP command `info numa` and if separate memory devices (pc\|nv-dimm) are present use `info memory-device` and subtract device memory from output of `info numa`. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200911084410.788171-2-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-30 19:09:20 +02:00
Igor Mammedov	fadb055bd4	numa: hmat: fix cache size check when QEMU is started like: qemu-system-x86_64 -smp 2 -machine hmat=on \ -m 2G \ -object memory-backend-ram,size=1G,id=m0 \ -object memory-backend-ram,size=1G,id=m1 \ -numa node,nodeid=0,memdev=m0 \ -numa node,nodeid=1,memdev=m1,initiator=0 \ -numa cpu,node-id=0,socket-id=0 \ -numa cpu,node-id=0,socket-id=1 \ -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5 \ -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10 \ -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \ -numa hmat-cache,node-id=0,size=8K,level=1,associativity=direct,policy=write-back,line=5 \ -numa hmat-cache,node-id=0,size=16K,level=2,associativity=direct,policy=write-back,line=5 it errors out with: -numa hmat-cache,node-id=0,size=16K,level=2,associativity=direct,policy=write-back,line=5: Invalid size=16384, the size of level=2 should be less than the size(8192) of level=1 which doesn't look right as one would expect that L1 < L2 < L3 ... Fix it by sawpping relevant size checks. Fixes: `c412a48d4d` (numa: Extend CLI to provide memory side cache information) Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200821100519.1325691-1-imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-08-27 13:54:37 -04:00
Markus Armbruster	b11a093c60	qapi: Smooth another visitor error checking pattern Convert visit_type_FOO(v, ..., &ptr, &err); ... if (err) { ... } to visit_type_FOO(v, ..., &ptr, errp); ... if (!ptr) { ... } for functions that set @ptr to non-null / null on success / error. Eliminate error_propagate() that are now unnecessary. Delete @err that are now unused. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-40-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	992861fb1e	error: Eliminate error_propagate() manually When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. The previous two commits did that for sufficiently simple cases with Coccinelle. Do it for several more manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-37-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	5325cc34a2	qom: Put name parameter before value / visitor parameter The object_property_set_FOO() setters take property name and value in an unusual order: void object_property_set_FOO(Object obj, FOO_TYPE value, const char name, Error **errp) Having to pass value before name feels grating. Swap them. Same for object_property_set(), object_property_get(), and object_property_parse(). Convert callers with this Coccinelle script: @@ identifier fun = { object_property_get, object_property_parse, object_property_set_str, object_property_set_link, object_property_set_bool, object_property_set_int, object_property_set_uint, object_property_set, object_property_set_qobject }; expression obj, v, name, errp; @@ - fun(obj, v, name, errp) + fun(obj, name, v, errp) Chokes on hw/arm/musicpal.c's lcd_refresh() with the unhelpful error message "no position information". Convert that one manually. Fails to convert hw/arm/armsse.c, because Coccinelle gets confused by ARMSSE being used both as typedef and function-like macro there. Convert manually. Fails to convert hw/rx/rx-gdbsim.c, because Coccinelle gets confused by RXCPU being used both as typedef and function-like macro there. Convert manually. The other files using RXCPU that way don't need conversion. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-27-armbru@redhat.com> [Straightforwad conflict with commit `2336172d9b` "audio: set default value for pcspk.iobase property" resolved]	2020-07-10 15:18:08 +02:00
David Hildenbrand	195784a0cf	numa: Auto-enable NUMA when any memory devices are possible Let's auto-enable it also when maxmem is specified but no slots are defined. This will result in us properly creating ACPI srat tables, indicating the maximum possible PFN to the guest OS. Based on this, e.g., Linux will enable the swiotlb properly. This avoids having to manually force the switolb on (swiotlb=force) in Linux in case we're booting only using DMA memory (e.g., 2GB on x86-64), and virtio-mem adds memory later on that really needs the swiotlb to be used for DMA. Let's take care of backwards compatibility if somebody has a setup that specifies "maxram" without "slots". Reported-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Sergio Lopez <slp@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: qemu-arm@nongnu.org <qemu-arm@nongnu.org> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200626072248.78761-22-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-07-03 07:57:04 -04:00
David Hildenbrand	16647a8224	numa: Handle virtio-mem in NUMA stats Account the memory to the configured nid. Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200626072248.78761-15-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-07-03 07:57:04 -04:00
Igor Mammedov	32a354dc6c	numa: forbid '-numa node, mem' for 5.1 and newer machine types Deprecation period is run out and it's a time to flip the switch introduced by `cd5ff8333a`. Disable legacy option for new machine types (since 5.1) and amend documentation. '-numa node,memdev' shall be used instead of disabled option with new machine types. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20200609135635.761587-1-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-26 09:39:39 -04:00
Igor Mammedov	ea81f98bce	numa: prevent usage of -M memory-backend and -numa memdev at the same time Options -M memory-backend and -numa memdev are mutually exclusive, and if used together, it might lead to a crash in the worst case. For example when the same backend is used with these options together: -m 4G \ -object memory-backend-ram,id=mem0,size=4G \ -M pc,memory-backend=mem0 \ -numa node,memdev=mem0 QEMU will abort with: exec.c:2006: qemu_ram_set_idstr: Assertion `!new_block->idstr[0]' failed. and following backtrace: abort () qemu_ram_set_idstr () vmstate_register_ram () vmstate_register_ram_global () machine_consume_memdev () numa_init_memdev_container () numa_complete_configuration () machine_run_board_init () add a check to error out in case the user tries to use both options at the same time. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200511141103.43768-3-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-10 12:09:34 -04:00
Igor Mammedov	f0530f14c7	remove no longer used memory_region_allocate_system_memory() all boards were switched to using memdev backend for main RAM, so we can drop no longer used memory_region_allocate_system_memory() Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200219160953.13771-73-imammedo@redhat.com>	2020-02-19 16:50:01 +00:00
Igor Mammedov	6b61c2c596	initialize MachineState::ram in NUMA case In case of NUMA there are 2 cases to consider: 1. '-numa node,memdev', the only one that will be available for 5.0 and newer machine types. In this case reuse current behavior, with only difference memdevs are put into MachineState::ram container + a temporary glue to keep memory_region_allocate_system_memory() working until all boards converted. 2. fake NUMA ("-numa node mem" and default RAM splitting) the later has been deprecated and will be removed but the former is going to stay available for compat reasons for 5.0 and older machine types it takes allocate_system_memory_nonnuma() path, like non-NUMA case and falls under conversion to memdev. So extend non-NUMA MachineState::ram initialization introduced in previous patch to take care of fake NUMA case. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200219160953.13771-6-imammedo@redhat.com>	2020-02-19 16:49:53 +00:00
Igor Mammedov	82b911aaff	machine: introduce convenience MachineState::ram the new field will be used by boards to get access to main RAM memory region and will help to save boiler plate in boards which often introduce a field or variable just for this purpose. Memory region will be equivalent to what currently used memory_region_allocate_system_memory() is returning apart from that it will come from hostmem backend. Followup patches will incrementally switch boards to using RAM from MachineState::ram. Patch takes care of non-NUMA case and follow up patch will initialize MachineState::ram for NUMA case. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200219160953.13771-5-imammedo@redhat.com>	2020-02-19 16:49:53 +00:00
Igor Mammedov	68a86dc15c	numa: remove deprecated -mem-path fallback to anonymous RAM it has been deprecated since 4.0 by commit `cb79224b7` (deprecate -mem-path fallback to anonymous RAM) Deprecation period ran out and it's time to remove it so it won't get in a way of switching to using hostmem backend for RAM. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200219160953.13771-2-imammedo@redhat.com>	2020-02-19 16:49:53 +00:00
Peter Maydell	973d306dd6	virtio, pci, pc: fixes, features Bugfixes all over the place. HMAT support. New flags for vhost-user-blk utility. Auto-tuning of seg max for virtio storage. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl4TaMEPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpvzgH/2LyDAzCa9h93ikSJjmyUk5FUaqve38daEb3 S3JYjwKxQx7u1ydooKhvBQnBCZ2i3S+k62gfYyKB+nBv8xvjs0Eg5D1YJ5E8hciy lf5OFGWWtX2iPDjZwQwT13kiJe0o3JRGxJJ6XqTEG+1EYOp7cky/FEv4PD030b9m I2wROZ/Am+onB9YJX8c0Vv1CG+AryuJNXnvwQzTXEjj4U7bEYUyJwVZaCRyAdWQ3 uYXIZN9VwjVX6BFvy9ZAJbEsUVJvOM1/aQaDqcrLz+VlzRT7bRkKHi2G3vakrm1I r5OpgyLo84132awCncbSykKDH5o8WaxLaJBjGmuBfasMz9wPzAg= =uL1o -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging virtio, pci, pc: fixes, features Bugfixes all over the place. HMAT support. New flags for vhost-user-blk utility. Auto-tuning of seg max for virtio storage. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Mon 06 Jan 2020 17:05:05 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (32 commits) intel_iommu: add present bit check for pasid table entries intel_iommu: a fix to vtd_find_as_from_bus_num() virtio-net: delete also control queue when TX/RX deleted virtio: reset region cache when on queue deletion virtio-mmio: update queue size on guest write tests: add virtio-scsi and virtio-blk seg_max_adjust test virtio: make seg_max virtqueue size dependent hw: fix using 4.2 compat in 5.0 machine types for i440fx/q35 vhost-user-scsi: reset the device if supported vhost-user: add VHOST_USER_RESET_DEVICE to reset devices hw/pci/pci_host: Let pci_data_[read/write] use unsigned 'size' argument hw/pci/pci_host: Remove redundant PCI_DPRINTF() virtio-mmio: Clear v2 transport state on soft reset ACPI: add expected files for HMAT tests (acpihmat) tests/bios-tables-test: add test cases for ACPI HMAT tests/numa: Add case for QMP build HMAT hmat acpi: Build Memory Side Cache Information Structure(s) hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) hmat acpi: Build Memory Proximity Domain Attributes Structure(s) numa: Extend CLI to provide memory side cache information ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-01-07 16:25:00 +00:00
Liu Jingqi	c412a48d4d	numa: Extend CLI to provide memory side cache information Add -numa hmat-cache option to provide Memory Side Cache Information. These memory attributes help to build Memory Side Cache Information Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT). Before using hmat-cache option, enable HMAT with -machine hmat=on. Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191213011929.2520-4-tao3.xu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>	2020-01-05 07:03:03 -05:00
Liu Jingqi	9b12dfa03a	numa: Extend CLI to provide memory latency and bandwidth information Add -numa hmat-lb option to provide System Locality Latency and Bandwidth Information. These memory attributes help to build System Locality Latency and Bandwidth Information Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT). Before using hmat-lb option, enable HMAT with -machine hmat=on. Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191213011929.2520-3-tao3.xu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>	2020-01-05 07:03:03 -05:00
Tao Xu	244b3f4485	numa: Extend CLI to provide initiator information for numa nodes In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT), The initiator represents processor which access to memory. And in 5.2.27.3 Memory Proximity Domain Attributes Structure, the attached initiator is defined as where the memory controller responsible for a memory proximity domain. With attached initiator information, the topology of heterogeneous memory can be described. Add new machine property 'hmat' to enable all HMAT specific options. Extend CLI of "-numa node" option to indicate the initiator numa node-id. In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report the platform's HMAT tables. Before using initiator option, enable HMAT with -machine hmat=on. Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191213011929.2520-2-tao3.xu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-01-05 07:03:03 -05:00
Igor Mammedov	5275db59aa	numa: remove not needed check Currently parse_numa_node() is always called from already numa enabled context. Drop unnecessary check if numa is supported. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1576154936-178362-2-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-12-19 14:54:11 -03:00
Greg Kurz	88ed5db16c	numa: Add missing \n to error message If memory allocation fails when using -mem-path, QEMU is supposed to print out a message to indicate that fallback to anonymous RAM is deprecated. This is done with error_printf() which does output buffering. As a consequence, the message is only printed at the next flush, eg. when quiting QEMU, and it also lacks a trailing newline: qemu-system-ppc64: unable to map backing store for guest RAM: Cannot allocate memory qemu-system-ppc64: warning: falling back to regular RAM allocation QEMU 4.1.50 monitor - type 'help' for more information (qemu) q This is deprecated. Make sure that -mem-path specified path has sufficient resources to allocate -m specified RAM amountgreg@boss02:~/Work/qemu/qemu-spapr$ Add the missing \n to fix both issues. Fixes: `cb79224b7e` "deprecate -mem-path fallback to anonymous RAM" Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <157304440026.351774.14607704217028190097.stgit@bahia.lan> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2019-11-12 10:34:23 +01:00
Tao Xu	0533ef5f20	numa: Introduce MachineClass::auto_enable_numa for implicit NUMA node Add MachineClass::auto_enable_numa field. When it is true, a NUMA node is expected to be created implicitly. Acked-by: David Gibson <david@gibson.dropbear.id.au> Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20190905083238.1799-1-tao3.xu@intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-10-15 18:18:08 -03:00
Tao Xu	7e721e7b10	numa: move numa global variable numa_info into MachineState Move existing numa global numa_info (renamed as "nodes") into NumaState. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20190809065731.9097-5-tao3.xu@intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-09-03 11:26:55 -03:00
Tao Xu	118154b767	numa: move numa global variable have_numa_distance into MachineState Move existing numa global have_numa_distance into NumaState. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Liu Jingqi <jingqi.liu@intel.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20190809065731.9097-4-tao3.xu@intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-09-03 11:26:55 -03:00
Tao Xu	aa57020774	numa: move numa global variable nb_numa_nodes into MachineState Add struct NumaState in MachineState and move existing numa global nb_numa_nodes(renamed as "num_nodes") into NumaState. And add variable numa_support into MachineClass to decide which submachines support NUMA. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20190809065731.9097-3-tao3.xu@intel.com> [ehabkost: include hw/boards.h again to fix build failures] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-09-03 11:26:55 -03:00
Markus Armbruster	2e5b09fd0e	hw/core: Move cpu.c, cpu.h from qom/ to hw/core/ Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190709152053.16670-2-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> [Rebased onto merge commit 95a9457fd44; missed instances of qom/cpu.h in comments replaced]	2019-08-21 13:24:01 +02:00
Markus Armbruster	46517dd497	Include sysemu/sysemu.h a lot less In my "build everything" tree, changing sysemu/sysemu.h triggers a recompile of some 5400 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). hw/qdev-core.h includes sysemu/sysemu.h since recent commit `e965ffa70a` "qdev: add qdev_add_vm_change_state_handler()". This is a bad idea: hw/qdev-core.h is widely included. Move the declaration of qdev_add_vm_change_state_handler() to sysemu/sysemu.h, and drop the problematic include from hw/qdev-core.h. Touching sysemu/sysemu.h now recompiles some 1800 objects. qemu/uuid.h also drops from 5400 to 1800. A few more headers show smaller improvement: qemu/notify.h drops from 5600 to 5200, qemu/timer.h from 5600 to 4500, and qapi/qapi-types-run-state.h from 5500 to 5000. Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20190812052359.30071-28-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>	2019-08-16 13:31:53 +02:00
Markus Armbruster	b58c5c2dd2	numa: Move remaining NUMA declarations from sysemu.h to numa.h Commit `e35704ba9c` "numa: Move NUMA declarations from sysemu.h to numa.h" left a few NUMA-related macros behind. Move them now. Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190812052359.30071-26-armbru@redhat.com>	2019-08-16 13:31:53 +02:00
Markus Armbruster	12e9493df9	Include hw/boards.h a bit less hw/boards.h pulls in almost 60 headers. The less we include it into headers, the better. As a first step, drop superfluous inclusions, and downgrade some more to what's actually needed. Gets rid of just one inclusion into a header. Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-23-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>	2019-08-16 13:31:53 +02:00
Markus Armbruster	d645427057	Include migration/vmstate.h less In my "build everything" tree, changing migration/vmstate.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). hw/hw.h supposedly includes it for convenience. Several other headers include it just to get VMStateDescription. The previous commit made that unnecessary. Include migration/vmstate.h only where it's still needed. Touching it now recompiles only some 1600 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-16-armbru@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Igor Mammedov	b69239e085	numa: allow memory-less nodes when using memdev as backend QEMU fails to start if memory-less node is present when memdev is used qemu-system-x86_64 -object memory-backend-ram,id=ram0,size=128M \ -numa node -numa node,memdev=ram0 with error: "memdev option must be specified for either all or no nodes" which works as expected if legacy 'mem' is used. Fix check to make memory-less nodes valid when memdev option is used but still disallow mix of mem and memdev options. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20190702140745.27767-2-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:12:45 -03:00
Eduardo Habkost	f8123f2275	numa: Make deprecation warnings conditional on !qtest_enabled() This will help us avoid spurious warnings during "make check". Note that this will silence the warnings generated by tests/numa-test, but not the ones generated by tests/bios-tables-test. We still need to change tests/bios-tables-test to use "-numa ...,memdev=" to silence these warnings. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190702215726.23661-1-ehabkost@redhat.com>	2019-07-05 17:12:45 -03:00
Igor Mammedov	cb79224b7e	deprecate -mem-path fallback to anonymous RAM Fallback might affect guest or worse whole host performance or functionality if backing file were used to share guest RAM with another process. Patch deprecates fallback so that we could remove it in future and ensure that QEMU will provide expected behavior and fail if it can't use user provided backing file. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190626074228.11558-1-imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:08:03 -03:00
Igor Mammedov	4bb4a2732e	numa: deprecate implict memory distribution between nodes Implicit RAM distribution between nodes has exactly the same issues as: "numa: deprecate 'mem' parameter of '-numa node' option" only with QEMU being the user that's 'adding' 'mem' parameter. Deprecate it, to get it out of the way so that we could consolidate guest RAM allocation using memory backends making it consistent and possibly later on transition to using memory devices instead of adhoc memory mapping for the initial RAM. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1559205199-233510-4-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:08:03 -03:00
Igor Mammedov	cdf8036520	numa: deprecate 'mem' parameter of '-numa node' option The parameter allows to configure fake NUMA topology where guest VM simulates NUMA topology but not actually getting performance benefits from it. The same or better results could be achieved using 'memdev' parameter. Beside of unpredictable performance, '-numa node.mem' option has other issues when it's used with combination of -mem-path + + -mem-prealloc + memdev backends (pc-dimm), breaking binding of memdev backends since mem-path/mem-prealloc are global and affect the most of RAM allocations. It's possible to make memdevs and global -mem-path/mem-prealloc to play nicely together but that will just complicate already complicated code and add unobious ways it could break on 2 different memmory allocation pathes and their combinations. Instead of it, consolidate all guest RAM allocation over memdev which still allows to create fake NUMA configurations if desired and leaves one simplifyed code path to consider when it comes to guest RAM allocation. To achieve desired simplification deprecate 'mem' parameter as its ad-hoc partitioning of initial RAM MemoryRegion can't be translated to memdev based backend transparently to users and in compatible manner (migration wise). Later down the road that will allow to consolidate means of how guest RAM is allocated and would permit us to clean up quite a bit memory allocations and numa code, leaving only 'memdev' implementation in place. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1559205199-233510-3-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:08:03 -03:00
Like Xu	5cc8767d05	general: Replace global smp variables with smp machine properties Basically, the context could get the MachineState reference via call chains or unrecommended qdev_get_machine() in !CONFIG_USER_ONLY mode. A local variable of the same name would be introduced in the declaration phase out of less effort OR replace it on the spot if it's only used once in the context. No semantic changes. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190518205428.90532-4-like.xu@linux.intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:07:36 -03:00
Peter Maydell	c35d17cabc	virtio, pc, pci: features, fixes, cleanups virtio-pmem support. libvhost user mq support. A bunch of fixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEbBAABAgAGBQJdHmkBAAoJECgfDbjSjVRpEAIH+Kmy8n5Et9NzsnmNqHAiC/pg 3V5wGyp9M4ZJVPXC0z/Q1sYJ3YYP6dBd4tjj2/7LzYZSlqlQIs83UlQCo0XTiliH /jZD/IaAZABnfB7vAeZW67WNT2a20xG2Jr83083lSaDUI/pfIdvbMelIbBLmo/kd tWdAAWT0kcGYjyz4xQQgtAH6zAQUleKE7ECUJ2TpJQbSMLxdI/YTaoYqek471YdP ju5OLBO3WbNkSE9JYz4MJqTudYK0sKu568UqBVF8JdpFd5Cv+X/OI+bCsc4QK8KN DTtFVVvbm1KGPSceqc9rwsDjO4Wd8ThvuZxrB029AahD6vT82F13IHpi/S29Fw== =WAFb -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging virtio, pc, pci: features, fixes, cleanups virtio-pmem support. libvhost user mq support. A bunch of fixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Thu 04 Jul 2019 22:00:49 BST # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (22 commits) docs: avoid vhost-user-net specifics in multiqueue section libvhost-user: implement VHOST_USER_PROTOCOL_F_MQ libvhost-user: support many virtqueues libvhost-user: add vmsg_set_reply_u64() helper pc: Move compat_apic_id_mode variable to PCMachineClass virtio: Don't change "started" flag on virtio_vmstate_change() virtio: Make sure we get correct state of device on handle_aio_output() virtio: Set "start_on_kick" on virtio_set_features() virtio: Set "start_on_kick" for legacy devices virtio: add "use-started" property virtio-pci: fix missing device properties pc: Support for virtio-pmem-pci numa: Handle virtio-pmem in NUMA stats hmp: Handle virtio-pmem when printing memory device infos virtio-pci: Proxy for virtio-pmem virtio-pmem: sync linux headers virtio-pci: Allow to specify additional interfaces for the base type virtio-pmem: add virtio device pcie: minor cleanups for slot control/status pcie: work around for racy guest init ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-07-05 09:51:50 +01:00

1 2

52 Commits