Commit Graph

75 Commits

Author SHA1 Message Date
Igor Mammedov 419fcdec3c numa: add '-numa cpu,...' option for property based node mapping
legacy cpu to node mapping is using cpu index values to map
VCPU to node with help of '-numa node,nodeid=node,cpus=x[-y]'
option. However cpu index is internal concept and QEMU users
have to guess /reimplement qemu's logic/ to map it to
a concrete cpu socket/core/thread to make sane CPUs
placement across numa nodes.

This patch allows to map cpu objects to numa nodes using
the same properties as used for cpus with -device/device_add
(socket-id/core-id/thread-id/node-id).

At present valid properties/values to address CPUs could be
fetched using hotpluggable-cpus monitor/qmp command, it will
require user to start qemu twice when creating domain to fetch
possible CPUs for a machine type/-smp layout first and
then the second time with numa explicit mapping for actual
usage. The first step results could be saved and reused to
set/change mapping later as far as machine type/-smp stays
the same.

Proposed impl. supports exact and wildcard matching to
simplify CLI and allow to set mapping for a specific cpu
or group of cpu objects specified by matched properties.

For example:

   # exact mapping x86
   -numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n

   # exact mapping SPAPR
   -numa cpu,node-id=x,core-id=y

   # wildcard mapping, all cpu objects that match socket-id=y
   # are mapped to node-id=x
   -numa cpu,node-id=x,socket-id=y

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1494415802-227633-18-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov 1171ae9a5b numa: remove node_cpu bitmaps as they are no longer used
Postfactum "CPU(s) present in multiple NUMA nodes" check
was the last user of node_cpu bitmaps, but it's not need
as machine_set_cpu_numa_node() does the similar check at
the time mapping is set for cpus (i.e. when -numa cpus=
is parsed) and ensures that cpu can be mapped only to
one node.

Remove duplicate check based on node_cpu bitmaps and
since the last user is gone remove node_cpu as well,
which completes internal transition from legacy bitmap
based mapping storage to possible_cpus storage.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-17-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov ec78f8114b numa: use possible_cpus for not mapped CPUs check
and remove corresponding part in numa.c that uses
node_cpu bitmaps.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-16-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov 3b8a8557f7 numa: remove no longer need numa_post_machine_init()
CPUState::numa_node is still in use but now it's set by
board when it creates CPU objects. So there isn't any
need to set it again after all CPU's are created,
since it's been already set.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-14-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov 6accfb7823 tests: numa: add case for QMP command query-cpus
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-13-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:50 -03:00
Igor Mammedov af9b20e8d2 numa: do default mapping based on possible_cpus instead of node_cpu bitmaps
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-8-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov 7c88e65d9e numa: mirror cpu to node mapping in MachineState::possible_cpus
Introduce machine_set_cpu_numa_node() helper that stores
node mapping for CPU in MachineState::possible_cpus.
CPU and node it belongs to is specified by 'props' argument.

Patch doesn't remove old way of storing mapping in
numa_info[X].node_cpu as removing it at the same time
makes patch rather big. Instead it just mirrors mapping
in possible_cpus and follow up per target patches will
switch to possible_cpus and numa_info[X].node_cpu will
be removed once there isn't any users left.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1494415802-227633-7-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov 64c2a8f6d3 numa: add check that board supports cpu_index to node mapping
Default node mapping initialization already checks that board
supports cpu_index to node mapping and refuses to start if
it's not supported. Do the same for explicitly provided
mapping "-numa node,cpus=..."

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-6-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov ea089eebbd numa: move source of default CPUs to NUMA node mapping into boards
Originally CPU threads were by default assigned in
round-robin fashion. However it was causing issues in
guest since CPU threads from the same socket/core could
be placed on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node
introducing cpu_index_to_socket_id() callback and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused callback to fix similar issues for SPAPR machine
even though socket doesn't make much sense there.

As result QEMU ended up having 3 default distribution rules
used by 3 targets /virt-arm, spapr, pc/.

In effort of moving NUMA mapping for CPUs into possible_cpus,
generalize default mapping in numa.c by making boards decide
on default mapping and let them explicitly tell generic
numa code to which node a CPU thread belongs to by replacing
cpu_index_to_socket_id() with @cpu_index_to_instance_props()
which provides default node_id assigned by board to specified
cpu_index.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Laurent Vivier 3bfe57165b numa: equally distribute memory on nodes
When there are more nodes than available memory to put the minimum
allowed memory by node, all the memory is put on the last node.

This is because we put (ram_size / nb_numa_nodes) &
~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this
case the value is 0. This is particularly true with pseries,
as the memory must be aligned to 256MB.

To avoid this problem, this patch uses an error diffusion algorithm [1]
to distribute equally the memory on nodes.

We introduce numa_auto_assign_ram() function in MachineClass
to keep compatibility between machine type versions.
The legacy function is used with pseries-2.9, pc-q35-2.9 and
pc-i440fx-2.9 (and previous), the new one with all others.

Example:

qemu-system-ppc64 -S -nographic  -nodefaults -monitor stdio -m 1G -smp 8 \
                  -numa node -numa node -numa node \
                  -numa node -numa node -numa node

Before:

(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 0 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 0 MB
node 4 cpus: 4
node 4 size: 0 MB
node 5 cpus: 5
node 5 size: 1024 MB

After:
(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 256 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 256 MB
node 4 cpus: 4
node 4 size: 256 MB
node 5 cpus: 5
node 5 size: 256 MB

[1] https://en.wikipedia.org/wiki/Error_diffusion

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170502162955.1610-2-lvivier@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:47 -03:00
He Chen 0f203430dd numa: Allow setting NUMA distance for different NUMA nodes
This patch is going to add SLIT table support in QEMU, and provides
additional option `dist` for command `-numa` to allow user set vNUMA
distance by QEMU command.

With this patch, when a user wants to create a guest that contains
several vNUMA nodes and also wants to set distance among those nodes,
the QEMU command would like:

```
-numa node,nodeid=0,cpus=0 \
-numa node,nodeid=1,cpus=1 \
-numa node,nodeid=2,cpus=2 \
-numa node,nodeid=3,cpus=3 \
-numa dist,src=0,dst=1,val=21 \
-numa dist,src=0,dst=2,val=31 \
-numa dist,src=0,dst=3,val=41 \
-numa dist,src=1,dst=2,val=21 \
-numa dist,src=1,dst=3,val=31 \
-numa dist,src=2,dst=3,val=21 \
```

Signed-off-by: He Chen <he.chen@linux.intel.com>
Message-Id: <1493260558-20728-1-git-send-email-he.chen@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:37 -03:00
Laurent Vivier 55641213fc numa,spapr: align default numa node memory size to 256MB
Since commit 224245b ("spapr: Add LMB DR connectors"), NUMA node
memory size must be aligned to 256MB (SPAPR_MEMORY_BLOCK_SIZE).

But when "-numa" option is provided without "mem" parameter,
the memory is equally divided between nodes, but 8MB aligned.
This can be not valid for pseries.

In that case we can have:
$ ./ppc64-softmmu/qemu-system-ppc64 -m 4G -numa node -numa node -numa node
qemu-system-ppc64: Node 0 memory size 0x55000000 is not aligned to 256 MiB

With this patch, we have:
(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 1280 MB
node 1 cpus:
node 1 size: 1280 MB
node 2 cpus:
node 2 size: 1536 MB

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-22 11:32:42 +11:00
Markus Armbruster d081a49af8 numa: Flatten simple union NumaOptions
Simple unions are simpler than flat unions in the schema, but more
complicated in C and on the QMP wire: there's extra indirection in C
and extra nesting on the wire, both pointless.  They're best avoided
in new code.

NumaOptions isn't new, but it's only used internally, not in QMP.
Convert it to a flat union.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487709988-14322-2-git-send-email-armbru@redhat.com>
2017-02-22 19:50:46 +01:00
Paolo Bonzini 0987d735a3 ramblock-notifier: new
This adds a notify interface of ram block additions and removals.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-16 17:52:35 +01:00
Igor Mammedov cdda2018e3 numa: make -numa parser dynamically allocate CPUs masks
so it won't impose an additional limits on max_cpus limits
supported by different targets.

It removes global MAX_CPUMASK_BITS constant and need to
bump it up whenever max_cpus is being increased for
a target above MAX_CPUMASK_BITS value.

Use runtime max_cpus value instead to allocate sufficiently
sized node_cpu bitmasks in numa parser.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1479466974-249781-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: Added asserts to ensure cpu_index < max_cpus]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-01-12 15:51:36 -02:00
Igor Mammedov e1ff3c67e8 monitor: fix qmp/hmp query-memdev not reporting IDs of memory backends
Considering 'id' is mandatory for user_creatable objects/backends
and user_creatable_add_type() always has it as an argument
regardless of where from it is called CLI/monitor or QMP,
Fix issue by adding 'id' property to hostmem backends and
set it in user_creatable_add_type() for every object that
implements 'id' property. Then later at query-memdev time
get 'id' from object directly.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1484052795-158195-4-git-send-email-imammedo@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-01-12 15:35:06 -02:00
Igor Mammedov 6bea1ddf8b numa: reduce code duplication by adding helper numa_get_node_for_cpu()
Replace repeated pattern

    for (i = 0; i < nb_numa_nodes; i++) {
        if (test_bit(idx, numa_info[i].node_cpu)) {
           ...
           break;

with a helper function to lookup numa node index for cpu.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-10-10 01:16:57 +03:00
Marc-André Lureau 157e94e8a2 numa: do not leak NumaOptions
In all cases, call qapi_free_NumaOptions(), by using a common ending
block.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-08-07 23:59:59 +04:00
Greg Kurz 0b21757124 numa: set the memory backend "is_mapped" field
Commit 2aece63 "hostmem: detect host backend memory is being used properly"
added a way to know if a memory backend is busy or available for use. It
caused a slight regression if we pass the same backend to a NUMA node and
to a pc-dimm device:

-m 1G,slots=2,maxmem=2G \
-object memory-backend-ram,size=1G,id=mem-mem1 \
-device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
-numa node,nodeid=0,memdev=mem-mem1

Before commit 2aece63, this would cause QEMU to print an error message and
to exit gracefully:

qemu-system-ppc64: -device pc-dimm,id=dimm-mem1,memdev=mem-mem1:
    can't use already busy memdev: mem-mem1

Since commit 2aece63, QEMU hits an assertion in the memory code:

qemu-system-ppc64: memory.c:1934: memory_region_add_subregion_common:
    Assertion `!subregion->container' failed.
Aborted

This happens because pc-dimm devices don't use memory_region_is_mapped()
anymore and cannot guess the backend is already used by a NUMA node.

Let's revert to the previous behavior by turning the NUMA code to also
call host_memory_backend_set_mapped() when it uses a backend.

Fixes: 2aece63c8a
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <146891691503.15642.9817215371777203794.stgit@bahia.lan>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-02 12:03:58 +02:00
Eric Blake 09204eac9b opts-visitor: Favor new visit_free() function
Now that we have a polymorphic visit_free(), we no longer need
opts_visitor_cleanup(); which in turn means we no longer need
to return a subtype from opts_visitor_new() nor a public upcast
function.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-6-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-07-06 10:52:04 +02:00
Eric Blake 32bafa8fdd qapi: Don't special-case simple union wrappers
Simple unions were carrying a special case that hid their 'data'
QMP member from the resulting C struct, via the hack method
QAPISchemaObjectTypeVariant.simple_union_type().  But by using
the work we started by unboxing flat union and alternate
branches, coupled with the ability to visit the members of an
implicit type, we can now expose the simple union's implicit
type in qapi-types.h:

| struct q_obj_ImageInfoSpecificQCow2_wrapper {
|     ImageInfoSpecificQCow2 *data;
| };
|
| struct q_obj_ImageInfoSpecificVmdk_wrapper {
|     ImageInfoSpecificVmdk *data;
| };
...
| struct ImageInfoSpecific {
|     ImageInfoSpecificKind type;
|     union { /* union tag is @type */
|         void *data;
|-        ImageInfoSpecificQCow2 *qcow2;
|-        ImageInfoSpecificVmdk *vmdk;
|+        q_obj_ImageInfoSpecificQCow2_wrapper qcow2;
|+        q_obj_ImageInfoSpecificVmdk_wrapper vmdk;
|     } u;
| };

Doing this removes asymmetry between QAPI's QMP side and its
C side (both sides now expose 'data'), and means that the
treatment of a simple union as sugar for a flat union is now
equivalent in both languages (previously the two approaches used
a different layer of dereferencing, where the simple union could
be converted to a flat union with equivalent C layout but
different {} on the wire, or to an equivalent QMP wire form
but with different C representation).  Using the implicit type
also lets us get rid of the simple_union_type() hack.

Of course, now all clients of simple unions have to adjust from
using su->u.member to using su->u.member.data; while this touches
a number of files in the tree, some earlier cleanup patches
helped minimize the change to the initialization of a temporary
variable rather than every single member access.  The generated
qapi-visit.c code is also affected by the layout change:

|@@ -7393,10 +7393,10 @@ void visit_type_ImageInfoSpecific_member
|     }
|     switch (obj->type) {
|     case IMAGE_INFO_SPECIFIC_KIND_QCOW2:
|-        visit_type_ImageInfoSpecificQCow2(v, "data", &obj->u.qcow2, &err);
|+        visit_type_q_obj_ImageInfoSpecificQCow2_wrapper_members(v, &obj->u.qcow2, &err);
|         break;
|     case IMAGE_INFO_SPECIFIC_KIND_VMDK:
|-        visit_type_ImageInfoSpecificVmdk(v, "data", &obj->u.vmdk, &err);
|+        visit_type_q_obj_ImageInfoSpecificVmdk_wrapper_members(v, &obj->u.vmdk, &err);
|         break;
|     default:
|         abort();

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-13-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:26 +01:00
Eric Blake 96a1616c85 qapi-dealloc: Reduce use outside of generated code
No need to roll our own use of the dealloc visitors when we can
just directly use the qapi_free_FOO() functions that do what we
want in one line.

In net.c, inline net_visit() into its remaining lone caller.

After this patch, test-visitor-serialization.c is the only
non-generated file that needs to use a dealloc visitor, because
it is testing low level aspects of the visitor interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1456262075-3311-2-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-04 17:16:32 +01:00
Eric Blake 51e72bc1dd qapi: Swap visit_* arguments for consistent 'name' placement
JSON uses "name":value, but many of our visitor interfaces were
called with visit_type_FOO(v, &value, name, errp).  This can be
a bit confusing to have to mentally swap the parameter order to
match JSON order.  It's particularly bad for visit_start_struct(),
where the 'name' parameter is smack in the middle of the
otherwise-related group of 'obj, kind, size' parameters! It's
time to do a global swap of the parameter ordering, so that the
'name' parameter is always immediately after the Visitor argument.

Additional reason in favor of the swap: the existing include/qjson.h
prefers listing 'name' first in json_prop_*(), and I have plans to
unify that file with the qapi visitors; listing 'name' first in
qapi will minimize churn to the (admittedly few) qjson.h clients.

Later patches will then fix docs, object.h, visitor-impl.h, and
those clients to match.

Done by first patching scripts/qapi*.py by hand to make generated
files do what I want, then by running the following Coccinelle
script to affect the rest of the code base:
 $ spatch --sp-file script `git grep -l '\bvisit_' -- '**/*.[ch]'`
I then had to apply some touchups (Coccinelle insisted on TAB
indentation in visitor.h, and botched the signature of
visit_type_enum() by rewriting 'const char *const strings[]' to
the syntactically invalid 'const char*const[] strings').  The
movement of parameters is sufficient to provoke compiler errors
if any callers were missed.

    // Part 1: Swap declaration order
    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_start_struct
    -(TV v, TObj OBJ, T1 ARG1, const char *name, T2 ARG2, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type bool, TV, T1;
    identifier ARG1;
    @@
     bool visit_optional
    -(TV v, T1 ARG1, const char *name)
    +(TV v, const char *name, T1 ARG1)
     { ... }

    @@
    type TV, TErr, TObj, T1;
    identifier OBJ, ARG1;
    @@
     void visit_get_next_type
    -(TV v, TObj OBJ, T1 ARG1, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_type_enum
    -(TV v, TObj OBJ, T1 ARG1, T2 ARG2, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj;
    identifier OBJ;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
     void VISIT_TYPE
    -(TV v, TObj OBJ, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, TErr errp)
     { ... }

    // Part 2: swap caller order
    @@
    expression V, NAME, OBJ, ARG1, ARG2, ERR;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
    (
    -visit_start_struct(V, OBJ, ARG1, NAME, ARG2, ERR)
    +visit_start_struct(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -visit_optional(V, ARG1, NAME)
    +visit_optional(V, NAME, ARG1)
    |
    -visit_get_next_type(V, OBJ, ARG1, NAME, ERR)
    +visit_get_next_type(V, NAME, OBJ, ARG1, ERR)
    |
    -visit_type_enum(V, OBJ, ARG1, ARG2, NAME, ERR)
    +visit_type_enum(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -VISIT_TYPE(V, OBJ, NAME, ERR)
    +VISIT_TYPE(V, NAME, OBJ, ERR)
    )

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-19-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:56 +01:00
Peter Maydell d38ea87ac5 all: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org
2016-02-04 17:41:30 +00:00
Luiz Capitulino fae947b096 memory: exit when hugepage allocation fails if mem-prealloc
When -mem-prealloc is passed on the command-line, the expected
behavior is to exit if the hugepage allocation fails.  However,
this behavior is broken since commit cc57501dee which made
hugepage allocation fall back to regular ram in case of faliure.

This commit restores the expected behavior for -mem-prealloc.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Message-Id: <20160122091501.75bbd42a@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-26 15:58:14 +01:00
Markus Armbruster 007b06578a Use error_fatal to simplify obvious fatal errors
Done with this Coccinelle semantic patch:

    @@
    type T;
    identifier FUN, RET;
    expression list ARGS;
    expression ERR, EC;
    @@
    (
    -    T RET = FUN(ARGS, &ERR);
    +    T RET = FUN(ARGS, &error_fatal);
    |
    -    RET = FUN(ARGS, &ERR);
    +    RET = FUN(ARGS, &error_fatal);
    |
    -    FUN(ARGS, &ERR);
    +    FUN(ARGS, &error_fatal);
    )
    -    if (ERR != NULL) {
    -        error_report_err(ERR);
    -        exit(EC);
    -    }

This is actually a more elegant version of my initial semantic patch
by courtesy of Eduardo.

It leaves dead Error * variables behind, cleaned up manually.

Cc: qemu-arm@nongnu.org
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
2016-01-13 11:58:58 +01:00
Markus Armbruster 2f6f826e03 numa: Clean up query-memdev error handling
qmp_query_memdev() has two error paths:

* When object_get_objects_root() returns null.  It never does, so
  simply drop the useless error handling.

* When query_memdev() fails.  It leaks err then.  But any failure
  there is actually a programming error.  Switch it to &error_abort,
  and drop the useless error handling.

Messed up in commit 76b5d85 "qmp: add query-memdev".

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-12-18 15:50:24 -02:00
Eric Blake 1fd5d4fea4 memory: Convert to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

Make the conversion to the new layout for memory-related code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-21-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:28 +01:00
Markus Armbruster f8ed85ac99 Fix bad error handling after memory_region_init_ram()
Symptom:

    $ qemu-system-x86_64 -m 10000000
    Unexpected error in ram_block_add() at /work/armbru/qemu/exec.c:1456:
    upstream-qemu: cannot set up guest memory 'pc.ram': Cannot allocate memory
    Aborted (core dumped)

Root cause: commit ef701d7 screwed up handling of out-of-memory
conditions.  Before the commit, we report the error and exit(1), in
one place, ram_block_add().  The commit lifts the error handling up
the call chain some, to three places.  Fine.  Except it uses
&error_abort in these places, changing the behavior from exit(1) to
abort(), and thus undoing the work of commit 3922825 "exec: Don't
abort when we can't allocate guest memory".

The three places are:

* memory_region_init_ram()

  Commit 4994653 (right after commit ef701d7) lifted the error
  handling further, through memory_region_init_ram(), multiplying the
  incorrect use of &error_abort.  Later on, imitation of existing
  (bad) code may have created more.

* memory_region_init_ram_ptr()

  The &error_abort is still there.

* memory_region_init_rom_device()

  Doesn't need fixing, because commit 33e0eb5 (soon after commit
  ef701d7) lifted the error handling further, and in the process
  changed it from &error_abort to passing it up the call chain.
  Correct, because the callers are realize() methods.

Fix the error handling after memory_region_init_ram() with a
Coccinelle semantic patch:

    @r@
    expression mr, owner, name, size, err;
    position p;
    @@
            memory_region_init_ram(mr, owner, name, size,
    (
    -                              &error_abort
    +                              &error_fatal
    |
                                   err@p
    )
                                  );
    @script:python@
        p << r.p;
    @@
    print "%s:%s:%s" % (p[0].file, p[0].line, p[0].column)

When the last argument is &error_abort, it gets replaced by
&error_fatal.  This is the fix.

If the last argument is anything else, its position is reported.  This
lets us check the fix is complete.  Four positions get reported:

* ram_backend_memory_alloc()

  Error is passed up the call chain, ultimately through
  user_creatable_complete().  As far as I can tell, it's callers all
  handle the error sanely.

* fsl_imx25_realize(), fsl_imx31_realize(), dp8393x_realize()

  DeviceClass.realize() methods, errors handled sanely further up the
  call chain.

We're good.  Test case again behaves:

    $ qemu-system-x86_64 -m 10000000
    qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory
    [Exit 1 ]

The next commits will repair the rest of commit ef701d7's damage.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1441983105-26376-3-git-send-email-armbru@redhat.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
2015-09-18 14:39:29 +02:00
Daniel P. Berrange a8f15a2775 maint: remove double semicolons in many files
A number of source files have statements accidentally
terminated by a double semicolon - eg 'foo = bar;;'.
This is harmless but a mistake none the less.

The tcg/ia64/tcg-target.c file is whitelisted because
it has valid use of ';;' in a comment containing assembly
code.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-09-11 10:21:38 +03:00
Bharata B Rao 672558d2ea numa: Fix memory leak in numa_set_mem_node_id()
Fix a memory leak in numa_set_mem_node_id().

Signed-off-by: Bharata B Rao <bharata@linux.vnet.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-07-15 16:57:50 -03:00
Bharata B Rao e75e2a14d5 numa: API to lookup NUMA node by address
Introduce an API numa_get_node(ram_addr_t addr, Error **errp) that
returns the NUMA node to which the given address belongs to. This
API works uniformly for both boot time as well as hotplugged memory.

This API is needed by sPAPR PowerPC to support
ibm,dynamic-reconfiguration-memory device tree node which is needed for
memory hotplug.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-07-03 17:47:58 -03:00
Bharata B Rao abafabd8c9 numa: Store boot memory address range in node_info
Store memory address range information of boot memory  in address
range list of numa_info.

This helps to have a common NUMA node lookup by address function that
works for both boot-time memory and hotplugged memory.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-07-03 17:47:58 -03:00
Bharata B Rao fa9ea81d15 numa,pc-dimm: Store pc-dimm memory information in numa_info
Start storing the (start_addr, end_addr) of the pc-dimm memory
in corresponding numa_info[node] so that this information can be used
to lookup node by address.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-07-03 17:47:58 -03:00
Markus Armbruster cc7a8ea740 Include qapi/qmp/qerror.h exactly where needed
In particular, don't include it into headers.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-06-22 18:20:41 +02:00
Daniel P. Berrange a3590dacce qom: Don't pass string table to object_get_enum() function
Now that properties can be explicitly registered as an enum
type, there is no need to pass the string table to the
object_get_enum() function. The object property registration
already has a pointer to the string table.

In changing this method signature, the hostmem backend object
has to be converted to use the new enum property registration
code, which simplifies it somewhat.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-06-19 18:42:48 +02:00
Daniel P. Berrange bc2256c4ae qom: Add helper function for getting user objects root
Add object_get_objects_root() function which is a convenience for
obtaining the Object * located at /objects in the object
composition tree. Convert existing code over to use the new
API where appropriate.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-06-19 18:40:00 +02:00
Markus Armbruster 28d0de7a4f QemuOpts: Convert qemu_opts_foreach() to Error
Retain the function value for now, to permit selective conversion of
its callers.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2015-06-09 07:37:37 +02:00
Markus Armbruster a4c7367f7d QemuOpts: Drop qemu_opts_foreach() parameter abort_on_failure
When the argument is non-zero, qemu_opts_foreach() stops on callback
returning non-zero, and returns that value.

When the argument is zero, it doesn't stop, and returns the bit-wise
inclusive or of all the return values.  Funky :)

The callers that pass zero could just as well pass one, because their
callbacks can't return anything but zero:

* qemu_add_globals()'s callback qdev_add_one_global()

* qemu_config_write()'s callback config_write_opts()

* main()'s callbacks default_driver_check(), drive_enable_snapshot(),
  vnc_init_func()

Drop the parameter, and always stop.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2015-06-08 19:33:20 +02:00
Eduardo Habkost 549fc54b8c numa: Print warning if no node is assigned to a CPU
We need all possible CPUs (including hotplug ones) to be present in the
SRAT when QEMU starts. QEMU already does that correctly today, the only
problem is that when a CPU is omitted from the NUMA configuration, it is
silently assigned to node 0.

Check if all CPUs up to max_cpus are present in the NUMA configuration
and warn about missing CPUs.

Make it just a warning, to allow management software to be updated if
necessary. In the future we may make it a fatal error instead.

Command-line examples:

* Correct, no warning:

  $ qemu-system-x86_64 -smp 2,maxcpus=4
  $ qemu-system-x86_64 -smp 2,maxcpus=4 -numa node,cpus=0-3

* Incomplete, with warnings:

  $ qemu-system-x86_64 -smp 2,maxcpus=4 -numa node,cpus=0
  qemu-system-x86_64: warning: CPU(s) not present in any NUMA nodes: 1 2 3
  qemu-system-x86_64: warning: All CPU(s) up to maxcpus should be described in NUMA config

  $ qemu-system-x86_64 -smp 2,maxcpus=4 -numa node,cpus=0-2
  qemu-system-x86_64: warning: CPU(s) not present in any NUMA nodes: 3
  qemu-system-x86_64: warning: All CPU(s) up to maxcpus should be described in NUMA config

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
v1 -> v2: (no changes)

v2 -> v3:
 * Use enumerate_cpus() and error_report() for error message
 * Simplify logic using bitmap_full()

v3 -> v4:
 * Clarify error message, mention that all CPUs up to
   maxcpus need to be described in NUMA config

v4 -> v5:
 * Commit log update, to make problem description clearer
2015-03-19 16:20:15 -03:00
Igor Mammedov 57924bcd87 numa: introduce machine callback for VCPU to node mapping
Current default round-robin way of distributing VCPUs among
NUMA nodes might be wrong in case on multi-core/threads
CPUs. Making guests confused wrt topology where cores from
the same socket are on different nodes.

Allow a machine to override default mapping by providing
 MachineClass::cpu_index_to_socket_id()
callback which would allow it group VCPUs from a socket
on the same NUMA node.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-03-19 16:12:09 -03:00
Eduardo Habkost 3ef7197505 numa: Reject configuration if CPU appears on multiple nodes
Each CPU can appear in only one NUMA node on the NUMA config. Reject
configuration if a CPU appears in multiple nodes.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-03-19 16:01:22 -03:00
Eduardo Habkost 8979c945c1 numa: Reject CPU indexes > max_cpus
CPU index is always less than max_cpus, as documented at sysemu.h:

> The following shall be true for all CPUs:
>   cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS

Reject configuration which uses invalid CPU indexes.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-03-19 16:01:22 -03:00
Eduardo Habkost ed26b92290 numa: Fix off-by-one error at MAX_CPUMASK_BITS check
Fix the CPU index check to ensure we don't go beyond the size of the
node_cpu bitmap.

CPU index is always less than MAX_CPUMASK_BITS, as documented at
sysemu.h:

> The following shall be true for all CPUs:
>   cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-03-19 16:01:22 -03:00
Gonglei 01bbbcf41f numa: remove superfluous '\n' around error_setg
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-03-10 08:15:33 +03:00
Peter Maydell 2dffe5516e NUMA fixes queue
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJU639qAAoJECgHk2+YTcWmIC0QAK36G9LpeDOJONxt7LNOCMes
 Oplwp11RRA5zs5YHVCzTL0NMZEf59xRs/35r1iIrt238clmFGpUdZRpdlsZdDMZL
 ThwxgrV7/aZH/+HyZVSTCh5FL+fTyl+DjVQ1cZqNUWQRQ6KO2AjSxA53gCoFEZ+L
 1Vg6A41u1UOdO3tMOosuDcZtkN4fXevHTuIFFzfzW4WhJuctU2P/VPsHKxESKxYH
 1Kz0S9CCr6meIqZ8DE9y5qhCUdXBj9VYbEY2JZUYVaeB8q+smY7Lz1cV7MABrD3T
 EoJa3/53b4DFl4wNkLJsuqQNFHiOrQadgmbhocgEYp3u96/7cKa+pZ8mTKgvZ14D
 kxtpcL1TeiyA3b7UfITPhqLY3mKvN7UIKh/vo0haAxj5B3H6QhemkO76oA1/zkJR
 zI3mm6trnxurZm1gBtkZwPbKoDi7OuS7lYxW3GT7K5nJBMCEAbFGXFzmoHomVDVZ
 phYGwihzUg/LWMwaZRT0bbUd2GOVAXGY7Zy9Kg0p8nvPtcEYDwp0spQDfyrnwEWu
 Bohi80x97ayRXrvY1mP5mdYg1Jj8jR4qPdwnoW6M2UHXvtdQsjfFoSj4/xbVMCb9
 l5y1vH+DHG/ZNQB0URjBoXmttWkZjpbiAbo0daPFDaZ1QRfdXx6JgdkTKtA8ozQ9
 Xvh4hVGPeLAPvRjzaesa
 =0V2i
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost/tags/numa-pull-request' into staging

NUMA fixes queue

# gpg: Signature made Mon Feb 23 19:28:42 2015 GMT using RSA key ID 984DC5A6
# gpg: Can't check signature: public key not found

* remotes/ehabkost/tags/numa-pull-request:
  numa: Rename set_numa_modes() to numa_post_machine_init()
  numa: Rename option parsing functions
  numa: Move QemuOpts parsing to set_numa_nodes()
  numa: Make max_numa_nodeid static
  numa: Move NUMA globals to numa.c
  vl.c: Remove unnecessary zero-initialization of NUMA globals
  numa: Move NUMA declarations from sysemu.h to numa.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-03-02 12:13:45 +00:00
Eduardo Habkost dde1111678 numa: Rename set_numa_modes() to numa_post_machine_init()
This function does some initialization that needs to be done after
machine init. The function may be eventually removed if we move the
CPUState.numa_node initialization to the CPU init code, but while the
function exists, lets give it a name that makes sense.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-02-23 15:39:27 -03:00
Eduardo Habkost 1c1e673278 numa: Rename option parsing functions
Renaming set_numa_nodes() and numa_init_func() to parse_numa_opts() and
parse_numa() makes the purpose of those functions clearer.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-02-23 15:39:27 -03:00
Eduardo Habkost 7dcd1d70fe numa: Move QemuOpts parsing to set_numa_nodes()
This allows us to make numa_init_func() static.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-02-23 15:39:27 -03:00
Eduardo Habkost 25712ffe84 numa: Make max_numa_nodeid static
Now the only code that uses the variable is inside numa.c.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-02-23 15:39:27 -03:00