QEMU initializes preallocated backend memory as the objects are parsed from
the command line. This is not optimal in some cases (e.g. memory spanning
multiple NUMA nodes) because the memory objects are initialized in series.
Allow the initialization to occur in parallel (asynchronously). In order to
ensure optimal thread placement, asynchronous initialization requires prealloc
context threads to be in use.
Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Message-ID: <20240131165327.3154970-2-mark.kanda@oracle.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
While re-indenting code in host_memory_backend_memory_complete(),
we triggered various "Block comments use a leading /* on a separate
line" warnings from checkpatch.pl. Correct the comments style.
Fixes: e199f7ad4d ("backends: Simplify host_memory_backend_memory_complete()")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Since qemu_prealloc_mem() returns whether or not an error
occured, we don't need to check the @errp pointer. Remove
local_err uses when we can return directly.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-20-philmd@linaro.org>
Reduce the &local_err variable use and remove the 'out:' label.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-18-philmd@linaro.org>
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have HostMemoryBackendClass::alloc
return a boolean indicating whether an error is set or not.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-17-philmd@linaro.org>
Return early if bc->alloc is NULL. De-indent the if() ladder.
Note, this avoids a pointless call to error_propagate() with
errp=NULL at the 'out:' label.
Change trivial when reviewed with 'git-diff --ignore-all-space'.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-16-philmd@linaro.org>
If a memory-backend is configured with mode
HOST_MEM_POLICY_PREFERRED then
host_memory_backend_memory_complete() calls mbind() as:
mbind(..., MPOL_PREFERRED, nodemask, ...);
Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
to the .host-nodes attribute. Therefore, there can be multiple
nodes specified. However, the documentation to MPOL_PREFERRED
says:
MPOL_PREFERRED
This mode sets the preferred node for allocation. ...
If nodemask specifies more than one node ID, the first node
in the mask will be selected as the preferred node.
Therefore, only the first node is honored and the rest is
silently ignored. Well, with recent changes to the kernel and
numactl we can do better.
The Linux kernel added in v5.15 via commit cfcaa66f8032
("mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY")
support for MPOL_PREFERRED_MANY, which accepts multiple preferred
NUMA nodes instead.
Then, numa_has_preferred_many() API was introduced to numactl
(v2.0.15~26) allowing applications to query kernel support.
Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
mbind() call instead and stop ignoring multiple nodes, silently.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <a0b4adce1af5bd2344c2218eb4a04b3ff7bcfdb4.1671097918.git.mprivozn@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's allow for specifying a thread context via the "prealloc-context"
property. When set, preallcoation threads will be crated via the
thread context -- inheriting the same CPU affinity as the thread
context.
Pinning preallcoation threads to CPUs can heavily increase performance
in NUMA setups, because, preallocation from a CPU close to the target
NUMA node(s) is faster then preallocation from a CPU further remote,
simply because of memory bandwidth for initializing memory with zeroes.
This is especially relevant for very large VMs backed by huge/gigantic
pages, whereby preallocation is mandatory.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-7-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
... and implement it under POSIX. When a ThreadContext is provided,
create new threads via the context such that these new threads obtain a
properly configured CPU affinity.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-6-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's
* give the function a "qemu_*" style name
* make sure the parameters in the implementation match the prototype
* rename smp_cpus to max_threads, which makes the semantics of that
parameter clearer
... and add a function documentation.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-2-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
It is currently not possible yet to use "memory-backend-memfd" on s390x
with hugepages enabled. This problem is caused by qemu_maxrampagesize()
not taking memory-backend-memfd objects into account yet, so the code
in s390_memory_init() fails to enable the huge page support there via
s390_set_max_pagesize(). Fix it by generalizing the code, so that it
looks at qemu_ram_pagesize(memdev->mr.ram_block) instead of re-trying
to get the information from the filesystem.
Suggested-by: David Hildenbrand <david@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2116496
Message-Id: <20220810125720.3849835-2-thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Prior to the introduction of the prealloc-threads property, the amount
of threads used to preallocate memory was derived from the value of
smp-cpus passed to qemu, the amount of physical cpus of the host
and a hardcoded maximum value. When the prealloc-threads property
was introduced, it included a default of 1 in backends/hostmem.c and
a default of smp-cpus using the sugar API for the property itself. The
latter default is not used when the property is not specified on qemu's
command line, so guests that were not adjusted for this change suddenly
started to use the default of 1 thread to preallocate memory, which
resulted in observable slowdowns in guest boots for guests with large
memory (e.g. when using libvirt <8.2.0 or managing guests manually).
This commit restores the original behavior for these cases while not
impacting guests started with the prealloc-threads property in any way.
Fixes: 220c1fd864e9d ("hostmem: introduce "prealloc-threads" property")
Signed-off-by: Jaroslav Jindrak <dzejrou@gmail.com>
Message-Id: <20220517123858.7933-1-dzejrou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace the global variables with inlined helper functions. getpagesize() is very
likely annotated with a "const" function attribute (at least with glibc), and thus
optimization should apply even better.
This avoids the need for a constructor initialization too.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The function qemu_madvise() and the QEMU_MADV_* constants associated
with it are used in only 10 files. Move them out of osdep.h to a new
qemu/madvise.h header that is included where it is needed.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220208200856.3558249-2-peter.maydell@linaro.org
Let's provide a way to control the use of RAM_NORESERVE via memory
backends using the "reserve" property which defaults to true (old
behavior).
Only Linux currently supports clearing the flag (and support is checked at
runtime, depending on the setting of "/proc/sys/vm/overcommit_memory").
Windows and other POSIX systems will bail out with "reserve=false".
The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM. This essentially allows
avoiding to set "/proc/sys/vm/overcommit_memory == 0") when using
virtio-mem and also supporting hugetlbfs in the future.
As really only Linux implements RAM_NORESERVE right now, let's expose
the property only with CONFIG_LINUX. Setting the property to "false"
will then only fail in corner cases -- for example on very old kernels
or when memory overcommit was completely disabled by the admin.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-11-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Stop including sysemu/sysemu.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-2-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Add documentation for '-machine memory-backend' CLI option and
how to use it.
And document that x-use-canonical-path-for-ramblock-id,
is considered to be stable to make sure it won't go away by accident.
x- was intended for unstable/iternal properties, and not supposed to
be stable option. However it's too late to rename (drop x-)
it as it would mean that users will have to mantain both
x-use-canonical-path-for-ramblock-id (for QEMU 5.0-5.2) versions
and prefix-less for later versions.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210121161504.1007247-1-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The easiest spots to use QAPI_LIST_APPEND are where we already have an
obvious pointer to the tail of a list. While at it, consistently use
the variable name 'tail' for that purpose.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210113221013.390592-5-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
In host_memory_backend_get_host_nodes, we build host_nodes
list and output it to v (a StringOutputVisitor) but forget
to free the list. This fixes the memory leak.
The memory leak stack:
Direct leak of 32 byte(s) in 2 object(s) allocated from:
#0 0xfffda30b3393 in __interceptor_calloc (/usr/lib64/libasan.so.4+0xd3393)
#1 0xfffda1d28b9b in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x58b9b)
#2 0xaaab05ca6e43 in host_memory_backend_get_host_nodes backends/hostmem.c:94
#3 0xaaab061ddf83 in object_property_get_uint16List qom/object.c:1478
#4 0xaaab05866513 in query_memdev hw/core/machine-qmp-cmds.c:312
#5 0xaaab061d980b in do_object_child_foreach qom/object.c:1001
#6 0xaaab0586779b in qmp_query_memdev hw/core/machine-qmp-cmds.c:328
#7 0xaaab0615ed3f in qmp_marshal_query_memdev qapi/qapi-commands-machine.c:327
#8 0xaaab0632d647 in do_qmp_dispatch qapi/qmp-dispatch.c:147
#9 0xaaab0632d647 in qmp_dispatch qapi/qmp-dispatch.c:190
#10 0xaaab0610f74b in monitor_qmp_dispatch monitor/qmp.c:120
#11 0xaaab0611074b in monitor_qmp_bh_dispatcher monitor/qmp.c:209
#12 0xaaab063caefb in aio_bh_poll util/async.c:117
#13 0xaaab063d30fb in aio_dispatch util/aio-posix.c:459
#14 0xaaab063cac8f in aio_ctx_dispatch util/async.c:268
#15 0xfffda1d22a6b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a6b)
#16 0xaaab063d0e97 in glib_pollfds_poll util/main-loop.c:218
#17 0xaaab063d0e97 in os_host_main_loop_wait util/main-loop.c:241
#18 0xaaab063d0e97 in main_loop_wait util/main-loop.c:517
#19 0xaaab05c8bfa7 in main_loop /root/rpmbuild/BUILD/qemu-4.1.0/vl.c:1791
#20 0xaaab05713bc3 in main /root/rpmbuild/BUILD/qemu-4.1.0/vl.c:4473
#21 0xfffda0a83ebf in __libc_start_main (/usr/lib64/libc.so.6+0x23ebf)
#22 0xaaab0571ed5f (aarch64-softmmu/qemu-system-aarch64+0x88ed5f)
SUMMARY: AddressSanitizer: 32 byte(s) leaked in 2 allocation(s).
Fixes: 4cf1b76bf1 (hostmem: add properties for NUMA memory policy)
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Tested-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201210075226.20196-1-zhukeqian1@huawei.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
object_get_canonical_path_component() returns a malloced copy of a
property name on success, null on failure.
19 of its 25 callers immediately free the returned copy.
Change object_get_canonical_path_component() to return the property
name directly. Since modifying the name would be wrong, adjust the
return type to const char *.
Drop the free from the 19 callers become simpler, add the g_strdup()
to the other six.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200714160202.3121879-4-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away. Convert
if (!foo(..., &err)) {
...
error_propagate(errp, err);
...
return ...
}
to
if (!foo(..., errp)) {
...
...
return ...
}
where nothing else needs @err. Coccinelle script:
@rule1 forall@
identifier fun, err, errp, lbl;
expression list args, args2;
binary operator op;
constant c1, c2;
symbol false;
@@
if (
(
- fun(args, &err, args2)
+ fun(args, errp, args2)
|
- !fun(args, &err, args2)
+ !fun(args, errp, args2)
|
- fun(args, &err, args2) op c1
+ fun(args, errp, args2) op c1
)
)
{
... when != err
when != lbl:
when strict
- error_propagate(errp, err);
... when != err
(
return;
|
return c2;
|
return false;
)
}
@rule2 forall@
identifier fun, err, errp, lbl;
expression list args, args2;
expression var;
binary operator op;
constant c1, c2;
symbol false;
@@
- var = fun(args, &err, args2);
+ var = fun(args, errp, args2);
... when != err
if (
(
var
|
!var
|
var op c1
)
)
{
... when != err
when != lbl:
when strict
- error_propagate(errp, err);
... when != err
(
return;
|
return c2;
|
return false;
|
return var;
)
}
@depends on rule1 || rule2@
identifier err;
@@
- Error *err = NULL;
... when != err
Not exactly elegant, I'm afraid.
The "when != lbl:" is necessary to avoid transforming
if (fun(args, &err)) {
goto out
}
...
out:
error_propagate(errp, err);
even though other paths to label out still need the error_propagate().
For an actual example, see sclp_realize().
Without the "when strict", Coccinelle transforms vfio_msix_setup(),
incorrectly. I don't know what exactly "when strict" does, only that
it helps here.
The match of return is narrower than what I want, but I can't figure
out how to express "return where the operand doesn't use @err". For
an example where it's too narrow, see vfio_intx_enable().
Silently fails to convert hw/arm/armsse.c, because Coccinelle gets
confused by ARMSSE being used both as typedef and function-like macro
there. Converted manually.
Line breaks tidied up manually. One nested declaration of @local_err
deleted manually. Preexisting unwanted blank line dropped in
hw/riscv/sifive_e.c.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-35-armbru@redhat.com>
Replace
error_setg(&err, ...);
error_propagate(errp, err);
by
error_setg(errp, ...);
Related pattern:
if (...) {
error_setg(&err, ...);
goto out;
}
...
out:
error_propagate(errp, err);
return;
When all paths to label out are that way, replace by
if (...) {
error_setg(errp, ...);
return;
}
and delete the label along with the error_propagate().
When we have at most one other path that actually needs to propagate,
and maybe one at the end that where propagation is unnecessary, e.g.
foo(..., &err);
if (err) {
goto out;
}
...
bar(..., &err);
out:
error_propagate(errp, err);
return;
move the error_propagate() to where it's needed, like
if (...) {
foo(..., &err);
error_propagate(errp, err);
return;
}
...
bar(..., errp);
return;
and transform the error_setg() as above.
In some places, the transformation results in obviously unnecessary
error_propagate(). The next few commits will eliminate them.
Bonus: the elimination of gotos will make later patches in this series
easier to review.
Candidates for conversion tracked down with this Coccinelle script:
@@
identifier err, errp;
expression list args;
@@
- error_setg(&err, args);
+ error_setg(errp, args);
... when != err
error_propagate(errp, err);
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-34-armbru@redhat.com>
The previous commit enables conversion of
visit_foo(..., &err);
if (err) {
...
}
to
if (!visit_foo(..., errp)) {
...
}
for visitor functions that now return true / false on success / error.
Coccinelle script:
@@
identifier fun =~ "check_list|input_type_enum|lv_start_struct|lv_type_bool|lv_type_int64|lv_type_str|lv_type_uint64|output_type_enum|parse_type_bool|parse_type_int64|parse_type_null|parse_type_number|parse_type_size|parse_type_str|parse_type_uint64|print_type_bool|print_type_int64|print_type_null|print_type_number|print_type_size|print_type_str|print_type_uint64|qapi_clone_start_alternate|qapi_clone_start_list|qapi_clone_start_struct|qapi_clone_type_bool|qapi_clone_type_int64|qapi_clone_type_null|qapi_clone_type_number|qapi_clone_type_str|qapi_clone_type_uint64|qapi_dealloc_start_list|qapi_dealloc_start_struct|qapi_dealloc_type_anything|qapi_dealloc_type_bool|qapi_dealloc_type_int64|qapi_dealloc_type_null|qapi_dealloc_type_number|qapi_dealloc_type_str|qapi_dealloc_type_uint64|qobject_input_check_list|qobject_input_check_struct|qobject_input_start_alternate|qobject_input_start_list|qobject_input_start_struct|qobject_input_type_any|qobject_input_type_bool|qobject_input_type_bool_keyval|qobject_input_type_int64|qobject_input_type_int64_keyval|qobject_input_type_null|qobject_input_type_number|qobject_input_type_number_keyval|qobject_input_type_size_keyval|qobject_input_type_str|qobject_input_type_str_keyval|qobject_input_type_uint64|qobject_input_type_uint64_keyval|qobject_output_start_list|qobject_output_start_struct|qobject_output_type_any|qobject_output_type_bool|qobject_output_type_int64|qobject_output_type_null|qobject_output_type_number|qobject_output_type_str|qobject_output_type_uint64|start_list|visit_check_list|visit_check_struct|visit_start_alternate|visit_start_list|visit_start_struct|visit_type_.*";
expression list args;
typedef Error;
Error *err;
@@
- fun(args, &err);
- if (err)
+ if (!fun(args, &err))
{
...
}
A few line breaks tidied up manually.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-19-armbru@redhat.com>
The only way object_property_add() can fail is when a property with
the same name already exists. Since our property names are all
hardcoded, failure is a programming error, and the appropriate way to
handle it is passing &error_abort.
Same for its variants, except for object_property_add_child(), which
additionally fails when the child already has a parent. Parentage is
also under program control, so this is a programming error, too.
We have a bit over 500 callers. Almost half of them pass
&error_abort, slightly fewer ignore errors, one test case handles
errors, and the remaining few callers pass them to their own callers.
The previous few commits demonstrated once again that ignoring
programming errors is a bad idea.
Of the few ones that pass on errors, several violate the Error API.
The Error ** argument must be NULL, &error_abort, &error_fatal, or a
pointer to a variable containing NULL. Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second
call. ich9_pm_add_properties(), sparc32_ledma_realize(),
sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize()
are wrong that way.
When the one appropriate choice of argument is &error_abort, letting
users pick the argument is a bad idea.
Drop parameter @errp and assert the preconditions instead.
There's one exception to "duplicate property name is a programming
error": the way object_property_add() implements the magic (and
undocumented) "automatic arrayification". Don't drop @errp there.
Instead, rename object_property_add() to object_property_try_add(),
and add the obvious wrapper object_property_add().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-15-armbru@redhat.com>
[Two semantic rebase conflicts resolved]
object_property_set_description() and
object_class_property_set_description() fail only when property @name
is not found.
There are 85 calls of object_property_set_description() and
object_class_property_set_description(). None of them can fail:
* 84 immediately follow the creation of the property.
* The one in spapr_rng_instance_init() refers to a property created in
spapr_rng_class_init(), from spapr_rng_properties[].
Every one of them still gets to decide what to pass for @errp.
51 calls pass &error_abort, 32 calls pass NULL, one receives the error
and propagates it to &error_abort, and one propagates it to
&error_fatal. I'm actually surprised none of them violates the Error
API.
What are we gaining by letting callers handle the "property not found"
error? Use when the property is not known to exist is simpler: you
don't have to guard the call with a check. We haven't found such a
use in 5+ years. Until we do, let's make life a bit simpler and drop
the @errp parameter.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-8-armbru@redhat.com>
[One semantic rebase conflict resolved]
Since 5.0 QEMU uses hostmem backend for allocating main guest RAM.
The backend however calls mbind() which is typically NOP
in case of default policy/absent host-nodes bitmap.
However when runing in container with black-listed mbind()
syscall, QEMU fails to start with error
"cannot bind memory to host NUMA nodes: Operation not permitted"
even when user hasn't provided host-nodes to pin to explictly
(which is the case with -m option)
To fix issue, call mbind() only in case when user has provided
host-nodes explicitly (i.e. host_nodes bitmap is not empty).
That should allow to run QEMU in containers with black-listed
mbind() without memory pinning. If QEMU provided memory-pinning
is required user still has to white-list mbind() in container
configuration.
Reported-by: Manuel Hohmann <mhohmann@physnet.uni-hamburg.de>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20200430154606.6421-1-imammedo@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Commit 4ebc74dbbf removed default prealloc_threads initialization
by mistake, and that makes QEMU crash with division on zero at
numpages_per_thread = numpages / memset_num_threads;
when QEMU is started with following backend
-object memory-backend-ram,id=ram-node0,prealloc=yes,size=128M
Return back initialization removed by 4ebc74dbbf to fix issue.
Fixes: 4ebc74dbbf
Reported-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20200325094423.24293-2-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When option -mem-prealloc is used with one or more memory-backend
objects, created backends may not obey configured bind policy or
creation may fail after kernel attempts to move pages according
to bind policy.
Reason is in file_ram_alloc(), which will pre-allocate
any descriptor based RAM if global mem_prealloc != 0 and that
happens way before bind policy is applied to memory range.
One way to fix it would be to extend memory_region_foo() API
and add more invariants that could broken later due implicit
dependencies that's hard to track.
Another approach is to drop adhoc main RAM allocation and
consolidate it around memory-backend. That allows to have
single place that allocates guest RAM (main and memdev)
in the same way and then global mem_prealloc could be
replaced by backend's property[s] that will affect created
memory-backend objects but only in correct order this time.
With main RAM now converted to hostmem backends, there is no
point in keeping global mem_prealloc around, so alias
-mem-prealloc to "memory-backend.prealloc=on"
machine compat[*] property and make mem_prealloc a local
variable to only stir registration of compat property.
*) currently user accessible -global works only with DEVICE
based objects and extra work is needed to make it work
with hostmem backends. But that is convenience option
and out of scope of this already huge refactoring.
Hence machine compat properties were used.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20200219160953.13771-78-imammedo@redhat.com>
the property will allow user to specify number of threads to use
in pre-allocation stage. It also will allow to reduce implicit
hostmem dependency on current_machine.
On object creation it will default to 1, but via machine
compat property it will be updated to MachineState::smp::cpus
to keep current behavior for hostmem and main RAM (which is
now also hostmem based).
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20200219160953.13771-77-imammedo@redhat.com>
There are three page size in qemu:
real host page size
host page size
target page size
All of them have dedicate variable to represent. For the last two, we
use the same form in the whole qemu project, while for the first one we
use two forms: qemu_real_host_page_size and getpagesize().
qemu_real_host_page_size is defined to be a replacement of
getpagesize(), so let it serve the role.
[Note] Not fully tested for some arch or device.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In my "build everything" tree, changing sysemu/sysemu.h triggers a
recompile of some 5400 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).
hw/qdev-core.h includes sysemu/sysemu.h since recent commit e965ffa70a
"qdev: add qdev_add_vm_change_state_handler()". This is a bad idea:
hw/qdev-core.h is widely included.
Move the declaration of qdev_add_vm_change_state_handler() to
sysemu/sysemu.h, and drop the problematic include from hw/qdev-core.h.
Touching sysemu/sysemu.h now recompiles some 1800 objects.
qemu/uuid.h also drops from 5400 to 1800. A few more headers show
smaller improvement: qemu/notify.h drops from 5600 to 5200,
qemu/timer.h from 5600 to 4500, and qapi/qapi-types-run-state.h from
5500 to 5000.
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20190812052359.30071-28-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Basically, the context could get the MachineState reference via call
chains or unrecommended qdev_get_machine() in !CONFIG_USER_ONLY mode.
A local variable of the same name would be introduced in the declaration
phase out of less effort OR replace it on the spot if it's only used
once in the context. No semantic changes.
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190518205428.90532-4-like.xu@linux.intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
QEMU will crashes with
qapi/qobject-output-visitor.c:210: qobject_output_complete: Assertion `qov->root && ((&qov->stack)->slh_first == ((void *)0))' failed
when trying to get value of not set hostmem's "host-nodes"
property, HostMemoryBackend::host_nodes bitmap doesn't have
any bits set in it, which leads to find_first_bit() returning
MAX_NODES and consequently to an early return from
host_memory_backend_get_host_nodes() without calling visitor.
Fix it by calling visitor even if "host-nodes" property wasn't
set before exiting from property getter to return valid empty
list.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20190214105733.25643-1-imammedo@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
When there are multiple memory backends in use, including the object type
and property name in the error message can help users to locate the error.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Message-Id: <97d9193875747d8378c05b9e3b3cb39c1b7d2b4e.1546399191.git.yi.z.zhang@linux.intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: reword commit message]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
hostmem-file and hostmem-memfd use the whole object path for the
memory region name, and hostname-ram uses only the path component (the
object id, or canonical path basename):
qemu -m 1024 -object memory-backend-file,id=mem,size=1G,mem-path=/tmp/foo -numa node,memdev=mem -monitor stdio
(qemu) info ramblock
Block Name PSize Offset Used Total
/objects/mem 4 KiB 0x0000000000000000 0x0000000040000000 0x0000000040000000
qemu -m 1024 -object memory-backend-memfd,id=mem,size=1G -numa node,memdev=mem -monitor stdio
(qemu) info ramblock
Block Name PSize Offset Used Total
/objects/mem 4 KiB 0x0000000000000000 0x0000000040000000 0x0000000040000000
qemu -m 1024 -object memory-backend-ram,id=mem,size=1G -numa node,memdev=mem -monitor stdio
(qemu) info ramblock
Block Name PSize Offset Used Total
mem 4 KiB 0x0000000000000000 0x0000000040000000 0x0000000040000000
For consistency, change to use object id for -file and -memfd as well
with >= 4.0.
Having a consistent naming allows to migrate to different hostmem
backends.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
host_memory_backend_set_host_nodes() was not validating
host-nodes before writing to backend->host_nodes, making QEMU
write beyond the end of the bitmap.
Fix the crash and add a simple regression test for the fix.
While at it, fix memory leak of the list returned by
visit_type_uint16List().
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20181130122844.29103-1-ehabkost@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
[ehabkost: removed test case code]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Unused, so let's remove it.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180619134141.29478-8-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The "id" property is unnecessary and can be replaced simply with
object_get_canonical_path_component. This patch mostly undoes commit
e1ff3c67e8 ("monitor: fix qmp/hmp query-memdev not reporting IDs of
memory backends", 2017-01-12).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There are a couple places (one generic, one target specific) where we need
to get the host page size associated with a particular memory backend. I
have some upcoming code which will add another place which wants this. So,
for convenience, add a helper function to calculate this.
host_memory_backend_pagesize() returns the host pagesize for a given
HostMemoryBackend object.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Move qapi-schema.json to qapi/, so it's next to its modules, and all
files get generated to qapi/, not just the ones generated for modules.
Consistently name the generated files qapi-MODULE.EXT:
qmp-commands.[ch] become qapi-commands.[ch], qapi-event.[ch] become
qapi-events.[ch], and qmp-introspect.[ch] become qapi-introspect.[ch].
This gets rid of the temporary hacks in scripts/qapi/commands.py,
scripts/qapi/events.py, and scripts/qapi/common.py.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-28-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[eblake: Fix trailing dot in tpm.c, undo temporary hack for OSX toolchain]
Signed-off-by: Eric Blake <eblake@redhat.com>
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.
The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h. Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.
To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects. The next commit will
improve it further.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently only file backed memory backend can
be created with a "share" flag in order to allow
sharing guest RAM with other processes in the host.
Add the "share" flag also to RAM Memory Backend
in order to allow remapping parts of the guest RAM
to different host virtual addresses. This is needed
by the RDMA devices in order to remap non-contiguous
QEMU virtual addresses to a contiguous virtual address range.
Moved the "share" flag to the Host Memory base class,
modified phys_mem_alloc to include the new parameter
and a new interface memory_region_init_ram_shared_nomigrate.
There are no functional changes if the new flag is not used.
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-4-armbru@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZrTlXAAoJEDhwtADrkYZT570P/20EQATHuRMGcO9QIParHqZL
dh3yyBMEoSPNJx/dK2M1psT9vQgfcQ9+W6ouzW8nmu70bmY+NpwwlJfr9DadXE/q
C9ht8ULL6pNfJ4bfLB8SqWqG6WUuyZJEWc3x3dCVuU4921hprwyAkWFEXiw73ThN
Gay4zRooxZSR0pHsuh5wrlUl7iF7UHy/UO1yRvIUWvYMQ4PLyMzF/+o2hP4MvPNe
l/b5B2XYiioMlJOjmi2/kRABOUywt4YWSXTxRi7f9VxWmKa5B8rbeMmxaX5zH3zB
b4Ik8UdYAvW47iKilGQa1JlLqK24tgnA6monzMfQwhCdUtBTHu3rf8jJrvRai1fl
R8vitLy1JJ93GPzVUAknSNOdkNBD5QnEvqn/ZGxvKpEQIHaPURoyOD50i9/NFR0n
6d8WSv2403zZsE6c+JkmeOm0zRhoE3Bkhcg8G9QYhGJu8meCsOVmR+afoBtzrOB9
ziB7dpWyAMwCioY25vHVD2vtMT/YlZX7Ic+vuWKtagFLt6lHROnSraon/iLdXYj3
Dmeb+sdBXL8vUXBK1c5P+ifKqgF6asz918wJgc5aucPWY3xY6lEN9wCGXqJuVw5Q
B3FSIMCIAJhW8KpzmHObRJU1Enj2StkWBymWWNIu5f323v1u7hK57TXYw9HqDETj
cL599N7VasD5azXLjlxk
=eUZW
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-09-01-v3' into staging
QAPI patches for 2017-09-01
# gpg: Signature made Mon 04 Sep 2017 12:30:31 BST
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2017-09-01-v3: (47 commits)
qapi: drop the sentinel in enum array
qapi: Change data type of the FOO_lookup generated for enum FOO
qapi: Convert indirect uses of FOO_lookup[...] to qapi_enum_lookup()
qapi: Mechanically convert FOO_lookup[...] to FOO_str(...)
qapi: Generate FOO_str() macro for QAPI enum FOO
qapi: Avoid unnecessary use of enum lookup table's sentinel
qapi: Use qapi_enum_parse() in input_type_enum()
crypto: Use qapi_enum_parse() in qcrypto_block_luks_name_lookup()
quorum: Use qapi_enum_parse() in quorum_open()
block: Use qemu_enum_parse() in blkdebug_debug_breakpoint()
hmp: Use qapi_enum_parse() in hmp_migrate_set_parameter()
hmp: Use qapi_enum_parse() in hmp_migrate_set_capability()
tpm: Clean up model registration & lookup
tpm: Clean up driver registration & lookup
qapi: Drop superfluous qapi_enum_parse() parameter max
qapi: Update qapi-code-gen.txt examples to match current code
qapi-schema: Improve section headings
qapi-schema: Move queries from common.json to qapi-schema.json
qapi-schema: Make block-core.json self-contained
qapi-schema: Fold event.json back into qapi-schema.json
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Currently, a FOO_lookup is an array of strings terminated by a NULL
sentinel.
A future patch will generate enums with "holes". NULL-termination
will cease to work then.
To prepare for that, store the length in the FOO_lookup by wrapping it
in a struct and adding a member for the length.
The sentinel will be dropped next.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20170822132255.23945-13-marcandre.lureau@redhat.com>
[Basically redone]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1503564371-26090-16-git-send-email-armbru@redhat.com>
[Rebased]
The errp argument is ignored by all implementations of the
method, and user_creatable_del() would break if any
implementation set an error (because it calls error_setg(errp) if
the function returns false). Remove the unused parameter.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170829220337.23427-1-ehabkost@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Use the new interface to boost readability.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1489151370-15453-3-git-send-email-peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>