qemu-e2k

Author	SHA1	Message	Date
David Hildenbrand	dd4fc60585	util/oslib-posix: Fix missing unlock in the error path of os_mem_prealloc() We're missing an unlock in case installing the signal handler failed. Fortunately, we barely see this error in real life. Fixes: `a960d6642d` ("util/oslib-posix: Support concurrent os_mem_prealloc() invocation") Fixes: CID 1468941 Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta@ionos.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Cc: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20220111120830.119912-1-david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-02-06 04:33:50 -05:00
David Hildenbrand	29b838c05d	util/oslib-posix: Forward SIGBUS to MCE handler under Linux Temporarily modifying the SIGBUS handler is really nasty, as we might be unlucky and receive an MCE SIGBUS while having our handler registered. Unfortunately, there is no way around messing with SIGBUS when MADV_POPULATE_WRITE is not applicable or not around. Let's forward SIGBUS that don't belong to us to the already registered handler and document the situation. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-8-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-07 19:30:13 -05:00
David Hildenbrand	a960d6642d	util/oslib-posix: Support concurrent os_mem_prealloc() invocation Add a mutex to protect the SIGBUS case, as we cannot mess concurrently with the sigbus handler and we have to manage the global variable sigbus_memset_context. The MADV_POPULATE_WRITE path can run concurrently. Note that page_mutex and page_cond are shared between concurrent invocations, which shouldn't be a problem. This is a preparation for future virtio-mem prealloc code, which will call os_mem_prealloc() asynchronously from an iothread when handling guest requests. Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-7-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-07 05:19:55 -05:00
David Hildenbrand	ac86e5c37d	util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITE Let's simplify the case when we only want a single thread and don't have to mess with signal handlers. Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-6-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-07 05:19:55 -05:00
David Hildenbrand	89aec6411c	util/oslib-posix: Don't create too many threads with small memory or little pages Let's limit the number of threads to something sane, especially that - We don't have more threads than the number of pages we have - We don't have threads that initialize small (< 64 MiB) memory Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-5-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-07 05:19:55 -05:00
David Hildenbrand	dba506788b	util/oslib-posix: Introduce and use MemsetContext for touch_all_pages() Let's minimize the number of global variables to prepare for os_mem_prealloc() getting called concurrently and make the code a bit easier to read. The only consumer that really needs a global variable is the sigbus handler, which will require protection via a mutex in the future either way as we cannot concurrently mess with the SIGBUS handler. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-4-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-07 05:19:55 -05:00
David Hildenbrand	a384bfa32e	util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() Let's sense support and use it for preallocation. MADV_POPULATE_WRITE does not require a SIGBUS handler, doesn't actually touch page content, and avoids context switches; it is, therefore, faster and easier to handle than our current approach. While MADV_POPULATE_WRITE is, in general, faster than manual prefaulting, and especially faster with 4k pages, there is still value in prefaulting using multiple threads to speed up preallocation. More details on MADV_POPULATE_WRITE can be found in the Linux commits 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ\|WRITE) to prefault page tables") and eb2faa513c24 ("mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ\|WRITE)"), and in the man page proposal [1]. This resolves the TODO in do_touch_pages(). In the future, we might want to look into using fallocate(), eventually combined with MADV_POPULATE_READ, when dealing with shared file/fd mappings and not caring about memory bindings. [1] https://lkml.kernel.org/r/20210816081922.5155-1-david@redhat.com Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-07 05:19:55 -05:00
David Hildenbrand	6c427ab926	util/oslib-posix: Let touch_all_pages() return an error Let's prepare touch_all_pages() for returning differing errors. Return an error from the thread and report the last processed error. Translate SIGBUS to -EFAULT, as a SIGBUS can mean all different kind of things (memory error, read error, out of memory). When allocating memory fails via the current SIGBUS-based mechanism, we'll get: os_mem_prealloc: preallocating memory failed: Bad address Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-07 05:19:55 -05:00
David Hildenbrand	8dbe22c686	memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The new flag has the following semantics: " RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge pages if applicable) is skipped: will bail out if not supported. When not set, the OS will do the reservation, if supported for the memory type. " Allow passing it into: - memory_region_init_ram_nomigrate() - memory_region_init_resizeable_ram() - memory_region_init_ram_from_file() ... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag. Bail out if the flag is not supported, which is the case right now for both, POSIX and win32. We will add Linux support next and allow specifying RAM_NORESERVE via memory backends. The target use case is virtio-mem, which dynamically exposes memory inside a large, sparse memory area to the VM. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-9-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-15 20:27:38 +02:00
David Hildenbrand	b444f5c079	util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap() Let's pass flags instead of bools to prepare for passing other flags and update the documentation of qemu_ram_mmap(). Introduce new QEMU_MAP_ flags that abstract the mmap() PROT_ and MAP_ flag handling and simplify it. We expose only flags that are currently supported by qemu_ram_mmap(). Maybe, we'll see qemu_mmap() in the future as well that can implement these flags. Note: We don't use MAP_ flags as some flags (e.g., MAP_SYNC) are only defined for some systems and we want to always be able to identify these flags reliably inside qemu_ram_mmap() -- for example, to properly warn when some future flags are not available or effective on a system. Also, this way we can simplify PROT_ handling as well. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-8-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-15 20:27:38 +02:00
Brad Smith	29c3d213f4	oslib-posix: Remove OpenBSD workaround for fcntl("/dev/null", F_SETFL, O_NONBLOCK) failure OpenBSD prior to 6.3 required a workaround to utilize fcntl(F_SETFL) on memory devices. Since modern verions of OpenBSD that are only officialy supported and buildable on do not have this issue I am garbage collecting this workaround. Signed-off-by: Brad Smith <brad@comstyle.com> Message-Id: <YGYECGXQhdamEJgC@humpty.home.comstyle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-04 13:47:07 +02:00
Jagannathan Raman	44a4ff31c0	memory: alloc RAM from file at offset Allow RAM MemoryRegion to be created from an offset in a file, instead of allocating at offset of 0 by default. This is needed to synchronize RAM between QEMU & remote process. Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 609996697ad8617e3b01df38accc5c208c24d74e.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-09 20:53:56 +00:00
Stefan Hajnoczi	369d6dc4de	memory: add readonly support to memory_region_init_ram_from_file() There is currently no way to open(O_RDONLY) and mmap(PROT_READ) when creating a memory region from a file. This functionality is needed since the underlying host file may not allow writing. Add a bool readonly argument to memory_region_init_ram_from_file() and the APIs it calls. Extend memory_region_init_ram_from_file() rather than introducing a memory_region_init_rom_from_file() API so that callers can easily make a choice between read/write and read-only at runtime without calling different APIs. No new RAMBlock flag is introduced for read-only because it's unclear whether RAMBlocks need to know that they are read-only. Pass a bool readonly argument instead. Both of these design decisions can be changed in the future. It just seemed like the simplest approach to me. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20210104171320.575838-2-stefanha@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-02-01 17:07:34 -05:00
Philippe Mathieu-Daudé	ed6f53f9ca	util/oslib: Assert qemu_try_memalign() alignment is a power of 2 qemu_try_memalign() expects a power of 2 alignment: - posix_memalign(3): The address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *). - _aligned_malloc() The alignment value, which must be an integer power of 2. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201021173803.2619054-3-philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-01-07 05:09:06 -10:00
Daniele Buono	c905a3680d	cfi: Initial support for cfi-icall in QEMU LLVM/Clang, supports runtime checks for forward-edge Control-Flow Integrity (CFI). CFI on indirect function calls (cfi-icall) ensures that, in indirect function calls, the function called is of the right signature for the pointer type defined at compile time. For this check to work, the code must always respect the function signature when using function pointer, the function must be defined at compile time, and be compiled with link-time optimization. This rules out, for example, shared libraries that are dynamically loaded (given that functions are not known at compile time), and code that is dynamically generated at run-time. This patch: 1) Introduces the CONFIG_CFI flag to support cfi in QEMU 2) Introduces a decorator to allow the definition of "sensitive" functions, where a non-instrumented function may be called at runtime through a pointer. The decorator will take care of disabling cfi-icall checks on such functions, when cfi is enabled. 3) Marks functions currently in QEMU that exhibit such behavior, in particular: - The function in TCG that calls pre-compiled TBs - The function in TCI that interprets instructions - Functions in the plugin infrastructures that jump to callbacks - Functions in util that directly call a signal handler Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com> Acked-by: Alex Bennée <alex.bennee@linaro.org Message-Id: <20201204230615.2392-3-dbuono@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-02 21:03:35 +01:00
Paolo Bonzini	fcb4f59c87	oslib-posix: relocate path to /var Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-30 19:11:36 +02:00
Paolo Bonzini	9386a4a715	oslib-posix: default exec_dir to bindir If the exec_dir cannot be retrieved, just assume it's the installation directory that was specified at configure time. This makes it simpler to reason about what the callers will do if they get back an empty path. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-30 19:11:36 +02:00
Paolo Bonzini	a4c13869f9	oslib: do not call g_strdup from qemu_get_exec_dir Just return the directory without requiring the caller to free it. This also removes a bogus check for NULL in os_find_datadir and module_load_one; g_strdup of a static variable cannot return NULL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-30 19:11:36 +02:00
Daniel P. Berrangé	448058aa99	util: rename qemu_open() to qemu_open_old() We want to introduce a new version of qemu_open() that uses an Error object for reporting problems and make this it the preferred interface. Rename the existing method to release the namespace for the new impl. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-16 10:33:48 +01:00
Alex Bennée	ad06ef0efb	util: add qemu_get_host_physmem utility function This will be used in a future patch. For POSIX systems _SC_PHYS_PAGES isn't standardised but at least appears in the man pages for Open/FreeBSD. The result is advisory so any users of it shouldn't just fail if we can't work it out. The win32 stub currently returns 0 until someone with a Windows system can develop and test a patch. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Cc: BALATON Zoltan <balaton@eik.bme.hu> Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com> Message-Id: <20200724064509.331-5-alex.bennee@linaro.org>	2020-07-27 09:40:12 +01:00
David CARLIER	8edbca515c	util: Implement qemu_get_thread_id() for OpenBSD Implement qemu_get_thread_id() for OpenBSD hosts, using getthrid(). Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Brad Smith <brad@comstyle.com> Message-id: CA+XhMqxD6gQDBaj8tX0CMEj3si7qYKsM8u1km47e_-U7MC37Pg@mail.gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: tidied up commit message] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-20 11:35:17 +01:00
Laurent Vivier	894022e616	net: check if the file descriptor is valid before using it qemu_set_nonblock() checks that the file descriptor can be used and, if not, crashes QEMU. An assert() is used for that. The use of assert() is used to detect programming error and the coredump will allow to debug the problem. But in the case of the tap device, this assert() can be triggered by a misconfiguration by the user. At startup, it's not a real problem, but it can also happen during the hot-plug of a new device, and here it's a problem because we can crash a perfectly healthy system. For instance: # ip link add link virbr0 name macvtap0 type macvtap mode bridge # ip link set macvtap0 up # TAP=/dev/tap$(ip -o link show macvtap0 \| cut -d: -f1) # qemu-system-x86_64 -machine q35 -device pcie-root-port,id=pcie-root-port-0 -monitor stdio 9<> $TAP (qemu) netdev_add type=tap,id=hostnet0,vhost=on,fd=9 (qemu) device_add driver=virtio-net-pci,netdev=hostnet0,id=net0,bus=pcie-root-port-0 (qemu) device_del net0 (qemu) netdev_del hostnet0 (qemu) netdev_add type=tap,id=hostnet1,vhost=on,fd=9 qemu-system-x86_64: .../util/oslib-posix.c:247: qemu_set_nonblock: Assertion `f != -1' failed. Aborted (core dumped) To avoid that, add a function, qemu_try_set_nonblock(), that allows to report the problem without crashing. In the same way, we also update the function for vhostfd in net_init_tap_one() and for fd in net_init_socket() (both descriptors are provided by the user and can be wrong). Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2020-07-15 21:00:13 +08:00
Michal Privoznik	e47f4765af	util: Introduce qemu_get_host_name() This function offers operating system agnostic way to fetch host name. It is implemented for both POSIX-like and Windows systems. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2020-07-13 17:44:58 -05:00
David CARLIER	2b9b9e7010	util/oslib-posix.c: Implement qemu_init_exec_dir() for Haiku The qemu_init_exec_dir() function is inherently non-portable; provide an implementation for Haiku hosts. Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20200703145614.16684-9-peter.maydell@linaro.org [PMM: Expanded commit message] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-13 14:36:10 +01:00
David CARLIER	2a4b472c3c	osdep.h: Always include <sys/signal.h> if it exists Regularize our handling of <sys/signal.h>: currently we include it in osdep.h, but only for OpenBSD, and we include it without an ifdef guard in a couple of C files. This causes problems for Haiku, which doesn't have that header. Instead, check in configure whether sys/signal.h exists, and if it does then always include it from osdep.h. Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20200703145614.16684-5-peter.maydell@linaro.org [PMM: Expanded commit message; rename to HAVE_SYS_SIGNAL_H] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-13 14:36:09 +01:00
David CARLIER	2032e243a5	util/oslib-posix : qemu_init_exec_dir implementation for Mac From 3025a0ce3fdf7d3559fc35a52c659f635f5c750c Mon Sep 17 00:00:00 2001 From: David Carlier <devnexen@gmail.com> Date: Tue, 26 May 2020 21:35:27 +0100 Subject: [PATCH] util/oslib-posix : qemu_init_exec_dir implementation for Mac Using dyld API to get the full path of the current process. Signed-off-by: David Carlier <devnexen@gmail.com> Message-id: CA+XhMqxwC10XHVs4Z-JfE0-WLAU3ztDuU9QKVi31mjr59HWCxg@mail.gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-06-23 11:39:46 +01:00
David Carlier	9548a89173	util/oslib: Returns the real thread identifier on FreeBSD and NetBSD getpid is good enough in a mono thread context, however thr_self/_lwp_self reflects the real current thread identifier from a given process. Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Carlier <devnexen@gmail.com>	2020-06-10 12:10:48 -04:00
Bauerchen	278fb16273	oslib-posix: take lock before qemu_cond_broadcast In touch_all_pages, if the mutex is not taken around qemu_cond_broadcast, qemu_cond_broadcast may be called before all touch page threads enter qemu_cond_wait. In this case, the touch page threads wait forever for the main thread to wake them up, causing a deadlock. Signed-off-by: Bauerchen <bauerchen@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-11 08:49:20 -04:00
Paolo Bonzini	78b3f67acd	oslib-posix: initialize mutex and condition variable The mutex and condition variable were never initialized, causing -mem-prealloc to abort with an assertion failure. Fixes: `037fb5eb39` Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Cc: bauerchen <bauerchen@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 23:02:22 +01:00
bauerchen	037fb5eb39	mem-prealloc: optimize large guest startup [desc]: Large memory VM starts slowly when using -mem-prealloc, and there are some areas to optimize in current method; 1、mmap will be used to alloc threads stack during create page clearing threads, and it will attempt mm->mmap_sem for write lock, but clearing threads have hold read lock, this competition will cause threads createion very slow; 2、methods of calcuating pages for per threads is not well;if we use 64 threads to split 160 hugepage,63 threads clear 2page,1 thread clear 34 page,so the entire speed is very slow; to solve the first problem,we add a mutex in thread function,and start all threads when all threads finished createion; and the second problem, we spread remainder to other threads,in situation that 160 hugepage and 64 threads, there are 32 threads clear 3 pages,and 32 threads clear 2 pages. [test]: 320G 84c VM start time can be reduced to 10s 680G 84c VM start time can be reduced to 18s Signed-off-by: bauerchen <bauerchen@tencent.com> Reviewed-by: Pan Rui <ruippan@tencent.com> Reviewed-by: Ivan Ren <ivanren@tencent.com> [Simplify computation of the number of pages per thread. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-02-25 09:18:01 +01:00
Wei Yang	038adc2f58	core: replace getpagesize() with qemu_real_host_page_size There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-26 15:38:06 +02:00
Stefan Hajnoczi	72d41eb4b8	memory: fetch pmem size in get_file_size() Neither stat(2) nor lseek(2) report the size of Linux devdax pmem character device nodes. Commit `314aec4a6e` ("hostmem-file: reject invalid pmem file sizes") added code to hostmem-file.c to fetch the size from sysfs and compare against the user-provided size=NUM parameter: if (backend->size > size) { error_setg(errp, "size property %" PRIu64 " is larger than " "pmem file \"%s\" size %" PRIu64, backend->size, fb->mem_path, size); return; } It turns out that exec.c:qemu_ram_alloc_from_fd() already has an equivalent size check but it skips devdax pmem character devices because lseek(2) returns 0: if (file_size > 0 && file_size < size) { error_setg(errp, "backing store %s size 0x%" PRIx64 " does not match 'size' option 0x" RAM_ADDR_FMT, mem_path, file_size, size); return NULL; } This patch moves the devdax pmem file size code into get_file_size() so that we check the memory size in a single place: qemu_ram_alloc_from_fd(). This simplifies the code and makes it more general. This also fixes the problem that hostmem-file only checks the devdax pmem file size when the pmem=on parameter is given. An unchecked size=NUM parameter can lead to SIGBUS in QEMU so we must always fetch the file size for Linux devdax pmem character device nodes. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20190830093056.12572-1-stefanha@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-09-16 12:32:21 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	a8d2532645	Include qemu-common.h exactly where needed No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]	2019-06-12 13:20:20 +02:00
Zhang Yi	2ac0f1621c	util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap besides the existing 'shared' flags, we are going to add 'is_pmem' to qemu_ram_mmap(), which indicated the memory backend file is a persist memory. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Message-Id: <786c46862cfeb253ee0ea2f44d62ffe76edb7fa4.1549555521.git.yi.z.zhang@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-04-25 14:17:36 -03:00
Peter Maydell	2cb73afa6a	Machine queue, 2019-03-11 * memfd fixes (Ilya Maximets) * Move nvdimms state into struct MachineState (Eric Auger) * hostmem-file: reject invalid pmem file sizes (Stefan Hajnoczi) -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJchwQFAAoJECgHk2+YTcWmhkMP/iyHjvM7eTXcbs+5xidkQpX8 mc9ElHmX/W2ZK1TUeopz2hUuOG12qkt3G4bOKEKgD07h/O5J7HPXSRvT1TU7UbA/ ZkNQiF/TpuyB8JtxIgbYtgh4ZDFIGFy5o/phjCEuejyHMxZXVL8PNKCm9ZUPKgfG XYH1Q7Y+uHH7qQDhLRPdfs5/v8hOKdmHK/SuUn/dq2CqA4GoNjnC9IfxnuvIpDU6 F2Hj2YhPC35zFgR3bIh2Fqz4qv37u50a1L4VPKaCQpPY5YNGj6jPaOVPQbMrviFI 1/yaNr5RGdNrS7aQLcDKKVeclSuFHC7x3uo27JF1RbP8p4tAQi0M89E/RLyBV5lY Y7a9fInmJbxJQifgct6dv8yzTiNoniX5yph81RMXk0CzV74sP+yeKkwkIK2dWAsn 2zsM6qCHFvIv3F7iIy+ONl6TJ/RALvyP4F3Vhd3lT2Y+nwnQOvUdrX6eL4yeYGfZ 4OPCEHIn+xhb3ApYbG+4OrDBYZrPVpr6yYcqc8Ob9paeR08DgaghDX3E23bASwSl e9Cz19nvnIse/zHIAYoWhPFMfSTkWgREzCs+VA07bqPCb1/PNHBQmxv2mvdpB8Rw r/FjZyptCNyXRSfU28HEImAA7dsB9VtZAVK9oVRXaIOk2G6W5bFfAmQmAPETBRaA K9ZExT9oQhQdjKIaya0l =6nAH -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging Machine queue, 2019-03-11 * memfd fixes (Ilya Maximets) * Move nvdimms state into struct MachineState (Eric Auger) * hostmem-file: reject invalid pmem file sizes (Stefan Hajnoczi) # gpg: Signature made Tue 12 Mar 2019 00:57:41 GMT # gpg: using RSA key 2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full] # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/machine-next-pull-request: memfd: improve error messages memfd: set up correct errno if not supported memfd: always check for MFD_CLOEXEC hostmem-memfd: disable for systems without sealing support machine: Move nvdimms state into struct MachineState nvdimm: Rename AcpiNVDIMMState into NVDIMMState hostmem-file: reject invalid pmem file sizes Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-03-12 15:25:46 +00:00
Philippe Mathieu-Daudé	02cdcc96be	oslib-posix: Ignore fcntl("/dev/null", F_SETFL, O_NONBLOCK) failure Previous to OpenBSD 6.3 [1], fcntl(F_SETFL) is not permitted on memory devices. Trying this call sets errno to ENODEV ("not a memory device"): 19 ENODEV Operation not supported by device. An attempt was made to apply an inappropriate function to a device, for example, trying to read a write-only device such as a printer. Do not assert fcntl failures in this specific case (errno set to ENODEV) on OpenBSD. This fixes: $ lm32-softmmu/qemu-system-lm32 assertion "f != -1" failed: file "util/oslib-posix.c", line 247, function "qemu_set_nonblock" Abort trap (core dumped) [1] The fix seems https://github.com/openbsd/src/commit/c2a35b387f9d3c "fcntl(F_SETFL) invokes the FIONBIO and FIOASYNC ioctls internally, so the memory devices (/dev/null, /dev/zero, etc) need to permit them." Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190307142822.8531-2-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-03-11 16:33:49 +01:00
Stefan Hajnoczi	314aec4a6e	hostmem-file: reject invalid pmem file sizes Guests started with NVDIMMs larger than the underlying host file produce confusing errors inside the guest. This happens because the guest accesses pages beyond the end of the file. Check the pmem file size on startup and print a clear error message if the size is invalid. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1669053 Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Zhang Yi <yi.z.zhang@linux.intel.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20190214031004.32522-3-stefanha@redhat.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-03-11 10:44:19 -03:00
Murilo Opsfelder Araujo	53adb9d43e	mmap-alloc: fix hugetlbfs misaligned length in ppc64 The commit `7197fb4058` ("util/mmap-alloc: fix hugetlb support on ppc64") fixed Huge TLB mappings on ppc64. However, we still need to consider the underlying huge page size during munmap() because it requires that both address and length be a multiple of the underlying huge page size for Huge TLB mappings. Quote from "Huge page (Huge TLB) mappings" paragraph under NOTES section of the munmap(2) manual: "For munmap(), addr and length must both be a multiple of the underlying huge page size." On ppc64, the munmap() in qemu_ram_munmap() does not work for Huge TLB mappings because the mapped segment can be aligned with the underlying huge page size, not aligned with the native system page size, as returned by getpagesize(). This has the side effect of not releasing huge pages back to the pool after a hugetlbfs file-backed memory device is hot-unplugged. This patch fixes the situation in qemu_ram_mmap() and qemu_ram_munmap() by considering the underlying page size on ppc64. After this patch, memory hot-unplug releases huge pages back to the pool. Fixes: `7197fb4058` Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-04 18:44:20 +11:00
Li Qiang	da93b82079	util: check the return value of fcntl in qemu_set_{block, nonblock} Assert that the return value is not an error. This is like commit `7e6478e7d4` for qemu_set_cloexec. Signed-off-by: Li Qiang <liq3ea@163.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-01-14 19:31:04 -05:00
Brad Smith	fc3d1bad1e	oslib-posix: Use MAP_STACK in qemu_alloc_stack() on OpenBSD Use MAP_STACK in qemu_alloc_stack() on OpenBSD. Added to our 6.4 release. MAP_STACK Indicate that the mapping is used as a stack. This flag must be used in combination with MAP_ANON and MAP_PRIVATE. Implement MAP_STACK option for mmap(). Synchronous faults (pagefault and syscall) confirm the stack register points at MAP_STACK memory, otherwise SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified to create a MAP_STACK sub-region which satisfies alignment requirements. Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the contents of the region -- there is no mprotect() equivalent operation, so there is no MAP_STACK-adding gadget. Signed-off-by: Brad Smith <brad@comstyle.com> Reviewed-by: Kamil Rytarowski <n54@gmx.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20181019125239.GA13884@humpty.home.comstyle.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-11-06 10:52:23 +00:00
Marc-André Lureau	35f7f3fb5c	util: use fcntl() for qemu_write_pidfile() locking Daniel BerrangÃ© suggested to use fcntl() locks rather than lockf(). 'man lockf': On Linux, lockf() is just an interface on top of fcntl(2) locking. Many other systems implement lockf() in this way, but note that POSIX.1 leaves the relationship between lockf() and fcntl(2) locks unspecified. A portable application should probably avoid mixing calls to these interfaces. IOW, if its just a shim around fcntl() on many systems, it is clearer if we just use fcntl() directly, as we then know how fcntl() locks will behave if they're on a network filesystem like NFS. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180831145314.14736-3-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-02 18:47:55 +02:00
Marc-André Lureau	9e6bdef224	util: add qemu_write_pidfile() There are variants of qemu_create_pidfile() in qemu-pr-helper and qemu-ga. Let's have a common implementation in libqemuutil. The code is initially based from pr-helper write_pidfile(), with various improvements and suggestions from Daniel BerrangÃ©: QEMU will leave the pidfile existing on disk when it exits which initially made me think it avoids the deletion race. The app managing QEMU, however, may well delete the pidfile after it has seen QEMU exit, and even if the app locks the pidfile before deleting it, there is still a race. eg consider the following sequence QEMU 1 libvirtd QEMU 2 1. lock(pidfile) 2. exit() 3. open(pidfile) 4. lock(pidfile) 5. open(pidfile) 6. unlink(pidfile) 7. close(pidfile) 8. lock(pidfile) IOW, at step 8 the new QEMU has successfully acquired the lock, but the pidfile no longer exists on disk because it was deleted after the original QEMU exited. While we could just say no external app should ever delete the pidfile, I don't think that is satisfactory as people don't read docs, and admins don't like stale pidfiles being left around on disk. To make this robust, I think we might want to copy libvirt's approach to pidfile acquisition which runs in a loop and checks that the file on disk /after/ acquiring the lock matches the file that was locked. Then we could in fact safely let QEMU delete its own pidfiles on clean exit.. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180831145314.14736-2-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-02 18:47:55 +02:00
Marcel Apfelbaum	06329ccecf	mem: add share parameter to memory-backend-ram Currently only file backed memory backend can be created with a "share" flag in order to allow sharing guest RAM with other processes in the host. Add the "share" flag also to RAM Memory Backend in order to allow remapping parts of the guest RAM to different host virtual addresses. This is needed by the RDMA devices in order to remap non-contiguous QEMU virtual addresses to a contiguous virtual address range. Moved the "share" flag to the Host Memory base class, modified phys_mem_alloc to include the new parameter and a new interface memory_region_init_ram_shared_nomigrate. There are no functional changes if the new flag is not used. Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>	2018-02-19 13:03:24 +02:00
Andreas Gustafsson	9bc5a7193f	oslib-posix: check for posix_memalign in configure script Check for the presence of posix_memalign() in the configure script, not using "defined(_POSIX_C_SOURCE) && !defined(__sun__)". This lets qemu use posix_memalign() on NetBSD versions that have it, instead of falling back to valloc() which is wasteful when the required alignment is smaller than a page. Signed-off-by: Andreas Gustafsson <gson@gson.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Kamil Rytarowski <n54@gmx.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>	2018-02-10 10:21:50 +03:00
Kamil Rytarowski	094611b426	oslib-posix: Use sysctl(2) call to resolve exec_dir on NetBSD NetBSD 8.0(beta) ships with KERN_PROC_PATHNAME in sysctl(2). Older NetBSD versions can use argv[0] parsing fallback. This code section is partly shared with FreeBSD. Signed-off-by: Kamil Rytarowski <n54@gmx.com> Message-id: 20171028194833.23858-1-n54@gmx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-11-02 16:19:34 +00:00
Stefan Weil	e947d47da0	oslib-posix: Fix compiler warning and some data types gcc warning: /qemu/util/oslib-posix.c:304:11: error: variable ‘addr’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered] Fix also some related data types: numpages, hpagesize are used as pointer offset. Always use size_t for them and also for the derived numpages_per_thread and size_per_thread. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Stefan Weil <sw@weilnetz.de> Message-id: 20171016202912.1117-1-sw@weilnetz.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-10-20 11:16:27 +02:00
Eduardo Habkost	e916a6e88a	oslib-posix: Print errors before aborting on qemu_alloc_stack() If QEMU is running on a system that's out of memory and mmap() fails, QEMU aborts with no error message at all, making it hard to debug the reason for the failure. Add perror() calls that will print error information before aborting. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20170829212053.6003-1-ehabkost@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-30 09:33:49 +01:00
Peter Maydell	02ffa034fb	util/oslib-posix.c: Avoid warning on NetBSD On NetBSD the compiler warns: util/oslib-posix.c: In function 'sigaction_invoke': util/oslib-posix.c:589:5: warning: missing braces around initializer [-Wmissing-braces] siginfo_t si = { 0 }; ^ util/oslib-posix.c:589:5: warning: (near initialization for 'si.si_pad') [-Wmissing-braces] because on this platform siginfo_t is defined as typedef union siginfo { char si_pad[128]; /* Total size; for future expansion */ struct _ksiginfo _info; } siginfo_t; Avoid this warning by initializing the struct with {} instead; this is a GCC extension but we use it all over the codebase already. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1500568341-8389-1-git-send-email-peter.maydell@linaro.org	2017-07-21 10:32:19 +01:00
Daniel P. Berrange	788cf9f8c8	block: rip out all traces of password prompting Now that qcow & qcow2 are wired up to get encryption keys via the QCryptoSecret object, nothing is relying on the interactive prompting for passwords. All the code related to password prompting can thus be ripped out. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-17-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00

1 2 3

105 Commits