Commit Graph

84 Commits

Author SHA1 Message Date
Laurent Vivier 894022e616 net: check if the file descriptor is valid before using it
qemu_set_nonblock() checks that the file descriptor can be used and, if
not, crashes QEMU. An assert() is used for that. The use of assert() is
used to detect programming error and the coredump will allow to debug
the problem.

But in the case of the tap device, this assert() can be triggered by
a misconfiguration by the user. At startup, it's not a real problem, but it
can also happen during the hot-plug of a new device, and here it's a
problem because we can crash a perfectly healthy system.

For instance:
 # ip link add link virbr0 name macvtap0 type macvtap mode bridge
 # ip link set macvtap0 up
 # TAP=/dev/tap$(ip -o link show macvtap0 | cut -d: -f1)
 # qemu-system-x86_64 -machine q35 -device pcie-root-port,id=pcie-root-port-0 -monitor stdio 9<> $TAP
 (qemu) netdev_add type=tap,id=hostnet0,vhost=on,fd=9
 (qemu) device_add driver=virtio-net-pci,netdev=hostnet0,id=net0,bus=pcie-root-port-0
 (qemu) device_del net0
 (qemu) netdev_del hostnet0
 (qemu) netdev_add type=tap,id=hostnet1,vhost=on,fd=9
 qemu-system-x86_64: .../util/oslib-posix.c:247: qemu_set_nonblock: Assertion `f != -1' failed.
 Aborted (core dumped)

To avoid that, add a function, qemu_try_set_nonblock(), that allows to report the
problem without crashing.

In the same way, we also update the function for vhostfd in net_init_tap_one() and
for fd in net_init_socket() (both descriptors are provided by the user and can
be wrong).

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2020-07-15 21:00:13 +08:00
Michal Privoznik e47f4765af util: Introduce qemu_get_host_name()
This function offers operating system agnostic way to fetch host
name. It is implemented for both POSIX-like and Windows systems.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2020-07-13 17:44:58 -05:00
David CARLIER 2b9b9e7010 util/oslib-posix.c: Implement qemu_init_exec_dir() for Haiku
The qemu_init_exec_dir() function is inherently non-portable;
provide an implementation for Haiku hosts.

Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200703145614.16684-9-peter.maydell@linaro.org
[PMM: Expanded commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-07-13 14:36:10 +01:00
David CARLIER 2a4b472c3c osdep.h: Always include <sys/signal.h> if it exists
Regularize our handling of <sys/signal.h>: currently we include it in
osdep.h, but only for OpenBSD, and we include it without an ifdef
guard in a couple of C files.  This causes problems for Haiku, which
doesn't have that header.

Instead, check in configure whether sys/signal.h exists, and if it
does then always include it from osdep.h.

Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200703145614.16684-5-peter.maydell@linaro.org
[PMM: Expanded commit message; rename to HAVE_SYS_SIGNAL_H]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-07-13 14:36:09 +01:00
David CARLIER 2032e243a5 util/oslib-posix : qemu_init_exec_dir implementation for Mac
From 3025a0ce3fdf7d3559fc35a52c659f635f5c750c Mon Sep 17 00:00:00 2001
From: David Carlier <devnexen@gmail.com>
Date: Tue, 26 May 2020 21:35:27 +0100
Subject: [PATCH] util/oslib-posix : qemu_init_exec_dir implementation for Mac

Using dyld API to get the full path of the current process.

Signed-off-by: David Carlier <devnexen@gmail.com>
Message-id: CA+XhMqxwC10XHVs4Z-JfE0-WLAU3ztDuU9QKVi31mjr59HWCxg@mail.gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-06-23 11:39:46 +01:00
David Carlier 9548a89173 util/oslib: Returns the real thread identifier on FreeBSD and NetBSD
getpid is good enough in a mono thread context, however thr_self/_lwp_self
reflects the real current thread identifier from a given process.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: David Carlier <devnexen@gmail.com>
2020-06-10 12:10:48 -04:00
Bauerchen 278fb16273 oslib-posix: take lock before qemu_cond_broadcast
In touch_all_pages, if the mutex is not taken around qemu_cond_broadcast,
qemu_cond_broadcast may be called before all touch page threads enter
qemu_cond_wait. In this case, the touch page threads wait forever for the
main thread to wake them up, causing a deadlock.

Signed-off-by: Bauerchen <bauerchen@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-11 08:49:20 -04:00
Paolo Bonzini 78b3f67acd oslib-posix: initialize mutex and condition variable
The mutex and condition variable were never initialized, causing
-mem-prealloc to abort with an assertion failure.

Fixes: 037fb5eb39
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Cc: bauerchen <bauerchen@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 23:02:22 +01:00
bauerchen 037fb5eb39 mem-prealloc: optimize large guest startup
[desc]:
    Large memory VM starts slowly when using -mem-prealloc, and
    there are some areas to optimize in current method;

    1、mmap will be used to alloc threads stack during create page
    clearing threads, and it will attempt mm->mmap_sem for write
    lock, but clearing threads have hold read lock, this competition
    will cause threads createion very slow;

    2、methods of calcuating pages for per threads is not well;if we use
    64 threads to split 160 hugepage,63 threads clear 2page,1 thread
    clear 34 page,so the entire speed is very slow;

    to solve the first problem,we add a mutex in thread function,and
    start all threads when all threads finished createion;
    and the second problem, we spread remainder to other threads,in
    situation that 160 hugepage and 64 threads, there are 32 threads
    clear 3 pages,and 32 threads clear 2 pages.

[test]:
    320G 84c VM start time can be reduced to 10s
    680G 84c VM start time can be reduced to 18s

Signed-off-by: bauerchen <bauerchen@tencent.com>
Reviewed-by: Pan Rui <ruippan@tencent.com>
Reviewed-by: Ivan Ren <ivanren@tencent.com>
[Simplify computation of the number of pages per thread. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-25 09:18:01 +01:00
Wei Yang 038adc2f58 core: replace getpagesize() with qemu_real_host_page_size
There are three page size in qemu:

  real host page size
  host page size
  target page size

All of them have dedicate variable to represent. For the last two, we
use the same form in the whole qemu project, while for the first one we
use two forms: qemu_real_host_page_size and getpagesize().

qemu_real_host_page_size is defined to be a replacement of
getpagesize(), so let it serve the role.

[Note] Not fully tested for some arch or device.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-26 15:38:06 +02:00
Stefan Hajnoczi 72d41eb4b8 memory: fetch pmem size in get_file_size()
Neither stat(2) nor lseek(2) report the size of Linux devdax pmem
character device nodes.  Commit 314aec4a6e
("hostmem-file: reject invalid pmem file sizes") added code to
hostmem-file.c to fetch the size from sysfs and compare against the
user-provided size=NUM parameter:

  if (backend->size > size) {
      error_setg(errp, "size property %" PRIu64 " is larger than "
                 "pmem file \"%s\" size %" PRIu64, backend->size,
                 fb->mem_path, size);
      return;
  }

It turns out that exec.c:qemu_ram_alloc_from_fd() already has an
equivalent size check but it skips devdax pmem character devices because
lseek(2) returns 0:

  if (file_size > 0 && file_size < size) {
      error_setg(errp, "backing store %s size 0x%" PRIx64
                 " does not match 'size' option 0x" RAM_ADDR_FMT,
                 mem_path, file_size, size);
      return NULL;
  }

This patch moves the devdax pmem file size code into get_file_size() so
that we check the memory size in a single place:
qemu_ram_alloc_from_fd().  This simplifies the code and makes it more
general.

This also fixes the problem that hostmem-file only checks the devdax
pmem file size when the pmem=on parameter is given.  An unchecked
size=NUM parameter can lead to SIGBUS in QEMU so we must always fetch
the file size for Linux devdax pmem character device nodes.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20190830093056.12572-1-stefanha@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16 12:32:21 +02:00
Markus Armbruster db72581598 Include qemu/main-loop.h less
In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).  It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.

Include qemu/main-loop.h only where it's needed.  Touching it now
recompiles only some 1700 objects.  For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800.  For the
others, they shrink only slightly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster a8d2532645 Include qemu-common.h exactly where needed
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
2019-06-12 13:20:20 +02:00
Zhang Yi 2ac0f1621c util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap
besides the existing 'shared' flags, we are going to add
'is_pmem' to qemu_ram_mmap(), which indicated the memory backend
file is a persist memory.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Message-Id: <786c46862cfeb253ee0ea2f44d62ffe76edb7fa4.1549555521.git.yi.z.zhang@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-04-25 14:17:36 -03:00
Peter Maydell 2cb73afa6a Machine queue, 2019-03-11
* memfd fixes (Ilya Maximets)
 * Move nvdimms state into struct MachineState (Eric Auger)
 * hostmem-file: reject invalid pmem file sizes (Stefan Hajnoczi)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJchwQFAAoJECgHk2+YTcWmhkMP/iyHjvM7eTXcbs+5xidkQpX8
 mc9ElHmX/W2ZK1TUeopz2hUuOG12qkt3G4bOKEKgD07h/O5J7HPXSRvT1TU7UbA/
 ZkNQiF/TpuyB8JtxIgbYtgh4ZDFIGFy5o/phjCEuejyHMxZXVL8PNKCm9ZUPKgfG
 XYH1Q7Y+uHH7qQDhLRPdfs5/v8hOKdmHK/SuUn/dq2CqA4GoNjnC9IfxnuvIpDU6
 F2Hj2YhPC35zFgR3bIh2Fqz4qv37u50a1L4VPKaCQpPY5YNGj6jPaOVPQbMrviFI
 1/yaNr5RGdNrS7aQLcDKKVeclSuFHC7x3uo27JF1RbP8p4tAQi0M89E/RLyBV5lY
 Y7a9fInmJbxJQifgct6dv8yzTiNoniX5yph81RMXk0CzV74sP+yeKkwkIK2dWAsn
 2zsM6qCHFvIv3F7iIy+ONl6TJ/RALvyP4F3Vhd3lT2Y+nwnQOvUdrX6eL4yeYGfZ
 4OPCEHIn+xhb3ApYbG+4OrDBYZrPVpr6yYcqc8Ob9paeR08DgaghDX3E23bASwSl
 e9Cz19nvnIse/zHIAYoWhPFMfSTkWgREzCs+VA07bqPCb1/PNHBQmxv2mvdpB8Rw
 r/FjZyptCNyXRSfU28HEImAA7dsB9VtZAVK9oVRXaIOk2G6W5bFfAmQmAPETBRaA
 K9ZExT9oQhQdjKIaya0l
 =6nAH
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging

Machine queue, 2019-03-11

* memfd fixes (Ilya Maximets)
* Move nvdimms state into struct MachineState (Eric Auger)
* hostmem-file: reject invalid pmem file sizes (Stefan Hajnoczi)

# gpg: Signature made Tue 12 Mar 2019 00:57:41 GMT
# gpg:                using RSA key 2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full]
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/machine-next-pull-request:
  memfd: improve error messages
  memfd: set up correct errno if not supported
  memfd: always check for MFD_CLOEXEC
  hostmem-memfd: disable for systems without sealing support
  machine: Move nvdimms state into struct MachineState
  nvdimm: Rename AcpiNVDIMMState into NVDIMMState
  hostmem-file: reject invalid pmem file sizes

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-12 15:25:46 +00:00
Philippe Mathieu-Daudé 02cdcc96be oslib-posix: Ignore fcntl("/dev/null", F_SETFL, O_NONBLOCK) failure
Previous to OpenBSD 6.3 [1], fcntl(F_SETFL) is not permitted on
memory devices.
Trying this call sets errno to ENODEV ("not a memory device"):

  19 ENODEV Operation not supported by device.
    An attempt was made to apply an inappropriate function to a device,
    for example, trying to read a write-only device such as a printer.

Do not assert fcntl failures in this specific case (errno set to ENODEV)
on OpenBSD. This fixes:

  $ lm32-softmmu/qemu-system-lm32
  assertion "f != -1" failed: file "util/oslib-posix.c", line 247, function "qemu_set_nonblock"
  Abort trap (core dumped)

[1] The fix seems https://github.com/openbsd/src/commit/c2a35b387f9d3c
  "fcntl(F_SETFL) invokes the FIONBIO and FIOASYNC ioctls internally, so
  the memory devices (/dev/null, /dev/zero, etc) need to permit them."

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190307142822.8531-2-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-11 16:33:49 +01:00
Stefan Hajnoczi 314aec4a6e hostmem-file: reject invalid pmem file sizes
Guests started with NVDIMMs larger than the underlying host file produce
confusing errors inside the guest.  This happens because the guest
accesses pages beyond the end of the file.

Check the pmem file size on startup and print a clear error message if
the size is invalid.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1669053
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Zhang Yi <yi.z.zhang@linux.intel.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20190214031004.32522-3-stefanha@redhat.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-03-11 10:44:19 -03:00
Murilo Opsfelder Araujo 53adb9d43e mmap-alloc: fix hugetlbfs misaligned length in ppc64
The commit 7197fb4058 ("util/mmap-alloc:
fix hugetlb support on ppc64") fixed Huge TLB mappings on ppc64.

However, we still need to consider the underlying huge page size
during munmap() because it requires that both address and length be a
multiple of the underlying huge page size for Huge TLB mappings.
Quote from "Huge page (Huge TLB) mappings" paragraph under NOTES
section of the munmap(2) manual:

  "For munmap(), addr and length must both be a multiple of the
  underlying huge page size."

On ppc64, the munmap() in qemu_ram_munmap() does not work for Huge TLB
mappings because the mapped segment can be aligned with the underlying
huge page size, not aligned with the native system page size, as
returned by getpagesize().

This has the side effect of not releasing huge pages back to the pool
after a hugetlbfs file-backed memory device is hot-unplugged.

This patch fixes the situation in qemu_ram_mmap() and
qemu_ram_munmap() by considering the underlying page size on ppc64.

After this patch, memory hot-unplug releases huge pages back to the
pool.

Fixes: 7197fb4058
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:20 +11:00
Li Qiang da93b82079 util: check the return value of fcntl in qemu_set_{block, nonblock}
Assert that the return value is not an error. This is like commit
7e6478e7d4 for qemu_set_cloexec.

Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
Brad Smith fc3d1bad1e oslib-posix: Use MAP_STACK in qemu_alloc_stack() on OpenBSD
Use MAP_STACK in qemu_alloc_stack() on OpenBSD.

Added to our 6.4 release.

MAP_STACK      Indicate that the mapping is used as a stack.  This
               flag must be used in combination with MAP_ANON and
               MAP_PRIVATE.

Implement MAP_STACK option for mmap().  Synchronous faults (pagefault and
syscall) confirm the stack register points at MAP_STACK memory, otherwise
SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified
to create a MAP_STACK sub-region which satisfies alignment requirements.
Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the
contents of the region -- there is no mprotect() equivalent operation, so
there is no MAP_STACK-adding gadget.

Signed-off-by: Brad Smith <brad@comstyle.com>
Reviewed-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20181019125239.GA13884@humpty.home.comstyle.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-11-06 10:52:23 +00:00
Marc-André Lureau 35f7f3fb5c util: use fcntl() for qemu_write_pidfile() locking
Daniel Berrangé suggested to use fcntl() locks rather than lockf().

'man lockf':

   On Linux, lockf() is just an interface on top of fcntl(2) locking.
   Many other systems implement lockf() in this way, but note that
   POSIX.1 leaves the relationship between lockf() and fcntl(2) locks
   unspecified.  A portable application should probably avoid mixing
   calls to these interfaces.

IOW, if its just a shim around fcntl() on many systems, it is clearer
if we just use fcntl() directly, as we then know how fcntl() locks will
behave if they're on a network filesystem like NFS.

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180831145314.14736-3-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02 18:47:55 +02:00
Marc-André Lureau 9e6bdef224 util: add qemu_write_pidfile()
There are variants of qemu_create_pidfile() in qemu-pr-helper and
qemu-ga. Let's have a common implementation in libqemuutil.

The code is initially based from pr-helper write_pidfile(), with
various improvements and suggestions from Daniel Berrangé:

  QEMU will leave the pidfile existing on disk when it exits which
  initially made me think it avoids the deletion race. The app
  managing QEMU, however, may well delete the pidfile after it has
  seen QEMU exit, and even if the app locks the pidfile before
  deleting it, there is still a race.

  eg consider the following sequence

        QEMU 1        libvirtd        QEMU 2

  1.    lock(pidfile)

  2.    exit()

  3.                 open(pidfile)

  4.                 lock(pidfile)

  5.                                  open(pidfile)

  6.                 unlink(pidfile)

  7.                 close(pidfile)

  8.                                  lock(pidfile)

  IOW, at step 8 the new QEMU has successfully acquired the lock, but
  the pidfile no longer exists on disk because it was deleted after
  the original QEMU exited.

  While we could just say no external app should ever delete the
  pidfile, I don't think that is satisfactory as people don't read
  docs, and admins don't like stale pidfiles being left around on
  disk.

  To make this robust, I think we might want to copy libvirt's
  approach to pidfile acquisition which runs in a loop and checks that
  the file on disk /after/ acquiring the lock matches the file that
  was locked. Then we could in fact safely let QEMU delete its own
  pidfiles on clean exit..

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180831145314.14736-2-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02 18:47:55 +02:00
Marcel Apfelbaum 06329ccecf mem: add share parameter to memory-backend-ram
Currently only file backed memory backend can
be created with a "share" flag in order to allow
sharing guest RAM with other processes in the host.

Add the "share" flag also to RAM Memory Backend
in order to allow remapping parts of the guest RAM
to different host virtual addresses. This is needed
by the RDMA devices in order to remap non-contiguous
QEMU virtual addresses to a contiguous virtual address range.

Moved the "share" flag to the Host Memory base class,
modified phys_mem_alloc to include the new parameter
and a new interface memory_region_init_ram_shared_nomigrate.

There are no functional changes if the new flag is not used.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
2018-02-19 13:03:24 +02:00
Andreas Gustafsson 9bc5a7193f oslib-posix: check for posix_memalign in configure script
Check for the presence of posix_memalign() in the configure script,
not using "defined(_POSIX_C_SOURCE) && !defined(__sun__)".  This
lets qemu use posix_memalign() on NetBSD versions that have it,
instead of falling back to valloc() which is wasteful when the
required alignment is smaller than a page.

Signed-off-by: Andreas Gustafsson <gson@gson.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2018-02-10 10:21:50 +03:00
Kamil Rytarowski 094611b426 oslib-posix: Use sysctl(2) call to resolve exec_dir on NetBSD
NetBSD 8.0(beta) ships with KERN_PROC_PATHNAME in sysctl(2).
Older NetBSD versions can use argv[0] parsing fallback.

This code section is partly shared with FreeBSD.

Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Message-id: 20171028194833.23858-1-n54@gmx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-11-02 16:19:34 +00:00
Stefan Weil e947d47da0 oslib-posix: Fix compiler warning and some data types
gcc warning:

/qemu/util/oslib-posix.c:304:11: error:
 variable ‘addr’ might be clobbered by ‘longjmp’ or ‘vfork’
 [-Werror=clobbered]

Fix also some related data types:

numpages, hpagesize are used as pointer offset.
Always use size_t for them and also for the derived
numpages_per_thread and size_per_thread.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 20171016202912.1117-1-sw@weilnetz.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-20 11:16:27 +02:00
Eduardo Habkost e916a6e88a oslib-posix: Print errors before aborting on qemu_alloc_stack()
If QEMU is running on a system that's out of memory and mmap()
fails, QEMU aborts with no error message at all, making it hard
to debug the reason for the failure.

Add perror() calls that will print error information before
aborting.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170829212053.6003-1-ehabkost@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-08-30 09:33:49 +01:00
Peter Maydell 02ffa034fb util/oslib-posix.c: Avoid warning on NetBSD
On NetBSD the compiler warns:
util/oslib-posix.c: In function 'sigaction_invoke':
util/oslib-posix.c:589:5: warning: missing braces around initializer [-Wmissing-braces]
     siginfo_t si = { 0 };
     ^
util/oslib-posix.c:589:5: warning: (near initialization for 'si.si_pad') [-Wmissing-braces]

because on this platform siginfo_t is defined as
  typedef union siginfo {
          char    si_pad[128];    /* Total size; for future expansion */
          struct _ksiginfo _info;
  } siginfo_t;

Avoid this warning by initializing the struct with {} instead;
this is a GCC extension but we use it all over the codebase already.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1500568341-8389-1-git-send-email-peter.maydell@linaro.org
2017-07-21 10:32:19 +01:00
Daniel P. Berrange 788cf9f8c8 block: rip out all traces of password prompting
Now that qcow & qcow2 are wired up to get encryption keys
via the QCryptoSecret object, nothing is relying on the
interactive prompting for passwords. All the code related
to password prompting can thus be ripped out.

Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170623162419.26068-17-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11 17:44:56 +02:00
Peter Maydell 2a8469aaab -----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZOEC7AAoJEJykq7OBq3PIs4wIAMWtKbkZQkgqpHnSt4ZzpjkQ
 1lRLmM6HOqMO1SarK0PduSRafwgUD2q/24gk6UeqUbArZXK+7U3SnfsfEHkOqCdB
 +ELRPZ+/b77AEZzhCSZ7uHHvrISO5MtLW/z4Av8GB5KYARbO5aZgIyeg7Na29SDk
 vQoANktrtLgLHu0vZfSUTTmPMRJqcC/DMm/EukXapXVW+cEt23V+nohchFmw8VtF
 Uni17u26B7CJGZOGduD11CIIvQ9QX+acyDlknkCIqfwd3Xxle0Mu0S3IV4KH7zjn
 MIcF3hGQWeln5AZzQe998EC0Ko+pmPRooZaeUbrolp+KjK9OcRnNKr7O2jjvGkA=
 =9+yN
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Wed 07 Jun 2017 19:06:51 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  configure: split c and cxx extra flags
  coroutine-lock: do not touch coroutine after another one has been entered
  .gdbinit: load QEMU sub-commands when gdb starts
  coccinelle: fix typo in comment
  oslib: strip trailing '\n' from error_setg() string argument

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-12 14:14:42 +01:00
Philippe Mathieu-Daudé 462e5d5065 oslib: strip trailing '\n' from error_setg() string argument
spotted by Coccinelle script scripts/coccinelle/err-bad-newline.cocci

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-07 14:38:44 +01:00
Stefano Stabellini 7e6478e7d4 Check the return value of fcntl in qemu_set_cloexec
Assert that the return value is not an error. This issue was found by
Coverity.

CID: 1374831

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
CC: groug@kaod.org
CC: pbonzini@redhat.com
CC: Eric Blake <eblake@redhat.com>
Message-Id: <1494356693-13190-2-git-send-email-sstabellini@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-06 20:18:35 +02:00
Greg Kurz fcdcf1eed2 util: drop old utimensat() compat code
Now that 9pfs and virtfs-proxy-helper have been converted to utimensat(),
we don't need to keep qemu_utimens() anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-25 10:30:14 +02:00
Jitendra Kolhe dfd0dcc717 mem-prealloc: fix sysconf(_SC_NPROCESSORS_ONLN) failure case.
This was spotted by Coverity, in case where sysconf(_SC_NPROCESSORS_ONLN)
fails and returns -1. This results in memset_num_threads getting set to -1.
Which we then pass to g_new0().
The patch replaces MAX_MEM_PREALLOC_THREAD_COUNT macro with a function call
get_memset_num_threads() to handle sysconf() failure gracefully. In case
sysconf() fails, we fall back to single threaded.

(Spotted by Coverity, CID 1372465.)

Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
Message-Id: <1490079006-32495-1-git-send-email-jitendra.kolhe@hpe.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-24 11:50:11 +01:00
Paolo Bonzini 6b3cca76dd oslib-posix: fix compilation on OpenBSD
si_band is not found in OpenBSD.   It is marked as obsolescent in
POSIX, so we can delete it without any remorse.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170317152214.6148-1-pbonzini@redhat.com
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-17 18:27:49 +00:00
Daniel P. Berrange 9dc44aa582 os: don't corrupt pre-existing memory-backend data with prealloc
When using a memory-backend object with prealloc turned on, QEMU
will memset() the first byte in every memory page to zero. While
this might have been acceptable for memory backends associated
with RAM, this corrupts application data for NVDIMMs.

Instead of setting every page to zero, read the current byte
value and then just write that same value back, so we are not
corrupting the original data. Directly write the value instead
of memset()ing it, since there's no benefit to memset for a
single byte write.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Message-id: 20170303113255.28262-1-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-03-15 11:55:41 +08:00
Jitendra Kolhe 1e356fc14b mem-prealloc: reduce large guest start-up and migration time.
Using "-mem-prealloc" option for a large guest leads to higher guest
start-up and migration time. This is because with "-mem-prealloc" option
qemu tries to map every guest page (create address translations), and
make sure the pages are available during runtime. virsh/libvirt by
default, seems to use "-mem-prealloc" option in case the guest is
configured to use huge pages. The patch tries to map all guest pages
simultaneously by spawning multiple threads. Currently limiting the
change to QEMU library functions on POSIX compliant host only, as we are
not sure if the problem exists on win32. Below are some stats with
"-mem-prealloc" option for guest configured to use huge pages.

------------------------------------------------------------------------
Idle Guest      | Start-up time | Migration time
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - single threaded (existing code)
------------------------------------------------------------------------
64 Core - 4TB   | 54m11.796s    | 75m43.843s
64 Core - 1TB   | 8m56.576s     | 14m29.049s
64 Core - 256GB | 2m11.245s     | 3m26.598s
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 8 threads
------------------------------------------------------------------------
64 Core - 4TB   | 5m1.027s      | 34m10.565s
64 Core - 1TB   | 1m10.366s     | 8m28.188s
64 Core - 256GB | 0m19.040s     | 2m10.148s
-----------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 16 threads
-----------------------------------------------------------------------
64 Core - 4TB   | 1m58.970s     | 31m43.400s
64 Core - 1TB   | 0m39.885s     | 7m55.289s
64 Core - 256GB | 0m11.960s     | 2m0.135s
-----------------------------------------------------------------------

Changed in v2:
 - modify number of memset threads spawned to min(smp_cpus, 16).
 - removed 64GB memory restriction for spawning memset threads.

Changed in v3:
 - limit number of threads spawned based on
   min(sysconf(_SC_NPROCESSORS_ONLN), 16, smp_cpus)
 - implement memset thread specific siglongjmp in SIGBUS signal_handler.

Changed in v4
 - remove sigsetjmp/siglongjmp and SIGBUS unblock/block for main thread
   as main thread no longer touches any pages.
 - simplify code my returning memset_thread_failed status from
   touch_all_pages.

Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
Message-Id: <1487907103-32350-1-git-send-email-jitendra.kolhe@hpe.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14 13:26:36 +01:00
Paolo Bonzini d98d407234 cpus: remove ugly cast on sigbus_handler
The cast is there because sigbus_handler is invoked via sigfd_handler.
But it feels just wrong to use struct qemu_signalfd_siginfo in the
prototype of a function that is passed to sigaction.

Instead, do a simple-minded conversion of qemu_signalfd_siginfo to
siginfo_t.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-03 16:40:02 +01:00
Ed Maste a7764f1548 Fix FreeBSD (10.x) build after 7dc9ae43
Include sys/user.h for declaration of 'struct kinfo_proc'.
Add -lutil to qemu-ga link for kinfo_getproc.

Signed-off-by: Ed Maste <emaste@freebsd.org>
Message-id: 1479778365-11315-1-git-send-email-emaste@freebsd.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-22 10:56:01 +00:00
Anand J 814bb12a56 clean-up: removed duplicate #includes
Some files contain multiple #includes of the same header file.
Removed most of those unnecessary duplicate entries using
scripts/clean-includes.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Anand J <anand.indukala@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-10-28 18:17:24 +03:00
Peter Maydell 86e121ae75 * Thread Sanitizer fixes (Alex)
* Coverity fixes (David)
 * test-qht fixes (Emilio)
 * QOM interface for info irq/info pic (Hervé)
 * -rtc clock=rt fix (Junlian)
 * mux chardev fixes (Marc-André)
 * nicer report on death by signal (Michal)
 * qemu-tech TLC (Paolo)
 * MSI support for edu device (Peter)
 * qemu-nbd --offset fix (Tomáš)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQExBAABCAAbBQJX98xmFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D
 IXsH/idLNlBzbrGhcuZOXEAd4fCyCyhXGMuOAGJXLHgv+EfiqrJ9z4HTn44czdh7
 rJuQDYeDrfl36zc0n8weY7JSEsorCq+JBDomFUFodmCrFUIue2jXYOK6pt5LUrQM
 OTyruQMKHD316SnJFOK8Tkxi5DrAHNRs+ynDcm+IoB65KE9YgBcBWuEJ03mF9cHi
 5sb/SBEqfL49gVlnFXBDTRgXXwA5axS7xKd4+7CWtbVFvJxurImjywGqKI5G/dmC
 TJyP+Dty4iNjFP1E0VvfL6ETovncZlfe4Hx1b971pll/ec88jGL0brqQMPjACrWh
 TyLXLN9oTbEKuDxx1Nh23xRFh+c=
 =sgtZ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Thread Sanitizer fixes (Alex)
* Coverity fixes (David)
* test-qht fixes (Emilio)
* QOM interface for info irq/info pic (Hervé)
* -rtc clock=rt fix (Junlian)
* mux chardev fixes (Marc-André)
* nicer report on death by signal (Michal)
* qemu-tech TLC (Paolo)
* MSI support for edu device (Peter)
* qemu-nbd --offset fix (Tomáš)

# gpg: Signature made Fri 07 Oct 2016 17:25:10 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (39 commits)
  qemu-doc: merge qemu-tech and qemu-doc
  qemu-tech: rewrite some parts
  qemu-tech: reorganize content
  qemu-tech: move TCG test documentation to tests/tcg/README
  qemu-tech: move user mode emulation features from qemu-tech
  qemu-tech: document lazy condition code evaluation in cpu.h
  qemu-tech: move text from qemu-tech to tcg/README
  qemu-doc: drop installation and compilation notes
  qemu-doc: replace introduction with the one from the internals manual
  qemu-tech: drop index
  test-qht: perform lookups under rcu_read_lock
  qht: fix unlock-after-free segfault upon resizing
  qht: simplify qht_reset_size
  qemu-nbd: Shrink image size by specified offset
  qemu_kill_report: Report PID name too
  util: Introduce qemu_get_pid_name
  char: update read handler in all cases
  char: use a fixed idx for child muxed chr
  i8259: give ISA device when registering ISA ioports
  .travis.yml: add gcc sanitizer build
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-10 10:39:29 +01:00
Michal Privoznik 7dc9ae4339 util: Introduce qemu_get_pid_name
This is a small helper that tries to fetch binary name for given
PID.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <4d75d475c1884f8e94ee8b1e57273ddf3ed68bf7.1474987617.git.mprivozn@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-04 10:00:27 +02:00
Peter Lieven 7d992e4d5a oslib-posix: add a configure switch to debug stack usage
this adds a knob to track the maximum stack usage of stacks
created by qemu_alloc_stack.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-29 14:13:39 +02:00
Peter Lieven 8737d9e0c4 oslib-posix: add helpers for stack alloc and free
the allocated stack will be adjusted to the minimum supported stack size
by the OS and rounded up to be a multiple of the system pagesize.
Additionally an architecture dependent guard page is added to the stack
to catch stack overflows.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-29 14:13:39 +02:00
Igor Mammedov 056b68af77 fix qemu exit on memory hotplug when allocation fails at prealloc time
When adding hostmem backend at runtime, QEMU might exit with error:
  "os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM"

It happens due to os_mem_prealloc() not handling errors gracefully.

Fix it by passing errp argument so that os_mem_prealloc() could
report error to callers and undo performed allocation when
os_mem_prealloc() fails.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1469008443-72059-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-02 12:03:58 +02:00
Wei Jiangang 55ad781ca7 use g_path_get_dirname instead of dirname
Use g_path_get_basename to get the directory components of
a file name, and free its return when no longer needed.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Message-Id: <1459997185-15669-3-git-send-email-weijg.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-17 09:59:21 +02:00
Markus Armbruster a9c94277f0 Use #include "..." for our own headers, <...> for others
Tracked down with an ugly, brittle and probably buggy Perl script.

Also move includes converted to <...> up so they get included before
ours where that's obviously okay.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-07-12 16:19:16 +02:00
Paolo Bonzini 02d0e09503 os-posix: include sys/mman.h
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h.  Include it in
sysemu/os-posix.h and remove it from everywhere else.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:03 +02:00
Dominik Dingel d2f39add72 exec.c: Ensure right alignment also for file backed ram
While in the anonymous ram case we already take care of the right alignment
such an alignment gurantee does not exist for file backed ram allocation.

Instead, pagesize is used for alignment. On s390 this is not enough for gmap,
as we need to satisfy an alignment up to segments.

Reported-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>

Message-Id: <1461585338-45863-1-git-send-email-dingel@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:42 +02:00
Christoffer Dall ee1e0f8e5d util: align memory allocations to 2M on AArch64
For KVM to use Transparent Huge Pages (THP) we have to ensure that the
alignment of the userspace address of the KVM memory slot and the IPA
that the guest sees for a memory region have the same offset from the 2M
huge page size boundary.

One way to achieve this is to always align the IPA region at a 2M
boundary and ensure that the mmap alignment is also at 2M.

Unfortunately, we were only doing this for __arm__, not for __aarch64__,
so add this simple condition.

This fixes a performance regression using KVM/ARM on AArch64 platforms
that showed a performance penalty of more than 50%, introduced by the
following commit:

9fac18f (oslib: allocate PROT_NONE pages on top of RAM, 2015-09-10)

We were only lucky before the above commit, because we were allocating
large regions and naturally getting a 2M alignment on those allocations
then.

Cc: qemu-stable@nongnu.org
Reported-by: Shih-Wei Li <shihwei@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: wrapped long line]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-22 12:26:01 +01:00