If we update an existing memslot (e.g., resize, split), we temporarily
remove the memslot to re-add it immediately afterwards. These updates
are not atomic, especially not for KVM VCPU threads, such that we can
get spurious faults.
Let's inhibit most KVM ioctls while performing relevant updates, such
that we can perform the update just as if it would happen atomically
without additional kernel support.
We capture the add/del changes and apply them in the notifier commit
stage instead. There, we can check for overlaps and perform the ioctl
inhibiting only if really required (-> overlap).
To keep things simple we don't perform additional checks that wouldn't
actually result in an overlap -- such as !RAM memory regions in some
cases (see kvm_set_phys_mem()).
To minimize cache-line bouncing, use a separate indicator
(in_ioctl_lock) per CPU. Also, make sure to hold the kvm_slots_lock
while performing both actions (removing+re-adding).
We have to wait until all IOCTLs were exited and block new ones from
getting executed.
This approach cannot result in a deadlock as long as the inhibitor does
not hold any locks that might hinder an IOCTL from getting finished and
exited - something fairly unusual. The inhibitor will always hold the BQL.
AFAIKs, one possible candidate would be userfaultfd. If a page cannot be
placed (e.g., during postcopy), because we're waiting for a lock, or if the
userfaultfd thread cannot process a fault, because it is waiting for a
lock, there could be a deadlock. However, the BQL is not applicable here,
because any other guest memory access while holding the BQL would already
result in a deadlock.
Nothing else in the kernel should block forever and wait for userspace
intervention.
Note: pause_all_vcpus()/resume_all_vcpus() or
start_exclusive()/end_exclusive() cannot be used, as they either drop
the BQL or require to be called without the BQL - something inhibitors
cannot handle. We need a low-level locking mechanism that is
deadlock-free even when not releasing the BQL.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Tested-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20221111154758.1372674-4-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using the new accel-blocker API, mark where ioctls are being called
in KVM. Next, we will implement the critical section that will take
care of performing memslots modifications atomically, therefore
preventing any new ioctl from running and allowing the running ones
to finish.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20221111154758.1372674-3-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This API allows the accelerators to prevent vcpus from issuing
new ioctls while execting a critical section marked with the
accel_ioctl_inhibit_begin/end functions.
Note that all functions submitting ioctls must mark where the
ioctl is being called with accel_{cpu_}ioctl_begin/end().
This API requires the caller to always hold the BQL.
API documentation is in sysemu/accel-blocker.h
Internally, it uses a QemuLockCnt together with a per-CPU QemuLockCnt
(to minimize cache line bouncing) to keep avoid that new ioctls
run when the critical section starts, and a QemuEvent to wait
that all running ioctls finish.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221111154758.1372674-2-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Always send errors to logfile when daemonized (Greg)
* Add support for IDE CompactFlash card (Lubomir)
* First round of build system cleanups (myself)
* First round of feature removals (myself)
* Reduce "qemu/accel.h" inclusion (Philippe)
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmO3Ym0UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNYmwf+LHEw+4T0fk1+2NfgIzH3+8s1EqDm
Ai56EjxO/p5NUptflXAnhn4P3LawswmmNE0ZIFFFBgwG5E9L+Jj/u5efuLu4uYPg
bboEBDn8nxSNN2l08u9TyS6kSWSxbwwrs7i2+V+4uQIlVIcCHu+A0vpXns4vWwY0
zZGF8CgJKDQdPIxdXrH8+6/xtadQ8uDkYsAWDiY/nhozCsCUTAZGTXWEQbHJLARI
Z4X+Cmz/NFB9G4ka6K/y0HbQw99KA8G/EMPUSglN0ya10yjpyzrmeI7IlIves+5U
8lhCZXyBhaV9GXlIK1vIgEXlHf83C19a+v0DpW0bpxK631n2VR5y3CArBg==
=2Koq
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* Atomic memslot updates for KVM (Emanuele, David)
* Always send errors to logfile when daemonized (Greg)
* Add support for IDE CompactFlash card (Lubomir)
* First round of build system cleanups (myself)
* First round of feature removals (myself)
* Reduce "qemu/accel.h" inclusion (Philippe)
# gpg: Signature made Thu 05 Jan 2023 23:51:09 GMT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (24 commits)
i386: SGX: remove deprecated member of SGXInfo
target/i386: Add SGX aex-notify and EDECCSSA support
util: remove support -chardev tty and -chardev parport
util: remove support for hex numbers with a scaling suffix
KVM: remove support for kernel-irqchip=off
docs: do not talk about past removal as happening in the future
meson: accept relative symlinks in "meson introspect --installed" data
meson: cleanup compiler detection
meson: support meson 0.64 -Doptimization=plain
configure: test all warnings
tests/qapi-schema: remove Meson workaround
meson: cleanup dummy-cpus.c rules
meson: tweak hardening options for Windows
configure: remove backwards-compatibility and obsolete options
configure: preserve qemu-ga variables
configure: cleanup $cpu tests
configure: remove dead function
configure: remove useless write_c_skeleton
ide: Add "ide-cf" driver, a CompactFlash card
ide: Add 8-bit data mode
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now that qtest is available on all targets including Windows, dummy-cpus.c
is included unconditionally in the build. It also does not need to be
compiled per-target.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As in page_get_flags, we need to try again with the mmap
lock held if we fail a page lookup.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Because we allow lockless lookups, we have to be careful
when it is freed. Use rcu to delay the free until safe.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When called from syscall(), we are not within a TB and pc == 0.
We can skip the check for invalidating the current TB.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We have been allocating a worst case number of arguments
to support calls. Instead, allow the size to vary.
By default leave space for 4 args, to maximize reuse,
but allow calls to increase the number of args to 32.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[PMD: Split patch in two]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221218211832.73312-3-philmd@linaro.org>
In order to have variable size allocated TCGOp, pass the number
of arguments we use (and would allocate) up to tcg_op_alloc().
This alters tcg_emit_op(), tcg_op_insert_before() and
tcg_op_insert_after() prototypes.
In tcg_op_alloc() ensure the number of arguments is in range.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[PMD: Extracted from bigger patch]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221218211832.73312-2-philmd@linaro.org>
Better to re-use the existing function for copying ops.
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We copied all of the arguments in copy_op_nocheck.
We only need to replace the one argument that we change.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The function pointer is immediately after the output and input
operands; no need to search.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Narrow the scope of the lock to the actual read/write,
moving the cpu_transation_failed call outside the lock.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Only the system emulation part of TB maintainance uses the
page_collection structure. Restrict its declaration (and the
functions requiring it) to tb-maint.c.
Convert the 'len' argument of tb_invalidate_phys_page_fast__locked()
from signed to unsigned.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20221209093649.43738-6-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20221209093649.43738-5-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Emphasize this function is called with pages locked.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20221209093649.43738-4-philmd@linaro.org>
[rth: Use "__locked" suffix, to match other instances.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit d9bb58e510 ("tcg: move tcg related files into accel/tcg/
subdirectory") introduced accel/tcg/trace-events, so we don't
need to use the root trace-events anymore.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20221209093649.43738-3-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Missed in commit 6526919224 ("accel/tcg: Restrict cpu_io_recompile()
from other accelerators").
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20221209093649.43738-2-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The only thing that still touches PageDesc in translate-all.c
are some locking routines related to tb-maint.c which have not
yet been moved. Do so now.
Move some code up in tb-maint.c as well, to untangle the maze
of ifdefs, and allow a sensible final ordering.
Move some declarations from exec/translate-all.h to internal.h,
as they are only used within accel/tcg/.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Now that PageDesc is not used for user-only, and for system
it is only used for tb maintenance, move the implementation
into tb-main.c appropriately ifdefed.
We have not yet eliminated all references to PageDesc for
user-only, so retain a typedef to the structure without definition.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This page tracking implementation is specific to user-only,
since the system softmmu version is in cputlb.c. Move it
out of translate-all.c to user-exec.c.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Make bsd-user match linux-user in not marking host pages
as reserved. This isn't especially effective anyway, as
it doesn't take into account any heap memory that qemu
may allocate after startup.
Reviewed-by: Warner Losh <imp@bsdimp.com>
Tested-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Continue weaning user-only away from PageDesc.
Use an interval tree to record target data.
Chunk the data, to minimize allocation overhead.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Begin weaning user-only away from PageDesc.
Since, for user-only, all TB (and page) manipulation is done with
a single mutex, and there is no virtual/physical discontinuity to
split a TB across discontinuous pages, place all of the TBs into
a single IntervalTree. This makes it trivial to find all of the
TBs intersecting a range.
Retain the existing PageDesc + linked list implementation for
system mode. Move the portion of the implementation that overlaps
the new user-only code behind the common ifdef.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Rename to tb_remove_all, to remove the PageDesc "page" from the name,
and to avoid suggesting a "flush" in the icache sense.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit 012d4c96e2 changed the visitor functions taking Error ** to
return bool instead of void, and the commits following it used the new
return value to simplify error checking. Since then a few more uses
in need of the same treatment crept in. Do that. All pretty
mechanical except for
* balloon_stats_get_all()
This is basically the same transformation commit 012d4c96e2 applied
to the virtual walk example in include/qapi/visitor.h.
* set_max_queue_size()
Additionally replace "goto end of function" by return.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221121085054.683122-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Isolate the code protected by setjmp. Fixes:
translate-all.c: In function ‘tb_gen_code’:
translate-all.c:748:51: error: argument ‘cflags’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
if QEMU is configured with modules enabled, it is possible that the
load of an accelerator module will fail.
Exit in this case, relying on module_object_class_by_name to report
the specific load error if any.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[claudio: changed abort() to exit(1)]
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220929093035.4231-6-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Delay cpu_list_add until realize is complete, so that cross-cpu
interaction does not happen with incomplete cpu state. For this,
we must delay plugin initialization out of tcg_exec_realizefn,
because no cpu_index has been assigned.
Fixes a problem with cross-cpu jump cache flushing, when the
jump cache has not yet been allocated.
Fixes: a976a99a29 ("include/hw/core: Create struct CPUJumpCache")
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The value passed is always true, and if the target's
synchronize_from_tb hook is non-trivial, not exiting
may be erroneous.
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add a way to examine the unwind data without actually
restoring the data back into env.
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Currently signal SIGIPI [=SIGUSR1] is used to kick the dummy CPU
when qtest accelerator is used. However SIGUSR1 is unsupported on
Windows. To support Windows, we add a QemuSemaphore CPUState::sem
to kick the dummy CPU instead for Windows.
Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20221028045736.679903-2-bin.meng@windriver.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
All targets have been updated. Use the tcg_ops target hook
exclusively, which allows the compat code to be removed.
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add a tcg_ops hook to replace the restore_state_to_opc
function call. Because these generic hooks cannot depend
on target-specific types, temporarily, copy the current
target_ulong data[] into uint64_t d64[].
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since the only user, Arm MTE, always requires allocation,
merge the get and alloc functions to always produce a
non-null result. Also assume that the user has already
checked page validity.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since "target data" is always user-only, move it out of
translate-all.c to user-exec.c.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Flush translation blocks in bulk, rather than page-by-page.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use the existing function for clearing target data.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When PAGE_RESET is set, we are replacing pages with new
content, which means that we need to invalidate existing
cached data, such as TranslationBlocks. Perform the
reset invalidate while we're doing other invalidates,
which allows us to remove the separate invalidates from
the user-only mmap/munmap/mprotect routines.
In addition, restrict invalidation to PAGE_EXEC pages.
Since cdf7130851, we have validated PAGE_EXEC is present
before translation, which means we can assume that if the
bit is not present, there are no translations to invalidate.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We do not require detection of overlapping TBs here,
so use the more appropriate function.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We missed this function when we introduced tb_page_addr_t.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This function is is never called with a real range,
only for a single page. Drop the second parameter
and rename to tb_invalidate_phys_page.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Rename to tb_invalidate_phys_page_unwind to emphasize that
we also detect invalidating the current TB, and also to free
up that name for other usage.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This data structure will be replaced for user-only: add accessors.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When we added the fast path, we initialized page_addr[] early.
These stores in and around tb_page_add() are redundant; remove them.
Fixes: 50627f1b7b ("accel/tcg: Add fast path for translator_ld*")
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>