Commit Graph

91569 Commits

Author SHA1 Message Date
Richard Henderson c8e7fef102 target/s390x: Implement s390_cpu_record_sigsegv
Move the masking of the address from cpu_loop into
s390_cpu_record_sigsegv -- this is governed by hw, not linux.
This does mean we have to raise our own exception, rather
than return to the fallback.

Use maperr to choose between PGM_PROTECTION and PGM_ADDRESSING.
Use the appropriate si_code for each in cpu_loop.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson db9aab5783 target/s390x: Use probe_access_flags in s390_probe_access
Not sure why the user-only code wasn't rewritten to use
probe_access_flags at the same time that the sysemu code
was converted.  For the purpose of user-only, this is an
exact replacement.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 263e2ab20c target/riscv: Make riscv_cpu_tlb_fill sysemu only
The fallback code in cpu_loop_exit_sigsegv is sufficient
for riscv linux-user.

Remove the code from cpu_loop that raised SIGSEGV.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 1db8af5c87 target/ppc: Implement ppc_cpu_record_sigsegv
Record DAR, DSISR, and exception_index.  That last means
that we must exit to cpu_loop ourselves, instead of letting
exception_index being overwritten.

This is exactly what the user-mode ppc_cpu_tlb_fill does,
so simply rename it as ppc_cpu_record_sigsegv.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 12f0bc5579 target/openrisc: Make openrisc_cpu_tlb_fill sysemu only
The fallback code in cpu_loop_exit_sigsegv is sufficient for
openrisc linux-user.

This makes all of the code in mmu.c sysemu only, so remove
the ifdefs and move the file to openrisc_softmmu_ss.
Remove the code from cpu_loop that handled EXCP_DPF.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson d315712b69 linux-user/openrisc: Abort for EXCP_RANGE, EXCP_FPE
QEMU does not allow the system control bits for either exception to
be enabled in linux-user, therefore both exceptions are dead code.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson fac94cb36d target/nios2: Implement nios2_cpu_record_sigsegv
Because the linux-user kuser page handling is currently implemented
by detecting magic addresses in the unnamed 0xaa trap, we cannot
simply remove nios2_cpu_tlb_fill and rely on the fallback code.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 52d4899bf3 target/mips: Make mips_cpu_tlb_fill sysemu only
The fallback code in cpu_loop_exit_sigsegv is sufficient
for mips linux-user.

This means we can remove tcg/user/tlb_helper.c entirely.
Remove the code from cpu_loop that raised SIGSEGV.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson fd297732a2 target/microblaze: Make mb_cpu_tlb_fill sysemu only
The fallback code in cpu_loop_exit_sigsegv is sufficient
for microblaze linux-user.

Remove the code from cpu_loop that handled the unnamed 0xaa exception.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 028772c45c target/m68k: Make m68k_cpu_tlb_fill sysemu only
The fallback code in cpu_loop_exit_sigsegv is sufficient
for m68k linux-user.

Remove the code from cpu_loop that handled EXCP_ACCESS.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson f74bd157c6 target/i386: Implement x86_cpu_record_sigsegv
Record cr2, error_code, and exception_index.  That last means
that we must exit to cpu_loop ourselves, instead of letting
exception_index being overwritten.

Use the maperr parameter to properly set PG_ERROR_P_MASK.

Reviewed by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 860e0b965b target/hppa: Make hppa_cpu_tlb_fill sysemu only
The fallback code in cpu_loop_exit_sigsegv is sufficient
for hppa linux-user.

Remove the code from cpu_loop that raised SIGSEGV.
This makes all of the code in mem_helper.c sysemu only,
so remove the ifdefs and move the file to hppa_softmmu_ss.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 70863887a8 target/hexagon: Remove hexagon_cpu_tlb_fill
The fallback code in cpu_loop_exit_sigsegv is sufficient
for hexagon linux-user.

Remove the code from cpu_loop that raises SIGSEGV.

Reviewed-by: Taylor Simpson <tsimpson@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 5753605412 target/cris: Make cris_cpu_tlb_fill sysemu only
The fallback code in cpu_loop_exit_sigsegv is sufficient
for cris linux-user.

Remove the code from cpu_loop that handled the unnamed 0xaa exception.
This makes all of the code in helper.c sysemu only, so remove the
ifdefs and move the file to cris_softmmu_ss.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 9b12b6b442 target/arm: Implement arm_cpu_record_sigsegv
Because of the complexity of setting ESR, continue to use
arm_deliver_fault.  This means we cannot remove the code
within cpu_loop that decodes EXCP_DATA_ABORT and
EXCP_PREFETCH_ABORT.

But using the new hook means that we don't have to do the
page_get_flags check manually, and we'll be able to restrict
the tlb_fill hook to sysemu later.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 5e98763c0e target/arm: Use cpu_loop_exit_sigsegv for mte tag lookup
Use the new os interface for raising the exception,
rather than calling arm_cpu_tlb_fill directly.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 90113883af target/alpha: Implement alpha_cpu_record_sigsegv
Record trap_arg{0,1,2} for the linux-user signal frame.

Fill in the stores to trap_arg{1,2} that were missing
from the previous user-only alpha_cpu_tlb_fill function.
Use maperr to simplify computation of trap_arg1.

Remove the code for EXCP_MMFAULT from cpu_loop, as
that part is now handled by cpu_loop_exit_sigsegv.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 72d2bbf9ff linux-user: Add cpu_loop_exit_sigsegv
This is a new interface to be provided by the os emulator for
raising SIGSEGV on fault.  Use the new record_sigsegv target hook.

Reviewed by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 09e94676ad hw/core: Add TCGCPUOps.record_sigsegv
Add a new user-only interface for updating cpu state before
raising a signal.  This will replace tlb_fill for user-only
and should result in less boilerplate for each guest.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 04de121aaf linux-user/signal: Drop HOST_SIGNAL_PLACEHOLDER
Now that all of the linux-user hosts have been converted
to host-signal.h, drop the compatibility code.

Reviewed by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 4f3bbd9cfb linux-user/host/riscv: Improve host_signal_write
Do not read 4 bytes before we determine the size of the insn.
Simplify triple switches in favor of checking major opcodes.
Include the missing cases of compact fsd and fsdsp.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 7ce8e389ef target/arm: Fixup comment re handle_cpu_signal
The named function no longer exists.
Refer to host_signal_handler instead.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:52 -04:00
Richard Henderson 97be8c6a95 linux-user/host/riscv: Populate host_signal.h
Split host_signal_pc and host_signal_write out of user-exec.c.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:51 -04:00
Richard Henderson b12161120a linux-user/host/mips: Populate host_signal.h
Split host_signal_pc and host_signal_write out of user-exec.c.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:50 -04:00
Richard Henderson 66ee11d407 linux-user/host/s390: Populate host_signal.h
Split host_signal_pc and host_signal_write out of user-exec.c.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:46 -04:00
Richard Henderson cf5f42fd07 linux-user/host/aarch64: Populate host_signal.h
Split host_signal_pc and host_signal_write out of user-exec.c.
Drop the *BSD code, to be re-created under bsd-user/ later.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:45 -04:00
Richard Henderson a30bfaa7bd linux-user/host/arm: Populate host_signal.h
Split host_signal_pc and host_signal_write out of user-exec.c.
Drop the *BSD code, to be re-created under bsd-user/ later.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:44 -04:00
Richard Henderson 8b5bd46193 linux-user/host/sparc: Populate host_signal.h
Split host_signal_pc and host_signal_write out of user-exec.c.
Drop the *BSD code, to be re-created under bsd-user/ later.
Drop the Solaris code as completely unused.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:43 -04:00
Richard Henderson 44c8f2cd90 linux-user/host/alpha: Populate host_signal.h
Split host_signal_pc and host_signal_write out of user-exec.c.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:42 -04:00
Richard Henderson 8cc7b85d56 linux-user/host/ppc: Populate host_signal.h
Split host_signal_pc and host_signal_write out of user-exec.c.
Drop the *BSD code, to be re-created under bsd-user/ later.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:41 -04:00
Richard Henderson 85442fce49 linux-user/host/x86: Populate host_signal.h
Split host_signal_pc and host_signal_write out of user-exec.c.
Drop the *BSD code, to be re-created under bsd-user/ later.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 07:00:38 -04:00
Richard Henderson 8cb41fda78 Machine core patches
- Move GPIO code out of qdev.c
 - Move hotplug code out of qdev.c
 - Restrict various files to sysemu
 - Move SMP code out of machine.c
 - Add SMP parsing unit tests
 - Move dynamic sysbus device check earlier
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmGANZAACgkQ4+MsLN6t
 wN6Afw/+OnBSs8ZjSc/01nWX/lxk0lgS9np69gA6+sfJI7tuYimXUnfU6ZAv4laF
 65dPCAZ4F4nGzo10pQqnpeez+1d6YfxVniPvTO0Z75HiUR6L7E5B43f16HJ2U+zG
 qjXCP45uYH5u/jgzIQ//BeKiPE2hwr7XM6hzJb9DiUPyaLIETyjnzvARSyIx202a
 p/lP6qTsjZvGwt21rAurqoulmwrkbDh6v7znmNUw8siwkGuKlGsvhkJuENZlbRPA
 J3pGWm2d/Xvs8+fZEzRdg+LjFTt6jBdPm1IkSh4G2e2Nu3qtDrlxsg97W0D84sGy
 Fz0JlgVMF/aw+ExSImWc6Dnq26tdCixaZIQDgvBPNFop1SLD9wY7L4VKPdlly7bh
 a1YabtaYqfjS7R0h2sAX7FEBSoAzxa1Nrr2/pB2aCziutVCLsu5Yz87hk8C08oF7
 RyuSsYixZWtVMpmOaxOS9swYviM5OhFQkX6N6cew0d/0/D3ZRyAbr3XMcftqz/FW
 SO6n2ZMXcFQiHFZei39ZV72jx3lV9uAunADKAcSVVCaXfpyXVgGLSOxZ7mhOfcgd
 z63vpAqHhiTp82pzygArz1Lmr9OlhOi5TBt6gQ9dZvjCDRG3yoodVRJIEkMZLYEf
 kaFQ9WgB/apCG49tvjTFSZJvySgzN+aGiD6BriOGSnoYu5Gs3wU=
 =NzoC
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/philmd/tags/machine-20211101' into staging

Machine core patches

- Move GPIO code out of qdev.c
- Move hotplug code out of qdev.c
- Restrict various files to sysemu
- Move SMP code out of machine.c
- Add SMP parsing unit tests
- Move dynamic sysbus device check earlier

# gpg: Signature made Mon 01 Nov 2021 02:44:32 PM EDT
# gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]

* remotes/philmd/tags/machine-20211101:
  machine: remove the done notifier for dynamic sysbus device type check
  qdev-monitor: Check sysbus device type before creating it
  machine: add device_type_is_dynamic_sysbus function
  tests/unit: Add an unit test for smp parsing
  hw/core/machine: Split out the smp parsing code
  hw/core: Restrict hotplug to system emulation
  hw/core: Extract hotplug-related functions to qdev-hotplug.c
  hw/core: Declare meson source set
  hw/core: Restrict sysemu specific files
  machine: Move gpio code to hw/core/gpio.c

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 05:53:45 -04:00
Pavel Dovgalyuk d99e8b5fcb hw/i386: fix vmmouse registration
According to the logic of vmmouse_update_handler function,
vmmouse should be registered as an event handler when
it's status is zero.
vmmouse_read_id resets the status but does not register
the handler.
This patch adds vmmouse registration and activation when
status is reset.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Message-Id: <163524204515.1914131.16465061981774791228.stgit@pasha-ThinkPad-X280>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01 19:36:11 -04:00
Peter Xu 2914fc61d5 pci: Export pci_for_each_device_under_bus*()
They're actually more commonly used than the helper without _under_bus, because
most callers do have the pci bus on hand.  After exporting we can switch a lot
of the call sites to use these two helpers.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20211028043129.38871-3-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
2021-11-01 19:36:11 -04:00
Peter Xu b3dcf94f77 pci: Define pci_bus_dev_fn/pci_bus_fn/pci_bus_ret_fn
They're used in quite a few places of pci.[ch] and also in the rest of the code
base.  Define them so that it doesn't need to be defined all over the places.

The pci_bus_fn is similar to pci_bus_dev_fn that only takes a PCIBus* and an
opaque.  The pci_bus_ret_fn is similar to pci_bus_fn but it allows to return a
void* pointer.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20211028043129.38871-2-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01 19:36:11 -04:00
Jean-Philippe Brucker 36efa250a4 hw/i386/pc: Allow instantiating a virtio-iommu device
Allow instantiating a virtio-iommu device by adding an ACPI Virtual I/O
Translation table (VIOT), which describes the relation between the
virtio-iommu and the endpoints it manages.

Add a hotplug handler for virtio-iommu on x86 and set the necessary
reserved region property. On x86, the [0xfee00000, 0xfeefffff] DMA
region is reserved for MSIs. DMA transactions to this range either
trigger IRQ remapping in the IOMMU or bypasses IOMMU translation.

Although virtio-iommu does not support IRQ remapping it must be informed
of the reserved region so that it can forward DMA transactions targeting
this region.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20211026182024.2642038-5-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01 18:49:10 -04:00
Jean-Philippe Brucker 1b3bf13890 hw/i386/pc: Move IOMMU singleton into PCMachineState
We're about to support a third vIOMMU for x86, virtio-iommu which
doesn't inherit X86IOMMUState. Move the IOMMU singleton into
PCMachineState, so it can be shared between all three vIOMMUs.

The x86_iommu_get_default() helper is still needed by KVM and IOAPIC to
fetch the default IRQ-remapping IOMMU. Since virtio-iommu doesn't
support IRQ remapping, this interface doesn't need to change for the
moment. We could later replace X86IOMMUState with an "IRQ remapping
IOMMU" interface if necessary.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20211026182024.2642038-4-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01 18:49:10 -04:00
Jean-Philippe Brucker 867e9c9f4c hw/i386/pc: Remove x86_iommu_get_type()
To generate the IOMMU ACPI table, acpi-build.c can use base QEMU types
instead of a special IommuType value.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20211026182024.2642038-3-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01 18:49:10 -04:00
Jean-Philippe Brucker 1f85d74ac5 hw/acpi: Add VIOT table
Add a function that generates a Virtual I/O Translation table (VIOT),
describing the topology of paravirtual IOMMUs. The table is created if a
virtio-iommu device is present. It contains a virtio-iommu node and PCI
Range nodes for endpoints managed by the IOMMU. By default, a single
node describes all PCI devices. When passing the
"default_bus_bypass_iommu" machine option and "bypass_iommu" PXB option,
only buses that do not bypass the IOMMU are described by PCI Range
nodes.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20211026182024.2642038-2-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01 18:49:10 -04:00
David Hildenbrand e1c1915bef vhost-vdpa: Set discarding of RAM broken when initializing the backend
Similar to VFIO, vDPA will go ahead an map+pin all guest memory. Memory
that used to be discarded will get re-populated and if we
discard+re-access memory after mapping+pinning, the pages mapped into the
vDPA IOMMU will go out of sync with the actual pages mapped into the user
space page tables.

Set discarding of RAM broken such that:
- virtio-mem and vhost-vdpa run mutually exclusive
- virtio-balloon is inhibited and no memory discards will get issued

In the future, we might be able to support coordinated discarding of RAM
as used by virtio-mem and already supported by vfio via the
RamDiscardManager.

Acked-by: Jason Wang <jasowang@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Cindy Lu <lulu@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211027130324.59791-1-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-11-01 18:49:10 -04:00
Igor Mammedov 44d3d89814 qtest: fix 'expression is always false' build failure in qtest_has_accel()
If KVM is disabled or not present, qtest library build
may fail with:
   libqtest.c: In function 'qtest_has_accel':
      comparison of unsigned expression < 0 is always false
      [-Werror=type-limits]
         for (i = 0; i < ARRAY_SIZE(targets); i++) {

due to empty 'targets' array.
Fix it by making sure that CONFIG_KVM_TARGETS isn't empty.

Fixes: e741aff0f4 ("tests: qtest: add qtest_has_accel() to check if tested binary supports accelerator")
Reported-by: Jason Andryuk <jandryuk@gmail.com>
Suggested-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20211027151012.2639284-1-imammedo@redhat.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01 18:49:10 -04:00
Hyman Huang(黄勇) 826b8bc80c migration/dirtyrate: implement dirty-bitmap dirtyrate calculation
introduce dirty-bitmap mode as the third method of calc-dirty-rate.
implement dirty-bitmap dirtyrate calculation, which can be used
to measuring dirtyrate in the absence of dirty-ring.

introduce "dirty_bitmap:-b" option in hmp calc_dirty_rate to
indicate dirty bitmap method should be used for calculation.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:44 +01:00
Hyman Huang(黄勇) 4998a37e4b memory: introduce total_dirty_pages to stat dirty pages
introduce global var total_dirty_pages to stat dirty pages
along with memory_global_dirty_log_sync.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:44 +01:00
David Hildenbrand 6fee3a1fd9 migration/ram: Handle RAMBlocks with a RamDiscardManager on background snapshots
We already don't ever migrate memory that corresponds to discarded ranges
as managed by a RamDiscardManager responsible for the mapped memory region
of the RAMBlock.

virtio-mem uses this mechanism to logically unplug parts of a RAMBlock.
Right now, we still populate zeropages for the whole usable part of the
RAMBlock, which is undesired because:

1. Even populating the shared zeropage will result in memory getting
   consumed for page tables.
2. Memory backends without a shared zeropage (like hugetlbfs and shmem)
   will populate an actual, fresh page, resulting in an unintended
   memory consumption.

Discarded ("logically unplugged") parts have to remain discarded. As
these pages are never part of the migration stream, there is no need to
track modifications via userfaultfd WP reliably for these parts.

Further, any writes to these ranges by the VM are invalid and the
behavior is undefined.

Note that Linux only supports userfaultfd WP on private anonymous memory
for now, which usually results in the shared zeropage getting populated.
The issue will become more relevant once userfaultfd WP supports shmem
and hugetlb.

Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:44 +01:00
David Hildenbrand f7b9dcfbcf migration/ram: Factor out populating pages readable in ram_block_populate_pages()
Let's factor out prefaulting/populating to make further changes easier to
review and add a comment what we are actually expecting to happen. While at
it, use the actual page size of the ramblock, which defaults to
qemu_real_host_page_size for anonymous memory. Further, rename
ram_block_populate_pages() to ram_block_populate_read() as well, to make
it clearer what we are doing.

In the future, we might want to use MADV_POPULATE_READ to speed up
population.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:44 +01:00
David Hildenbrand 7648297d40 migration: Simplify alignment and alignment checks
Let's use QEMU_ALIGN_DOWN() and friends to make the code a bit easier to
read.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:44 +01:00
David Hildenbrand 9470c5e082 migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
Currently, when someone (i.e., the VM) accesses discarded parts inside a
RAMBlock with a RamDiscardManager managing the corresponding mapped memory
region, postcopy will request migration of the corresponding page from the
source. The source, however, will never answer, because it refuses to
migrate such pages with undefined content ("logically unplugged"): the
pages are never dirty, and get_queued_page() will consequently skip
processing these postcopy requests.

Especially reading discarded ("logically unplugged") ranges is supposed to
work in some setups (for example with current virtio-mem), although it
barely ever happens: still, not placing a page would currently stall the
VM, as it cannot make forward progress.

Let's check the state via the RamDiscardManager (the state e.g.,
of virtio-mem is migrated during precopy) and avoid sending a request
that will never get answered. Place a fresh zero page instead to keep
the VM working. This is the same behavior that would happen
automatically without userfaultfd being active, when accessing virtual
memory regions without populated pages -- "populate on demand".

For now, there are valid cases (as documented in the virtio-mem spec) where
a VM might read discarded memory; in the future, we will disallow that.
Then, we might want to handle that case differently, e.g., warning the
user that the VM seems to be mis-behaving.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:44 +01:00
David Hildenbrand f4578df399 virtio-mem: Drop precopy notifier
Migration code now properly handles RAMBlocks which are indirectly managed
by a RamDiscardManager. No need for manual handling via the free page
optimization interface, let's get rid of it.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:44 +01:00
David Hildenbrand be39b4cd20 migration/ram: Handle RAMBlocks with a RamDiscardManager on the migration source
We don't want to migrate memory that corresponds to discarded ranges as
managed by a RamDiscardManager responsible for the mapped memory region of
the RAMBlock. The content of these pages is essentially stale and
without any guarantees for the VM ("logically unplugged").

Depending on the underlying memory type, even reading memory might populate
memory on the source, resulting in an undesired memory consumption. Of
course, on the destination, even writing a zeropage consumes memory,
which we also want to avoid (similar to free page hinting).

Currently, virtio-mem tries achieving that goal (not migrating "unplugged"
memory that was discarded) by going via qemu_guest_free_page_hint() - but
it's hackish and incomplete.

For example, background snapshots still end up reading all memory, as
they don't do bitmap syncs. Postcopy recovery code will re-add
previously cleared bits to the dirty bitmap and migrate them.

Let's consult the RamDiscardManager after setting up our dirty bitmap
initially and when postcopy recovery code reinitializes it: clear
corresponding bits in the dirty bitmaps (e.g., of the RAMBlock and inside
KVM). It's important to fixup the dirty bitmap *after* our initial bitmap
sync, such that the corresponding dirty bits in KVM are actually cleared.

As colo is incompatible with discarding of RAM and inhibits it, we don't
have to bother.

Note: if a misbehaving guest would use discarded ranges after migration
started we would still migrate that memory: however, then we already
populated that memory on the migration source.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:44 +01:00
David Hildenbrand 372aa6fd73 virtio-mem: Implement replay_discarded RamDiscardManager callback
Implement it similar to the replay_populated callback.

Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:44 +01:00