Commit Graph

93 Commits

Author SHA1 Message Date
Philippe Mathieu-Daudé
97e0310601 hw/core: Declare CPUArchId::cpu as CPUState instead of Object
Do not accept any Object for CPUArchId::cpu field,
restrict it to CPUState type.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240129164514.73104-3-philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-12 11:46:16 +01:00
Bernhard Beschow
9b0c44334c hw/i386/x86: Let ioapic_init_gsi() take parent as pointer
Rather than taking a QOM name which has to be resolved, let's pass the parent
directly as pointer. This simplifies the code.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240224135851.100361-2-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-02-27 09:37:30 +01:00
Bernhard Beschow
c2e6d7d8e7 hw/i386/x86: Fix PIC interrupt handling if APIC is globally disabled
QEMU populates the apic_state attribute of x86 CPUs if supported by real
hardware or if SMP is active. When handling interrupts, it just checks whether
apic_state is populated to route the interrupt to the PIC or to the APIC.
However, chapter 10.4.3 of [1] requires that:

  When IA32_APIC_BASE[11] is 0, the processor is functionally equivalent to an
  IA-32 processor without an on-chip APIC.

This means that when apic_state is populated, QEMU needs to check for the
MSR_IA32_APICBASE_ENABLE flag in addition. Implement this which fixes some
real-world BIOSes.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual, Vol. 3A:
    System Programming Guide, Part 1

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20240106132546.21248-3-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-02-14 06:09:32 -05:00
Bernhard Beschow
f22f3a92eb hw/i386/x86: Reverse if statement
The if statement currently uses double negation when executing the else branch.
So swap the branches and simplify the condition to make the code more
comprehensible.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20240106132546.21248-2-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-02-14 06:09:32 -05:00
Bui Quang Minh
b5ee0468e9 apic: add support for x2APIC mode
This commit extends the APIC ID to 32-bit long and remove the 255 max APIC
ID limit in userspace APIC. The array that manages local APICs is now
dynamically allocated based on the max APIC ID of created x86 machine.
Also, new x2APIC IPI destination determination scheme, self IPI and x2APIC
mode register access are supported.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Message-Id: <20240111154404.5333-3-minhquangbui99@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-02-14 06:09:32 -05:00
Daniel Hoffman
c04cfb4596 hw/i386: fix short-circuit logic with non-optimizing builds
`kvm_enabled()` is compiled down to `0` and short-circuit logic is
used to remove references to undefined symbols at the compile stage.
Some build configurations with some compilers don't attempt to
simplify this logic down in some cases (the pattern appears to be
that the literal false must be the first term) and this was causing
some builds to emit references to undefined symbols.

An example of such a configuration is clang 16.0.6 with the following
configure: ./configure --enable-debug --without-default-features
--target-list=x86_64-softmmu --enable-tcg-interpreter

Signed-off-by: Daniel Hoffman <dhoff749@gmail.com>
Message-Id: <20231119203116.3027230-1-dhoff749@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-02 15:56:49 -05:00
Ani Sinha
363636787d hw/i386: changes towards enabling -Wshadow=local for x86 machines
Code changes that addresses all compiler complaints coming from enabling
-Wshadow flags. Enabling -Wshadow catches cases of local variables shadowing
other local variables or parameters. These makes the code confusing and/or adds
bugs that are difficult to catch.

See also

   Subject: Help wanted for enabling -Wshadow=local
   Message-Id: <87r0mqlf9x.fsf@pond.sub.org>
   https://lore.kernel.org/qemu-devel/87r0mqlf9x.fsf@pond.sub.org

CC: Markus Armbruster <armbru@redhat.com>
CC: Philippe Mathieu-Daude <philmd@linaro.org>
CC: mst@redhat.com

Signed-off-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20231003102803.6163-1-anisinha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2023-10-06 10:56:54 +02:00
Philippe Mathieu-Daudé
ef1cf6890f target/i386: Allow elision of kvm_hv_vpindex_settable()
Call kvm_enabled() before kvm_hv_vpindex_settable()
to let the compiler elide its call.

kvm-stub.c is now empty, remove it.

Suggested-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230904124325.79040-9-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-07 13:32:37 +02:00
Philippe Mathieu-Daudé
9926cf34de target/i386: Allow elision of kvm_enable_x2apic()
Call kvm_enabled() before kvm_enable_x2apic() to let the compiler elide
its call.  Cleanup the code by simplifying "!xen_enabled() &&
kvm_enabled()" to just "kvm_enabled()".

Suggested-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230904124325.79040-8-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-07 13:32:37 +02:00
Peter Maydell
66577e9e1c virtio,pc,pci: features, cleanups, fixes
vhost-user support without ioeventfd
 word replacements in vhost user spec
 shpc improvements
 
 cleanups, fixes all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmQBO8QPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpMUMH/3/FVp4qaF4CDwCHn7xWFRJpOREIhX/iWfUu
 lGkwxnB7Lfyqdg7i4CAfgMf2emWKZchEE2DamfCo5bIX0IgRU3DWcOdR9ePvJ29J
 cKwIYpxZcB4RYSoWL5OUakQLCT3JOu4XWaXeVjyHABjQhf3lGpwN4KmIOBGOy/N6
 0YHOQScW2eW62wIOwhAEuYQceMt6KU32Uw3tLnMbJliiBf3a/hPctVNM9TFY9pcd
 UYHGfBx/zD45owf1lTVEQFDg0eqPZKWW29g5haiOd5oAyXHHolzu+bt3bU7lH46b
 f7iP12LqDudyrgoF5YWv3NJ4HaGm5V3kPqNqLLF/mjF7alxG+N8=
 =hN3h
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pc,pci: features, cleanups, fixes

vhost-user support without ioeventfd
word replacements in vhost user spec
shpc improvements

cleanups, fixes all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmQBO8QPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpMUMH/3/FVp4qaF4CDwCHn7xWFRJpOREIhX/iWfUu
# lGkwxnB7Lfyqdg7i4CAfgMf2emWKZchEE2DamfCo5bIX0IgRU3DWcOdR9ePvJ29J
# cKwIYpxZcB4RYSoWL5OUakQLCT3JOu4XWaXeVjyHABjQhf3lGpwN4KmIOBGOy/N6
# 0YHOQScW2eW62wIOwhAEuYQceMt6KU32Uw3tLnMbJliiBf3a/hPctVNM9TFY9pcd
# UYHGfBx/zD45owf1lTVEQFDg0eqPZKWW29g5haiOd5oAyXHHolzu+bt3bU7lH46b
# f7iP12LqDudyrgoF5YWv3NJ4HaGm5V3kPqNqLLF/mjF7alxG+N8=
# =hN3h
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 03 Mar 2023 00:13:56 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (53 commits)
  tests/data/acpi/virt: drop (most) duplicate files.
  hw/cxl/mailbox: Use new UUID network order define for cel_uuid
  qemu/uuid: Add UUID static initializer
  qemu/bswap: Add const_le64()
  tests: acpi: Update q35/DSDT.cxl for removed duplicate UID
  hw/i386/acpi: Drop duplicate _UID entry for CXL root bridge
  tests/acpi: Allow update of q35/DSDT.cxl
  hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
  hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL
  hw/pci-bridge/cxl_downstream: Fix type naming mismatch
  hw/mem/cxl_type3: Improve error handling in realize()
  MAINTAINERS: Add Fan Ni as Compute eXpress Link QEMU reviewer
  intel-iommu: send UNMAP notifications for domain or global inv desc
  smmu: switch to use memory_region_unmap_iommu_notifier_range()
  memory: introduce memory_region_unmap_iommu_notifier_range()
  intel-iommu: fail DEVIOTLB_UNMAP without dt mode
  intel-iommu: fail MAP notifier without caching mode
  memory: Optimize replay of guest mapping
  chardev/char-socket: set s->listener = NULL in char_socket_finalize
  hw/pci: Trace IRQ routing on PCI topology
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-03-03 13:35:54 +00:00
Michael S. Tsirkin
167f487358 Revert "hw/i386: pass RNG seed via setup_data entry"
This reverts commit 67f7e426e5.

Additionally to the automatic revert, I went over the code
and dropped all mentions of legacy_no_rng_seed manually,
effectively reverting a combination of 2 additional commits:

    commit ffe2d2382e
    Author: Jason A. Donenfeld <Jason@zx2c4.com>
    Date:   Wed Sep 21 11:31:34 2022 +0200

        x86: re-enable rng seeding via SetupData

    commit 3824e25db1
    Author: Gerd Hoffmann <kraxel@redhat.com>
    Date:   Wed Aug 17 10:39:40 2022 +0200

        x86: disable rng seeding via setup_data

Fixes: 67f7e426e5 ("hw/i386: pass RNG seed via setup_data entry")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Michael S. Tsirkin
ae80d81cfa Revert "x86: return modified setup_data only if read as memory, not as file"
This reverts commit e935b73508.

Fixes: e935b73508 ("x86: return modified setup_data only if read as memory, not as file")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Michael S. Tsirkin
ea96a78477 Revert "x86: use typedef for SetupData struct"
This reverts commit eebb38a563.

Fixes: eebb38a563 ("x86: use typedef for SetupData struct")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Michael S. Tsirkin
fdc27ced04 Revert "x86: reinitialize RNG seed on system reboot"
This reverts commit 763a2828bf.

Fixes: 763a2828bf ("x86: reinitialize RNG seed on system reboot")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Michael S. Tsirkin
b4bfa0a31d Revert "x86: re-initialize RNG seed when selecting kernel"
This reverts commit cc63374a5a.

Fixes: cc63374a5a ("x86: re-initialize RNG seed when selecting kernel")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Michael S. Tsirkin
ef82d893de Revert "x86: do not re-randomize RNG seed on snapshot load"
This reverts commit 14b29fea74.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 14b29fea74 ("x86: do not re-randomize RNG seed on snapshot load")
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Michael S. Tsirkin
b34f2fd17e Revert "x86: don't let decompressed kernel image clobber setup_data"
This reverts commit eac7a7791b.

Fixes: eac7a7791b ("x86: don't let decompressed kernel image clobber setup_data")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
David Woodhouse
4f81baa33e hw/xen: Support GSI mapping to PIRQ
If I advertise XENFEAT_hvm_pirqs then a guest now boots successfully as
long as I tell it 'pci=nomsi'.

[root@localhost ~]# cat /proc/interrupts
           CPU0
  0:         52   IO-APIC   2-edge      timer
  1:         16  xen-pirq   1-ioapic-edge  i8042
  4:       1534  xen-pirq   4-ioapic-edge  ttyS0
  8:          1  xen-pirq   8-ioapic-edge  rtc0
  9:          0  xen-pirq   9-ioapic-level  acpi
 11:       5648  xen-pirq  11-ioapic-level  ahci[0000:00:04.0]
 12:        257  xen-pirq  12-ioapic-edge  i8042
...

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 09:09:20 +00:00
Philippe Mathieu-Daudé
2d4bd81e39 hw/rtc: Rename rtc_[get|set]_memory -> mc146818rtc_[get|set]_cmos_data
rtc_get_memory() and rtc_set_memory() helpers only work with
TYPE_MC146818_RTC devices. 'memory' in their name refer to
the CMOS region. Rename them as mc146818rtc_get_cmos_data()
and mc146818rtc_set_cmos_data() to be explicit about what
they are doing.

Mechanical change doing:

  $ sed -i -e 's/rtc_set_memory/mc146818rtc_set_cmos_data/g' \
        $(git grep -wl rtc_set_memory)
  $ sed -i -e 's/rtc_get_memory/mc146818rtc_get_cmos_data/g' \
        $(git grep -wl rtc_get_memory)

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230210233116.80311-4-philmd@linaro.org>
2023-02-27 22:29:02 +01:00
Philippe Mathieu-Daudé
55c86cb803 hw/rtc/mc146818rtc: Pass MC146818RtcState instead of ISADevice argument
rtc_get_memory() and rtc_set_memory() methods can not take any
TYPE_ISA_DEVICE object. They expect a TYPE_MC146818_RTC one.

Simplify the API by passing a MC146818RtcState.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230210233116.80311-3-philmd@linaro.org>
2023-02-27 22:29:02 +01:00
Philippe Mathieu-Daudé
892afa04e6 hw/i386/x86: Reduce init_topo_info() scope
This function is not used anywhere outside this file, so
we can delete the prototype from include/hw/i386/x86.h and
make the function "static void".

This fixes when building with -Wall and using Clang
("Apple clang version 14.0.0 (clang-1400.0.29.202)"):

  ../hw/i386/x86.c:70:24: error: static function 'MACHINE' is used in an inline function with external linkage [-Werror,-Wstatic-in-inline]
      MachineState *ms = MACHINE(x86ms);
                         ^
  include/hw/i386/x86.h:101:1: note: use 'static' to give inline function 'init_topo_info' internal linkage
  void init_topo_info(X86CPUTopoInfo *topo_info, const X86MachineState *x86ms);
  ^
  static
  include/hw/boards.h:24:49: note: 'MACHINE' declared here
  OBJECT_DECLARE_TYPE(MachineState, MachineClass, MACHINE)
                                                  ^

Reported-by: Stefan Weil <sw@weilnetz.de>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221216220158.6317-6-philmd@linaro.org>
2023-02-27 22:29:01 +01:00
Markus Armbruster
6f1e91f716 error: Drop superfluous #include "qapi/qmp/qerror.h"
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>
2023-02-23 13:56:14 +01:00
Jason A. Donenfeld
eac7a7791b x86: don't let decompressed kernel image clobber setup_data
The setup_data links are appended to the compressed kernel image. Since
the kernel image is typically loaded at 0x100000, setup_data lives at
`0x100000 + compressed_size`, which does not get relocated during the
kernel's boot process.

The kernel typically decompresses the image starting at address
0x1000000 (note: there's one more zero there than the compressed image
above). This usually is fine for most kernels.

However, if the compressed image is actually quite large, then
setup_data will live at a `0x100000 + compressed_size` that extends into
the decompressed zone at 0x1000000. In other words, if compressed_size
is larger than `0x1000000 - 0x100000`, then the decompression step will
clobber setup_data, resulting in crashes.

Visually, what happens now is that QEMU appends setup_data to the kernel
image:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

The problem is that this decompresses to 0x1000000 (one more zero). So
if l1 is > (0x1000000-0x100000), then this winds up looking like:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

                                 d e c o m p r e s s e d   k e r n e l
                     |-------------------------------------------------------------|
                0x1000000                                                     0x1000000+l3

The decompressed kernel seemingly overwriting the compressed kernel
image isn't a problem, because that gets relocated to a higher address
early on in the boot process, at the end of startup_64. setup_data,
however, stays in the same place, since those links are self referential
and nothing fixes them up.  So the decompressed kernel clobbers it.

Fix this by appending setup_data to the cmdline blob rather than the
kernel image blob, which remains at a lower address that won't get
clobbered.

This could have been done by overwriting the initrd blob instead, but
that poses big difficulties, such as no longer being able to use memory
mapped files for initrd, hurting performance, and, more importantly, the
initrd address calculation is hard coded in qboot, and it always grows
down rather than up, which means lots of brittle semantics would have to
be changed around, incurring more complexity. In contrast, using cmdline
is simple and doesn't interfere with anything.

The microvm machine has a gross hack where it fiddles with fw_cfg data
after the fact. So this hack is updated to account for this appending,
by reserving some bytes.

Fixup-by: Michael S. Tsirkin <mst@redhat.com>
Cc: x86@kernel.org
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
2023-01-28 06:21:29 -05:00
Zeng Guang
19e2a9fb9d target/i386: Set maximum APIC ID to KVM prior to vCPU creation
Specify maximum possible APIC ID assigned for current VM session to KVM
prior to the creation of vCPUs. By this setting, KVM can set up VM-scoped
data structure indexed by the APIC ID, e.g. Posted-Interrupt Descriptor
pointer table to support Intel IPI virtualization, with the most optimal
memory footprint.

It can be achieved by calling KVM_ENABLE_CAP for KVM_CAP_MAX_VCPU_ID
capability once KVM has enabled it. Ignoring the return error if KVM
doesn't support this capability yet.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220825025246.26618-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-31 09:46:34 +01:00
Jason A. Donenfeld
14b29fea74 x86: do not re-randomize RNG seed on snapshot load
Snapshot loading is supposed to be deterministic, so we shouldn't
re-randomize the various seeds used.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-4-Jason@zx2c4.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27 11:34:31 +01:00
Jason A. Donenfeld
cc63374a5a x86: re-initialize RNG seed when selecting kernel
We don't want it to be possible to re-read the RNG seed after ingesting
it, because this ruins forward secrecy. Currently, however, the setup
data section can just be re-read. Since the kernel is always read after
the setup data, use the selection of the kernel as a trigger to
re-initialize the RNG seed, just like we do on reboot, to preserve
forward secrecy.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220922152847.3670513-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-01 21:16:36 +02:00
Jason A. Donenfeld
763a2828bf x86: reinitialize RNG seed on system reboot
Since this is read from fw_cfg on each boot, the kernel zeroing it out
alone is insufficient to prevent it from being used twice. And indeed on
reboot we always want a new seed, not the old one. So re-fill it in this
circumstance.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220921093134.2936487-3-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-27 11:30:59 +02:00
Jason A. Donenfeld
eebb38a563 x86: use typedef for SetupData struct
The preferred style is SetupData as a typedef, not setup_data as a plain
struct.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220921093134.2936487-2-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-27 11:30:59 +02:00
Jason A. Donenfeld
e935b73508 x86: return modified setup_data only if read as memory, not as file
If setup_data is being read into a specific memory location, then
generally the setup_data address parameter is read first, so that the
caller knows where to read it into. In that case, we should return
setup_data containing the absolute addresses that are hard coded and
determined a priori. This is the case when kernels are loaded by BIOS,
for example. In contrast, when setup_data is read as a file, then we
shouldn't modify setup_data, since the absolute address will be wrong by
definition. This is the case when OVMF loads the image.

This allows setup_data to be used like normal, without crashing when EFI
tries to use it.

(As a small development note, strangely, fw_cfg_add_file_callback() was
exported but fw_cfg_add_bytes_callback() wasn't, so this makes that
consistent.)

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Laurent Vivier <laurent@vivier.eu>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220921093134.2936487-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-27 11:30:59 +02:00
Joao Martins
4ab4c33014 hw/i386: add 4g boundary start to X86MachineState
Rather than hardcoding the 4G boundary everywhere, introduce a
X86MachineState field @above_4g_mem_start and use it
accordingly.

This is in preparation for relocating ram-above-4g to be
dynamically start at 1T on AMD platforms.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-2-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-07-26 10:40:58 -04:00
Jason A. Donenfeld
67f7e426e5 hw/i386: pass RNG seed via setup_data entry
Tiny machines optimized for fast boot time generally don't use EFI,
which means a random seed has to be supplied some other way. For this
purpose, Linux (≥5.20) supports passing a seed in the setup_data table
with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and
specialized bootloaders. The linked commit shows the upstream kernel
implementation.

At Paolo's request, we don't pass these to versioned machine types ≤7.0.

Link: https://git.kernel.org/tip/tip/c/68b8e9713c8
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220721125636.446842-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-22 19:26:34 +02:00
Jamie Iles
af9751316e hw/core/loader: return image sizes as ssize_t
Various loader functions return an int which limits images to 2GB which
is fine for things like a BIOS/kernel image, but if we want to be able
to load memory images or large ramdisks then any file over 2GB would
silently fail to load.

Cc: Luc Michel <lmichel@kalray.eu>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Luc Michel <lmichel@kalray.eu>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20211111141141.3295094-2-jamie@nuviainc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-06-10 09:31:42 +10:00
Xiaoyao Li
c300bbe8d2 hw/i386: Make pic a property of common x86 base machine type
Legacy PIC (8259) cannot be supported for TDX guests since TDX module
doesn't allow directly interrupt injection.  Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.

Make PIC the property of common x86 machine type. Hence all x86
machines, including microvm, can disable it.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20220310122811.807794-3-xiaoyao.li@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-16 16:15:40 -04:00
Xiaoyao Li
9dee7e5109 hw/i386: Make pit a property of common x86 base machine type
Both pc and microvm have pit property individually. Let's just make it
the property of common x86 base machine type.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20220310122811.807794-2-xiaoyao.li@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-16 16:15:40 -04:00
David Woodhouse
dc89f32d92 target/i386: Fix sanity check on max APIC ID / X2APIC enablement
The check on x86ms->apic_id_limit in pc_machine_done() had two problems.

Firstly, we need KVM to support the X2APIC API in order to allow IRQ
delivery to APICs >= 255. So we need to call/check kvm_enable_x2apic(),
which was done elsewhere in *some* cases but not all.

Secondly, microvm needs the same check. So move it from pc_machine_done()
to x86_cpus_init() where it will work for both.

The check in kvm_cpu_instance_init() is now redundant and can be dropped.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20220314142544.150555-1-dwmw2@infradead.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-16 04:38:39 -04:00
Gerd Hoffmann
a8152c4e46 i386: firmware parsing and sev setup for -bios loaded firmware
Don't register firmware as rom, not needed (see comment).
Add x86_firmware_configure() call for proper sev initialization.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220425135051.551037-4-kraxel@redhat.com>
2022-04-27 07:51:01 +02:00
Gerd Hoffmann
2aa6a39bc8 i386: move bios load error message
Switch to usual goto-end-of-function error handling style.
No functional change.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220425135051.551037-2-kraxel@redhat.com>
2022-04-27 07:51:01 +02:00
Marc-André Lureau
0f9668e0c1 Remove qemu-common.h include from most units
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:55 +02:00
Igor Mammedov
e6895f04c8 x86: cleanup unused compat_apic_id_mode
commit
  f862ddbb1a (hw/i386: Remove the deprecated pc-1.x machine types)
removed the last user of broken APIC ID compat knob,
but compat_apic_id_mode itself was forgotten.
Clean it up and simplify x86_cpu_apic_id_from_index()

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220228131634.3389805-1-imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-06 05:08:23 -05:00
Paolo Bonzini
3ca8ce720f target/i386: use DMA-enabled multiboot ROM for new-enough QEMU machine types
As long as fw_cfg supports DMA, the new ROM can be used also on older
machine types because it has the same size as the existing one.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-02 15:57:27 +01:00
Paolo Bonzini
f014c97459 target/i386: move linuxboot_dma_enabled to X86MachineState
This removes a parameter from x86_load_linux, and will avoid code
duplication between the linux and multiboot cases once multiboot
starts to support DMA.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-02 15:57:27 +01:00
Philippe Mathieu-Daudé
93777de365 target/i386/sev: Rename sev_i386.h -> sev.h
SEV is a x86 specific feature, and the "sev_i386.h" header
is already in target/i386/. Rename it as "sev.h" to simplify.

Patch created mechanically using:

  $ git mv target/i386/sev_i386.h target/i386/sev.h
  $ sed -i s/sev_i386.h/sev.h/ $(git grep -l sev_i386.h)

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20211007161716.453984-15-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-13 10:47:49 +02:00
Dov Murik
c0c2d319d6 x86/sev: generate SEV kernel loader hashes in x86_load_linux
If SEV is enabled and a kernel is passed via -kernel, pass the hashes of
kernel/initrd/cmdline in an encrypted guest page to OVMF for SEV
measured boot.

Co-developed-by: James Bottomley <jejb@linux.ibm.com>
Signed-off-by: James Bottomley <jejb@linux.ibm.com>
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210930054915.13252-3-dovmurik@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-05 12:47:24 +02:00
Sean Christopherson
dfce81f1b9 vl: Add sgx compound properties to expose SGX EPC sections to guest
Because SGX EPC is enumerated through CPUID, EPC "devices" need to be
realized prior to realizing the vCPUs themselves, i.e. long before
generic devices are parsed and realized.  From a virtualization
perspective, the CPUID aspect also means that EPC sections cannot be
hotplugged without paravirtualizing the guest kernel (hardware does
not support hotplugging as EPC sections must be locked down during
pre-boot to provide EPC's security properties).

So even though EPC sections could be realized through the generic
-devices command, they need to be created much earlier for them to
actually be usable by the guest.  Place all EPC sections in a
contiguous block, somewhat arbitrarily starting after RAM above 4g.
Ensuring EPC is in a contiguous region simplifies calculations, e.g.
device memory base, PCI hole, etc..., allows dynamic calculation of the
total EPC size, e.g. exposing EPC to guests does not require -maxmem,
and last but not least allows all of EPC to be enumerated in a single
ACPI entry, which is expected by some kernels, e.g. Windows 7 and 8.

The new compound properties command for sgx like below:
 ......
 -object memory-backend-epc,id=mem1,size=28M,prealloc=on \
 -object memory-backend-epc,id=mem2,size=10M \
 -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:20 +02:00
Paolo Bonzini
67872eb8ed machine: move dies from X86MachineState to CpuTopology
In order to make SMP configuration a Machine property, we need a getter as
well as a setter.  To simplify the implementation put everything that the
getter needs in the CpuTopology struct.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210617155308.928754-7-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25 16:13:48 +02:00
Chenyi Qiang
035d1ef265 i386: Add ratelimit for bus locks acquired in guest
A bus lock is acquired through either split locked access to writeback
(WB) memory or any locked access to non-WB memory. It is typically >1000
cycles slower than an atomic operation within a cache and can also
disrupts performance on other cores.

Virtual Machines can exploit bus locks to degrade the performance of
system. To address this kind of performance DOS attack coming from the
VMs, bus lock VM exit is introduced in KVM and it can report the bus
locks detected in guest. If enabled in KVM, it would exit to the
userspace to let the user enforce throttling policies once bus locks
acquired in VMs.

The availability of bus lock VM exit can be detected through the
KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential
policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in
bitmap is the only supported strategy at present. It indicates that KVM
will exit to userspace to handle the bus locks.

This patch adds a ratelimit on the bus locks acquired in guest as a
mitigation policy.

Introduce a new field "bus_lock_ratelimit" to record the limited speed
of bus locks in the target VM. The user can specify it through the
"bus-lock-ratelimit" as a machine property. In current implementation,
the default value of the speed is 0 per second, which means no
restrictions on the bus locks.

As for ratelimit on detected bus locks, simply set the ratelimit
interval to 1s and restrict the quota of bus lock occurence to the value
of "bus_lock_ratelimit". A potential alternative is to introduce the
time slice as a property which can help the user achieve more precise
control.

The detail of bus lock VM exit can be found in spec:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210521043820.29678-1-chenyi.qiang@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-06-17 14:11:06 -04:00
Marian Postevca
d07b22863b acpi: Move setters/getters of oem fields to X86MachineState
The code that sets/gets oem fields is duplicated in both PC and MICROVM
variants. This commit moves it to X86MachineState so that all x86
variants can use it and duplication is removed.

Signed-off-by: Marian Postevca <posteuca@mutex.one>
Message-Id: <20210221001737.24499-2-posteuca@mutex.one>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22 18:58:19 -04:00
David Edmondson
e20e182ea0 x86/pvh: extract only 4 bytes of start address for 32 bit kernels
When loading the PVH start address from a 32 bit ELF note, extract
only the appropriate number of bytes.

Fixes: ab969087da ("pvh: Boot uncompressed kernel using direct boot ABI")
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210302090315.3031492-3-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 11:41:54 +01:00
Claudio Fontana
a9dc68d9b2 i386: move kvm accel files into kvm/
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201212155530.23098-2-cfontana@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-16 14:06:52 -05:00
Paolo Bonzini
2c65db5e58 vl: extract softmmu/datadir.c
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:15:18 -05:00