Commit Graph

925 Commits

Author SHA1 Message Date
Paolo Bonzini c6986f16a7 KVM: x86: do not fail if software breakpoint has already been removed
If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does not
have an INT3 instruction, it fails.  This can happen if one sets a
software breakpoint in a kernel module and then reloads it.  gdb then
thinks the breakpoint cannot be deleted and there is no way to add it
back.

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 11:41:54 +01:00
Zheng Zhan Liang c45b426acd tcg/i386: rdpmc: fix the the condtions
Signed-off-by: Zheng Zhan Liang <linuxmaker@163.com>
Message-Id: <20210225054756.35962-1-linuxmaker@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-25 15:41:53 +01:00
Chenyi Qiang 06e878b413 target/i386: Add bus lock debug exception support
Bus lock debug exception is a feature that can notify the kernel by
generate an #DB trap after the instruction acquires a bus lock when
CPL>0. This allows the kernel to enforce user application throttling or
mitigations.

This feature is enumerated via CPUID.(EAX=7,ECX=0).ECX[bit 24].

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210202090224.13274-1-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-25 14:14:33 +01:00
Daniel P. Berrangé b7d77f5a8e target/i386: update to show preferred boolean syntax for -cpu
The preferred syntax is to use "foo=on|off", rather than a bare
"+foo" or "-foo"

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210216191027.595031-11-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-25 14:14:33 +01:00
Babu Moger 623972ceae i386: Add the support for AMD EPYC 3rd generation processors
Adds the support for AMD 3rd generation processors. The model
display for the new processor will be EPYC-Milan.

Adds the following new feature bits on top of the feature bits from
the first and second generation EPYC models.

pcid          : Process context identifiers support
ibrs          : Indirect Branch Restricted Speculation
ssbd          : Speculative Store Bypass Disable
erms          : Enhanced REP MOVSB/STOSB support
fsrm          : Fast Short REP MOVSB support
invpcid       : Invalidate processor context ID
pku           : Protection keys support
svme-addr-chk : SVM instructions address check for #GP handling

Depends on the following kernel commits:
14c2bf81fcd2 ("KVM: SVM: Fix #GP handling for doubly-nested virtualization")
3b9c723ed7cf ("KVM: SVM: Add support for SVM instruction address check change")
4aa2691dcbd3 ("8ce1c461188799d863398dd2865d KVM: x86: Factor out x86 instruction emulation with decoding")
4407a797e941 ("KVM: SVM: Enable INVPCID feature on AMD")
9715092f8d7e ("KVM: X86: Move handling of INVPCID types to x86")
3f3393b3ce38 ("KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c")
830bd71f2c06 ("KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept")
4c44e8d6c193 ("KVM: SVM: Add new intercept word in vmcb_control_area")
c62e2e94b9d4 ("KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors")
9780d51dc2af ("KVM: SVM: Modify intercept_exceptions to generic intercepts")
30abaa88382c ("KVM: SVM: Change intercept_dr to generic intercepts")
03bfeeb988a9 ("KVM: SVM: Change intercept_cr to generic intercepts")
c45ad7229d13 ("KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)")
a90c1ed9f11d ("(pcid) KVM: nSVM: Remove unused field")
fa44b82eb831 ("KVM: x86: Move MPK feature detection to common code")
38f3e775e9c2 ("x86/Kconfig: Update config and kernel doc for MPK feature on AMD")
37486135d3a7 ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c")

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <161290460478.11352.8933244555799318236.stgit@bmoger-ubuntu>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-02-18 18:34:45 -05:00
Peter Maydell f0f75dc174 * HVF fixes
* Extra qos-test debugging output (Christian)
 * SEV secret address autodetection (James)
 * SEV-ES support (Thomas)
 * Relocatable paths bugfix (Stefan)
 * RR fix (Pavel)
 * EventNotifier fix (Greg)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmAr778UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNVLwf/V3lb/HbyqFkhacB9eqEsEXGC3Hdp
 hU4J11P3lGS84muByxCdfw1axCGZ5x2cJmJSE71LfCcHXxEQSx4FmfxX5xeKbp1n
 vHPJ1XKhsFkOYA2O6mCW4yynTfizmp+JK36wwjmG3BEXTMMC5o2V8gAnzkP1sT9l
 0h454CtPq2lD0upgVIvI7AStpWXZwysh0hQEDk8TsIfFfzLNs+MJyvlPGn4pj+kN
 k+G3475FinPdncIBGsnRNMfiBmA4/L0L4lriQzZPV57lDfZ8sJkrmh1+/JfK6vsb
 FWIe6Suior6JGorzATbXrFhmNJ+FxNNEmlzSdqRxRz7CDv0SDZb7Ckv37Q==
 =FDIr
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* HVF fixes
* Extra qos-test debugging output (Christian)
* SEV secret address autodetection (James)
* SEV-ES support (Thomas)
* Relocatable paths bugfix (Stefan)
* RR fix (Pavel)
* EventNotifier fix (Greg)

# gpg: Signature made Tue 16 Feb 2021 16:15:59 GMT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream: (21 commits)
  replay: fix icount request when replaying clock access
  event_notifier: Set ->initialized earlier in event_notifier_init()
  hvf: Fetch cr4 before evaluating CPUID(1)
  target/i386/hvf: add rdmsr 35H MSR_CORE_THREAD_COUNT
  hvf: x86: Remove unused definitions
  target/i386/hvf: add vmware-cpuid-freq cpu feature
  hvf: Guard xgetbv call
  util/cutils: Skip "." when looking for next directory component
  tests/qtest/qos-test: dump QEMU command if verbose
  tests/qtest/qos-test: dump environment variables if verbose
  tests/qtest/qos-test: dump qos graph if verbose
  libqos/qgraph_internal: add qos_printf() and qos_printf_literal()
  libqos/qgraph: add qos_node_create_driver_named()
  sev/i386: Enable an SEV-ES guest based on SEV policy
  kvm/i386: Use a per-VM check for SMM capability
  sev/i386: Don't allow a system reset under an SEV-ES guest
  sev/i386: Allow AP booting under SEV-ES
  sev/i386: Require in-kernel irqchip support for SEV-ES guests
  sev/i386: Add initial support for SEV-ES
  sev: update sev-inject-launch-secret to make gpa optional
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-02-17 13:04:48 +00:00
Alexander Graf 106f91d59c hvf: Fetch cr4 before evaluating CPUID(1)
The CPUID function 1 has a bit called OSXSAVE which tells user space the
status of the CR4.OSXSAVE bit. Our generic CPUID function injects that bit
based on the status of CR4.

With Hypervisor.framework, we do not synchronize full CPU state often enough
for this function to see the CR4 update before guest user space asks for it.

To be on the save side, let's just always synchronize it when we receive a
CPUID(1) request. That way we can set the bit with real confidence.

Reported-by: Asad Ali <asad@osaro.com>
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20210123004129.6364-1-agraf@csgraf.de>
[RB: resolved conflict with another CPUID change]
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
Vladislav Yaroshchuk 027ac0cb51 target/i386/hvf: add rdmsr 35H MSR_CORE_THREAD_COUNT
Some guests (ex. Darwin-XNU) can attemp to read this MSR to retrieve and
validate CPU topology comparing it to ACPI MADT content

MSR description from Intel Manual:
35H: MSR_CORE_THREAD_COUNT: Configured State of Enabled Processor Core
  Count and Logical Processor Count

Bits 15:0 THREAD_COUNT The number of logical processors that are
  currently enabled in the physical package

Bits 31:16 Core_COUNT The number of processor cores that are currently
  enabled in the physical package

Bits 63:32 Reserved

Signed-off-by: Vladislav Yaroshchuk <yaroshchuk2000@gmail.com>
Message-Id: <20210113205323.33310-1-yaroshchuk2000@gmail.com>
[RB: reordered MSR definition and dropped u suffix from shift offset]
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
Alexander Graf 45f918ccf6 hvf: x86: Remove unused definitions
The hvf i386 has a few struct and cpp definitions that are never
used. Remove them.

Suggested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20210120224444.71840-3-agraf@csgraf.de>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
Vladislav Yaroshchuk 3b502b0e47 target/i386/hvf: add vmware-cpuid-freq cpu feature
For `-accel hvf` cpu_x86_cpuid() is wrapped with hvf_cpu_x86_cpuid() to
add paravirtualization cpuid leaf 0x40000010
https://lkml.org/lkml/2008/10/1/246

Leaf 0x40000010, Timing Information:
EAX: (Virtual) TSC frequency in kHz.
EBX: (Virtual) Bus (local apic timer) frequency in kHz.
ECX, EDX: RESERVED (Per above, reserved fields are set to zero).

On macOS TSC and APIC Bus frequencies can be readed by sysctl call with
names `machdep.tsc.frequency` and `hw.busfrequency`

This options is required for Darwin-XNU guest to be synchronized with
host

Leaf 0x40000000 not exposes HVF leaving hypervisor signature empty

Signed-off-by: Vladislav Yaroshchuk <yaroshchuk2000@gmail.com>
Message-Id: <20210122150518.3551-1-yaroshchuk2000@gmail.com>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
Hill Ma 118f2aadbc hvf: Guard xgetbv call
This prevents illegal instruction on cpus that do not support xgetbv.

Buglink: https://bugs.launchpad.net/qemu/+bug/1758819
Reviewed-by: Cameron Esfahani <dirty@apple.com>
Signed-off-by: Hill Ma <maahiuzeon@gmail.com>
Message-Id: <X/6OJ7qk0W6bHkHQ@Hills-Mac-Pro.local>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
Tom Lendacky 027b524d6a sev/i386: Enable an SEV-ES guest based on SEV policy
Update the sev_es_enabled() function return value to be based on the SEV
policy that has been specified. SEV-ES is enabled if SEV is enabled and
the SEV-ES policy bit is set in the policy object.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <c69f81c6029f31fc4c52a9f35f1bd704362476a5.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
Tom Lendacky 23edf8b549 kvm/i386: Use a per-VM check for SMM capability
SMM is not currently supported for an SEV-ES guest by KVM. Change the SMM
capability check from a KVM-wide check to a per-VM check in order to have
a finer-grained SMM capability check.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <f851903809e9d4e6a22d5dfd738dac8da991e28d.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
Tom Lendacky 92a5199b29 sev/i386: Don't allow a system reset under an SEV-ES guest
An SEV-ES guest does not allow register state to be altered once it has
been measured. When an SEV-ES guest issues a reboot command, Qemu will
reset the vCPU state and resume the guest. This will cause failures under
SEV-ES. Prevent that from occuring by introducing an arch-specific
callback that returns a boolean indicating whether vCPUs are resettable.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: David Hildenbrand <david@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <1ac39c441b9a3e970e9556e1cc29d0a0814de6fd.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
Paolo Bonzini b2f73a0784 sev/i386: Allow AP booting under SEV-ES
When SEV-ES is enabled, it is not possible modify the guests register
state after it has been initially created, encrypted and measured.

Normally, an INIT-SIPI-SIPI request is used to boot the AP. However, the
hypervisor cannot emulate this because it cannot update the AP register
state. For the very first boot by an AP, the reset vector CS segment
value and the EIP value must be programmed before the register has been
encrypted and measured. Search the guest firmware for the guest for a
specific GUID that tells Qemu the value of the reset vector to use.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <22db2bfb4d6551aed661a9ae95b4fdbef613ca21.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
Tom Lendacky 9681f8677f sev/i386: Require in-kernel irqchip support for SEV-ES guests
In prep for AP booting, require the use of in-kernel irqchip support. This
lessens the Qemu support burden required to boot APs.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <e9aec5941e613456f0757f5a73869cdc5deea105.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
Tom Lendacky 6b98e96f18 sev/i386: Add initial support for SEV-ES
Provide initial support for SEV-ES. This includes creating a function to
indicate the guest is an SEV-ES guest (which will return false until all
support is in place), performing the proper SEV initialization and
ensuring that the guest CPU state is measured as part of the launch.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Co-developed-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <2e6386cbc1ddeaf701547dd5677adf5ddab2b6bd.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
James Bottomley f522cef9b3 sev: update sev-inject-launch-secret to make gpa optional
If the gpa isn't specified, it's value is extracted from the OVMF
properties table located below the reset vector (and if this doesn't
exist, an error is returned).  OVMF has defined the GUID for the SEV
secret area as 4c2eb361-7d9b-4cc3-8081-127c90d3d294 and the format of
the <data> is: <base>|<size> where both are uint32_t.  We extract
<base> and use it as the gpa for the injection.

Note: it is expected that the injected secret will also be GUID
described but since qemu can't interpret it, the format is left
undefined here.

Signed-off-by: James Bottomley <jejb@linux.ibm.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210204193939.16617-3-jejb@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
James Bottomley 9617cddb72 pc: add parser for OVMF reset block
OVMF is developing a mechanism for depositing a GUIDed table just
below the known location of the reset vector.  The table goes
backwards in memory so all entries are of the form

<data>|len|<GUID>

Where <data> is arbtrary size and type, <len> is a uint16_t and
describes the entire length of the entry from the beginning of the
data to the end of the guid.

The foot of the table is of this form and <len> for this case
describes the entire size of the table.  The table foot GUID is
defined by OVMF as 96b582de-1fb2-45f7-baea-a366c55a082d and if the
table is present this GUID is just below the reset vector, 48 bytes
before the end of the firmware file.

Add a parser for the ovmf reset block which takes a copy of the block,
if the table foot guid is found, minus the footer and a function for
later traversal to return the data area of any specified GUIDs.

Signed-off-by: James Bottomley <jejb@linux.ibm.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210204193939.16617-2-jejb@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
Richard Henderson 3e8f1628e8 exec: Use cpu_untagged_addr in g2h; split out g2h_untagged
Use g2h_untagged in contexts that have no cpu, e.g. the binary
loaders that operate before the primary cpu is created.  As a
colollary, target_mmap and friends must use untagged addresses,
since they are used by the loaders.

Use g2h_untagged on values returned from target_mmap, as the
kernel never applies a tag itself.

Use g2h_untagged on all pc values.  The only current user of
tags, aarch64, removes tags from code addresses upon branch,
so "pc" is always untagged.

Use g2h with the cpu context on hand wherever possible.

Use g2h_untagged in lock_user, which will be updated soon.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210212184902.1251044-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-02-16 11:04:53 +00:00
Chenyi Qiang 52a44ad2b9 target/i386: Expose VMX entry/exit load pkrs control bits
Expose the VMX exit/entry load pkrs control bits in
VMX_TRUE_EXIT_CTLS/VMX_TRUE_ENTRY_CTLS MSRs to guest, which supports the
PKS in nested VM.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210205083325.13880-3-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-08 15:15:32 +01:00
Chenyi Qiang 6aa4228bd6 target/i386: Add support for save/load IA32_PKRS MSR
PKS introduces MSR IA32_PKRS(0x6e1) to manage the supervisor protection
key rights. Page access and writes can be managed via the MSR update
without TLB flushes when permissions change.

Add the support to save/load IA32_PKRS MSR in guest.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210205083325.13880-2-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-08 15:15:32 +01:00
Paolo Bonzini e7e7bdabab target/i86: implement PKS
Protection Keys for Supervisor-mode pages is a simple extension of
the PKU feature that QEMU already implements.  For supervisor-mode
pages, protection key restrictions come from a new MSR.  The MSR
has no XSAVE state associated to it.

PKS is only respected in long mode.  However, in principle it is
possible to set the MSR even outside long mode, and in fact
even the XSAVE state for PKRU could be set outside long mode
using XRSTOR.  So do not limit the migration subsections for
PKRU and PKRS to long mode.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-08 14:43:55 +01:00
David Greenaway 51909241d2 target/i386: Fix decoding of certain BMI instructions
This patch fixes a translation bug for a subset of x86 BMI instructions
such as the following:

   c4 e2 f9 f7 c0                shlxq   %rax, %rax, %rax

Currently, these incorrectly generate an undefined instruction exception
when SSE is disabled via CR4, while instructions like "shrxq" work fine.

The problem appears to be related to BMI instructions encoded using VEX
and with a mandatory prefix of "0x66" (data). Instructions with this
data prefix (such as shlxq) are currently rejected. Instructions with
other mandatory prefixes (such as shrxq) translate as expected.

This patch removes the incorrect check in "gen_sse" that causes the
exception to be generated. For the non-BMI cases, the check is
redundant: prefixes are already checked at line 3696.

Buglink: https://bugs.launchpad.net/qemu/+bug/1748296

Signed-off-by: David Greenaway <dgreenaway@google.com>
Message-Id: <20210114063958.1508050-1-dgreenaway@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-08 14:43:55 +01:00
Wei Huang 5447089c2b x86/cpu: Populate SVM CPUID feature bits
Newer AMD CPUs will add CPUID_0x8000000A_EDX[28] bit, which indicates
that SVM instructions (VMRUN/VMSAVE/VMLOAD) will trigger #VMEXIT before
CPU checking their EAX against reserved memory regions. This change will
allow the hypervisor to avoid intercepting #GP and emulating SVM
instructions. KVM turns on this CPUID bit for nested VMs. In order to
support it, let us populate this bit, along with other SVM feature bits,
in FEAT_SVM.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
Message-Id: <20210126202456.589932-1-wei.huang2@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-08 14:43:54 +01:00
Paolo Bonzini 5ea9e9e239 target/i386: do not set LM for 32-bit emulation "-cpu host/max"
32-bit targets by definition do not support long mode; therefore, the
bit must be masked in the features supported by the accelerator.

As a side effect, this avoids setting up the 0x80000008 CPUID leaf
for

   qemu-system-i386 -cpu host

which since commit 5a140b255d ("x86/cpu: Use max host physical address
if -cpu max option is applied") would have printed this error:

  qemu-system-i386: phys-bits should be between 32 and 36  (but is 48)

Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-08 14:43:54 +01:00
David Gibson ec78e2cda3 confidential guest support: Move SEV initialization into arch specific code
While we've abstracted some (potential) differences between mechanisms for
securing guest memory, the initialization is still specific to SEV.  Given
that, move it into x86's kvm_arch_init() code, rather than the generic
kvm_init() code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2021-02-08 16:57:38 +11:00
David Gibson abc27d4241 confidential guest support: Introduce cgs "ready" flag
The platform specific details of mechanisms for implementing
confidential guest support may require setup at various points during
initialization.  Thus, it's not really feasible to have a single cgs
initialization hook, but instead each mechanism needs its own
initialization calls in arch or machine specific code.

However, to make it harder to have a bug where a mechanism isn't
properly initialized under some circumstances, we want to have a
common place, late in boot, where we verify that cgs has been
initialized if it was requested.

This patch introduces a ready flag to the ConfidentialGuestSupport
base type to accomplish this, which we verify in
qemu_machine_creation_done().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2021-02-08 16:57:38 +11:00
David Gibson c9f5aaa6bc sev: Add Error ** to sev_kvm_init()
This allows failures to be reported richly and idiomatically.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2021-02-08 16:57:38 +11:00
David Gibson e0292d7c62 confidential guest support: Rework the "memory-encryption" property
Currently the "memory-encryption" property is only looked at once we
get to kvm_init().  Although protection of guest memory from the
hypervisor isn't something that could really ever work with TCG, it's
not conceptually tied to the KVM accelerator.

In addition, the way the string property is resolved to an object is
almost identical to how a QOM link property is handled.

So, create a new "confidential-guest-support" link property which sets
this QOM interface link directly in the machine.  For compatibility we
keep the "memory-encryption" property, but now implemented in terms of
the new property.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2021-02-08 16:57:38 +11:00
David Gibson aacdb84413 sev: Remove false abstraction of flash encryption
When AMD's SEV memory encryption is in use, flash memory banks (which are
initialed by pc_system_flash_map()) need to be encrypted with the guest's
key, so that the guest can read them.

That's abstracted via the kvm_memcrypt_encrypt_data() callback in the KVM
state.. except, that it doesn't really abstract much at all.

For starters, the only call site is in code specific to the 'pc'
family of machine types, so it's obviously specific to those and to
x86 to begin with.  But it makes a bunch of further assumptions that
need not be true about an arbitrary confidential guest system based on
memory encryption, let alone one based on other mechanisms:

 * it assumes that the flash memory is defined to be encrypted with the
   guest key, rather than being shared with hypervisor
 * it assumes that that hypervisor has some mechanism to encrypt data into
   the guest, even though it can't decrypt it out, since that's the whole
   point
 * the interface assumes that this encrypt can be done in place, which
   implies that the hypervisor can write into a confidential guests's
   memory, even if what it writes isn't meaningful

So really, this "abstraction" is actually pretty specific to the way SEV
works.  So, this patch removes it and instead has the PC flash
initialization code call into a SEV specific callback.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2021-02-08 16:57:38 +11:00
David Gibson f91f9f254b confidential guest support: Introduce new confidential guest support class
Several architectures have mechanisms which are designed to protect
guest memory from interference or eavesdropping by a compromised
hypervisor.  AMD SEV does this with in-chip memory encryption and
Intel's TDX can do similar things.  POWER's Protected Execution
Framework (PEF) accomplishes a similar goal using an ultravisor and
new memory protection features, instead of encryption.

To (partially) unify handling for these, this introduces a new
ConfidentialGuestSupport QOM base class.  "Confidential" is kind of vague,
but "confidential computing" seems to be the buzzword about these schemes,
and "secure" or "protected" are often used in connection to unrelated
things (such as hypervisor-from-guest or guest-from-guest security).

The "support" in the name is significant because in at least some of the
cases it requires the guest to take specific actions in order to protect
itself from hypervisor eavesdropping.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-08 16:57:37 +11:00
Claudio Fontana b86f59c715 accel: replace struct CpusAccel with AccelOpsClass
This will allow us to centralize the registration of
the cpus.c module accelerator operations (in accel/accel-softmmu.c),
and trigger it automatically using object hierarchy lookup from the
new accel_init_interfaces() initialization step, depending just on
which accelerators are available in the code.

Rename all tcg-cpus.c, kvm-cpus.c, etc to tcg-accel-ops.c,
kvm-accel-ops.c, etc, matching the object type names.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210204163931.7358-18-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-02-05 10:24:15 -10:00
Claudio Fontana 940e43aa30 accel: extend AccelState and AccelClass to user-mode
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

[claudio: rebased on Richard's splitwx work]

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210204163931.7358-17-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-02-05 10:24:15 -10:00
Claudio Fontana 7827168471 cpu: tcg_ops: move to tcg-cpu-ops.h, keep a pointer in CPUClass
we cannot in principle make the TCG Operations field definitions
conditional on CONFIG_TCG in code that is included by both common_ss
and specific_ss modules.

Therefore, what we can do safely to restrict the TCG fields to TCG-only
builds, is to move all tcg cpu operations into a separate header file,
which is only included by TCG, target-specific code.

This leaves just a NULL pointer in the cpu.h for the non-TCG builds.

This also tidies up the code in all targets a bit, having all TCG cpu
operations neatly contained by a dedicated data struct.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210204163931.7358-16-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-02-05 10:24:15 -10:00
Claudio Fontana 0545608056 cpu: move cc->do_interrupt to tcg_ops
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210204163931.7358-10-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-02-05 10:24:14 -10:00
Eduardo Habkost e9ce43e97a cpu: Move debug_excp_handler to tcg_ops
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210204163931.7358-8-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-02-05 10:24:14 -10:00
Eduardo Habkost e124536f37 cpu: Move tlb_fill to tcg_ops
[claudio: wrapped target code in CONFIG_TCG]

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210204163931.7358-7-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-02-05 10:24:14 -10:00
Eduardo Habkost 48c1a3e303 cpu: Move cpu_exec_* to tcg_ops
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
[claudio: wrapped target code in CONFIG_TCG]
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210204163931.7358-6-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-02-05 10:24:14 -10:00
Eduardo Habkost ec62595bab cpu: Move synchronize_from_tb() to tcg_ops
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
[claudio: wrapped target code in CONFIG_TCG, reworded comments]
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210204163931.7358-5-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-02-05 10:24:14 -10:00
Eduardo Habkost e9e51b7154 cpu: Introduce TCGCpuOperations struct
The TCG-specific CPU methods will be moved to a separate struct,
to make it easier to move accel-specific code outside generic CPU
code in the future.  Start by moving tcg_initialize().

The new CPUClass.tcg_opts field may eventually become a pointer,
but keep it an embedded struct for now, to make code conversion
easier.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
[claudio: move TCGCpuOperations inside include/hw/core/cpu.h]
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210204163931.7358-2-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-02-05 10:24:14 -10:00
Eric Blake c3033fd372 qapi: Use QAPI_LIST_APPEND in trivial cases
The easiest spots to use QAPI_LIST_APPEND are where we already have an
obvious pointer to the tail of a list.  While at it, consistently use
the variable name 'tail' for that purpose.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210113221013.390592-5-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-28 08:08:45 +01:00
Yang Weijiang 5a140b255d x86/cpu: Use max host physical address if -cpu max option is applied
QEMU option -cpu max(max_features) means "Enables all features supported by
the accelerator in the current host", this looks true for all the features
except guest max physical address width, so add this patch to enable it.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20210113090430.26394-1-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-21 13:00:41 +01:00
Philippe Mathieu-Daudé c117e5b11a target/i386: Use X86Seg enum for segment registers
Use the dedicated X86Seg enum type for segment registers.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210109233427.749748-1-f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-12 17:05:10 +01:00
Yonggang Luo 9a46d044d8 whpx: move whpx_lapic_state from header to c file
This struct only used in whpx-apic.c, there is no need
expose it in whpx.h.

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Message-Id: <20210107101919.80-6-luoyonggang@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-12 12:38:03 +01:00
Paolo Bonzini 84f4ef17ae whpx: move internal definitions to whpx-internal.h
Only leave the external interface in sysemu/whpx.h.  whpx_apic_in_platform
is moved to a .c file because it needs whpx_state.

Reported-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201219090637.1700900-3-pbonzini@redhat.com>
2021-01-12 12:38:03 +01:00
Paolo Bonzini 9102c96821 whpx: rename whp-dispatch to whpx-internal.h
Rename the file in preparation for moving more implementation-internal
definitions to it.  The build is still broken though.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20201219090637.1700900-2-pbonzini@redhat.com>
2021-01-12 12:38:03 +01:00
Richard Henderson 04a37d4ca4 tcg: Make tb arg to synchronize_from_tb const
There is nothing within the translators that ought to be
changing the TranslationBlock data, so make it const.

This does not actually use the read-only copy of the
data structure that exists within the rx region.

Reviewed-by: Joelle van Dyne <j@getutm.app>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-07 05:09:41 -10:00
Peter Maydell 3df1a3d070 target/i386: Check privilege level for protected mode 'int N' task gate
When the 'int N' instruction is executed in protected mode, the
pseudocode in the architecture manual specifies that we need to check:

 * vector number within IDT limits
 * selected IDT descriptor is a valid type (interrupt, trap or task gate)
 * if this was a software interrupt then gate DPL < CPL

The way we had structured the code meant that the privilege check for
software interrupts ended up not in the code path taken for task gate
handling, because all of the task gate handling code was in the 'case 5'
of the switch which was checking "is this descriptor a valid type".

Move the task gate handling code out of that switch (so that it is now
purely doing the "valid type?" check) and below the software interrupt
privilege check.

The effect of this missing check was that in a guest userspace binary
executing 'int 8' would cause a guest kernel panic rather than the
userspace binary being handed a SEGV.

This is essentially the same bug fixed in VirtualBox in 2012:
https://www.halfdog.net/Security/2012/VirtualBoxSoftwareInterrupt0x8GuestCrash/

Note that for QEMU this is not a security issue because it is only
present when using TCG.

Fixes: https://bugs.launchpad.net/qemu/+bug/1813201
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20201121224445.16236-1-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-02 21:03:09 +01:00
Peter Maydell 1f7c02797f QAPI patches patches for 2020-12-19
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/dynUSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZT3igP/3bWwsKR5vKVsDUTmMfrhcgaFvQiaYoG
 F29Bond8Xy0Zd0gl7OWh/5jKL0vGlrEVPrKfYLUjMnfkeRec/pOkIB2oOmIxpnPs
 9zi4kh2hQ3dEoRBuvSnnZzedetYPTuCpWMIjlztkgfxgcimqm8TPNVSxRaSApjC3
 Y8108wGwBWVf2C0rhKO9E2xA51uo6khy05i1psUtqUlC+PuDQ/OwzQHM2dnWdDB6
 kUwBDK17nhL6WwsYqCyKLSiDModReYfDiY8GS5MDLo74dzwXiatEefCR7+sbM4xq
 eX/SBoqoeS1jLPNuCryNeGNKvNA2KAbEJTnbQA2NxBXHgZ9/1SxVZFxuPp4nDMSQ
 N7BDuDI8YtJE479RjT/ZzRG65xadGBSe/HXkXM9mZwh1zitop8SVZ9fArFBHvNzw
 Y5zAv3fQd54+87psffg4dYFK0wGmqTabLEEuVzM8KIVqcAdYA2yC2b2EHy+vsxuq
 GMkr0WaA6Sq2gthXmzdTjmUPuHdan/NIhuV6d66SbPNH2oH31piptFxuznyFWSKV
 isciFFdUrkg5QrF8DSt2nmdwMFf8QGbszqP8QIGMzhJCCS9GXIiGG8f149++q8X8
 HO1lFAdLQJdrDwCYmfx36tOvi2rS/rcoTGgvg66UX3xKko1ruoxR1ZWcS54obJN6
 vEQDZ+PxubDg
 =vGLy
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging

QAPI patches patches for 2020-12-19

# gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits)
  qobject: Make QString immutable
  block: Use GString instead of QString to build filenames
  keyval: Use GString to accumulate value strings
  json: Use GString instead of QString to accumulate strings
  migration: Replace migration's JSON writer by the general one
  qobject: Factor JSON writer out of qobject_to_json()
  qobject: Factor quoted_str() out of to_json()
  qobject: Drop qstring_get_try_str()
  qobject: Drop qobject_get_try_str()
  Revert "qobject: let object_property_get_str() use new API"
  block: Avoid qobject_get_try_str()
  qmp: Fix tracing of non-string command IDs
  qobject: Move internals to qobject-internal.h
  hw/rdma: Replace QList by GQueue
  Revert "qstring: add qstring_free()"
  qobject: Change qobject_to_json()'s value to GString
  qobject: Use GString instead of QString to accumulate JSON
  qobject: Make qobject_to_json_pretty() take a pretty argument
  monitor: Use GString instead of QString for output buffer
  hmp: Simplify how qmp_human_monitor_command() gets output
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-01 14:33:03 +00:00