Commit Graph

76 Commits

Author SHA1 Message Date
Vitaly Kuznetsov 3aae0854b2 i386: Hyper-V Direct TLB flush hypercall
Hyper-V TLFS allows for L0 and L1 hypervisors to collaborate on L2's
TLB flush hypercalls handling. With the correct setup, L2's TLB flush
hypercalls can be handled by L0 directly, without the need to exit to
L1.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220525115949.1294004-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-25 21:26:35 +02:00
Vitaly Kuznetsov aa6bb5fad5 i386: Hyper-V Support extended GVA ranges for TLB flush hypercalls
KVM kind of supported "extended GVA ranges" (up to 4095 additional GFNs
per hypercall) since the implementation of Hyper-V PV TLB flush feature
(Linux-4.18) as regardless of the request, full TLB flush was always
performed. "Extended GVA ranges for TLB flush hypercalls" feature bit
wasn't exposed then. Now, as KVM gains support for fine-grained TLB
flush handling, exposing this feature starts making sense.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220525115949.1294004-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-25 21:26:35 +02:00
Vitaly Kuznetsov 9411e8b6fa i386: Hyper-V XMM fast hypercall input feature
Hyper-V specification allows to pass parameters for certain hypercalls
using XMM registers ("XMM Fast Hypercall Input"). When the feature is
in use, it allows for faster hypercalls processing as KVM can avoid
reading guest's memory.

KVM supports the feature since v5.14.

Rename HV_HYPERCALL_{PARAMS_XMM_AVAILABLE -> XMM_INPUT_AVAILABLE} to
comply with KVM.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220525115949.1294004-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-25 21:26:35 +02:00
Vitaly Kuznetsov 869840d26c i386: Hyper-V Enlightened MSR bitmap feature
The newly introduced enlightenment allow L0 (KVM) and L1 (Hyper-V)
hypervisors to collaborate to avoid unnecessary updates to L2
MSR-Bitmap upon vmexits.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220525115949.1294004-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-25 21:26:35 +02:00
Vitaly Kuznetsov 7110fe56c1 i386: Use hv_build_cpuid_leaf() for HV_CPUID_NESTED_FEATURES
Previously, HV_CPUID_NESTED_FEATURES.EAX CPUID leaf was handled differently
as it was only used to encode the supported eVMCS version range. In fact,
there are also feature (e.g. Enlightened MSR-Bitmap) bits there. In
preparation to adding these features, move HV_CPUID_NESTED_FEATURES leaf
handling to hv_build_cpuid_leaf() and drop now-unneeded 'hyperv_nested'.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220525115949.1294004-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-25 21:26:35 +02:00
Yang Weijiang 3a7a27cffb target/i386: Remove LBREn bit check when access Arch LBR MSRs
Live migration can happen when Arch LBR LBREn bit is cleared,
e.g., when migration happens after guest entered SMM mode.
In this case, we still need to migrate Arch LBR MSRs.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20220517155024.33270-1-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-23 10:56:01 +02:00
Yang Weijiang 12703d4e75 target/i386: Add MSR access interface for Arch LBR
In the first generation of Arch LBR, the max support
Arch LBR depth is 32, both host and guest use the value
to set depth MSR. This can simplify the implementation
of patch given the side-effect of mismatch of host/guest
depth MSR: XRSTORS will reset all recording MSRs to 0s
if the saved depth mismatches MSR_ARCH_LBR_DEPTH.

In most of the cases Arch LBR is not in active status,
so check the control bit before save/restore the big
chunck of Arch LBR MSRs.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20220215195258.29149-7-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-14 12:32:41 +02:00
Yang Weijiang 5a778a5f82 target/i386: Add kvm_get_one_msr helper
When try to get one msr from KVM, I found there's no such kind of
existing interface while kvm_put_one_msr() is there. So here comes
the patch. It'll remove redundant preparation code before finally
call KVM_GET_MSRS IOCTL.

No functional change intended.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20220215195258.29149-4-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-14 12:32:41 +02:00
Jon Doron d8701185f4 hw: hyperv: Initial commit for Synthetic Debugging device
Signed-off-by: Jon Doron <arilou@gmail.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220216102500.692781-5-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:56 +02:00
Jon Doron 73d2407407 hyperv: Add support to process syndbg commands
SynDbg commands can come from two different flows:
1. Hypercalls, in this mode the data being sent is fully
   encapsulated network packets.
2. SynDbg specific MSRs, in this mode only the data that needs to be
   transfered is passed.

Signed-off-by: Jon Doron <arilou@gmail.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220216102500.692781-4-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:56 +02:00
luofei cb48748af7 i386: Set MCG_STATUS_RIPV bit for mce SRAR error
In the physical machine environment, when a SRAR error occurs,
the IA32_MCG_STATUS RIPV bit is set, but qemu does not set this
bit. When qemu injects an SRAR error into virtual machine, the
virtual machine kernel just call do_machine_check() to kill the
current task, but not call memory_failure() to isolate the faulty
page, which will cause the faulty page to be allocated and used
repeatedly. If used by the virtual machine kernel, it will cause
the virtual machine to crash

Signed-off-by: luofei <luofei@unicloud.com>
Message-Id: <20220120084634.131450-1-luofei@unicloud.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-23 12:22:25 +01:00
Philippe Mathieu-Daudé dcebbb65b8 target/i386/kvm: Free xsave_buf when destroying vCPU
Fix vCPU hot-unplug related leak reported by Valgrind:

  ==132362== 4,096 bytes in 1 blocks are definitely lost in loss record 8,440 of 8,549
  ==132362==    at 0x4C3B15F: memalign (vg_replace_malloc.c:1265)
  ==132362==    by 0x4C3B288: posix_memalign (vg_replace_malloc.c:1429)
  ==132362==    by 0xB41195: qemu_try_memalign (memalign.c:53)
  ==132362==    by 0xB41204: qemu_memalign (memalign.c:73)
  ==132362==    by 0x7131CB: kvm_init_xsave (kvm.c:1601)
  ==132362==    by 0x7148ED: kvm_arch_init_vcpu (kvm.c:2031)
  ==132362==    by 0x91D224: kvm_init_vcpu (kvm-all.c:516)
  ==132362==    by 0x9242C9: kvm_vcpu_thread_fn (kvm-accel-ops.c:40)
  ==132362==    by 0xB2EB26: qemu_thread_start (qemu-thread-posix.c:556)
  ==132362==    by 0x7EB2159: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==132362==    by 0x9D45DD2: clone (in /usr/lib64/libc-2.28.so)

Reported-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Mark Kanda <mark.kanda@oracle.com>
Message-Id: <20220322120522.26200-1-philippe.mathieu.daude@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-23 12:22:25 +01:00
Paolo Bonzini 3ec5ad4008 target/i386: kvm: do not access uninitialized variable on older kernels
KVM support for AMX includes a new system attribute, KVM_X86_XCOMP_GUEST_SUPP.
Commit 19db68ca68 ("x86: Grant AMX permission for guest", 2022-03-15) however
did not fully consider the behavior on older kernels.  First, it warns
too aggressively.  Second, it invokes the KVM_GET_DEVICE_ATTR ioctl
unconditionally and then uses the "bitmask" variable, which remains
uninitialized if the ioctl fails.  Third, kvm_ioctl returns -errno rather
than -1 on errors.

While at it, explain why the ioctl is needed and KVM_GET_SUPPORTED_CPUID
is not enough.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-20 20:38:52 +01:00
Zeng Guang cdec2b753b x86: Support XFD and AMX xsave data migration
XFD(eXtended Feature Disable) allows to enable a
feature on xsave state while preventing specific
user threads from using the feature.

Support save and restore XFD MSRs if CPUID.D.1.EAX[4]
enumerate to be valid. Likewise migrate the MSRs and
related xsave state necessarily.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-8-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:50:50 +01:00
Jing Liu e56dd3c70a x86: add support for KVM_CAP_XSAVE2 and AMX state migration
When dynamic xfeatures (e.g. AMX) are used by the guest, the xsave
area would be larger than 4KB. KVM_GET_XSAVE2 and KVM_SET_XSAVE
under KVM_CAP_XSAVE2 works with a xsave buffer larger than 4KB.
Always use the new ioctls under KVM_CAP_XSAVE2 when KVM supports it.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-7-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:50:50 +01:00
Jing Liu f21a48171c x86: Add AMX CPUIDs enumeration
Add AMX primary feature bits XFD and AMX_TILE to
enumerate the CPU's AMX capability. Meanwhile, add
AMX TILE and TMUL CPUID leaf and subleaves which
exist when AMX TILE is present to provide the maximum
capability of TILE and TMUL.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:50:50 +01:00
Yang Zhong 19db68ca68 x86: Grant AMX permission for guest
Kernel allocates 4K xstate buffer by default. For XSAVE features
which require large state component (e.g. AMX), Linux kernel
dynamically expands the xstate buffer only after the process has
acquired the necessary permissions. Those are called dynamically-
enabled XSAVE features (or dynamic xfeatures).

There are separate permissions for native tasks and guests.

Qemu should request the guest permissions for dynamic xfeatures
which will be exposed to the guest. This only needs to be done
once before the first vcpu is created.

KVM implemented one new ARCH_GET_XCOMP_SUPP system attribute API to
get host side supported_xcr0 and Qemu can decide if it can request
dynamically enabled XSAVE features permission.
https://lore.kernel.org/all/20220126152210.3044876-1-pbonzini@redhat.com/

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Message-Id: <20220217060434.52460-4-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:50:50 +01:00
Longpeng(Mike) def4c5570c kvm/msi: do explicit commit when adding msi routes
We invoke the kvm_irqchip_commit_routes() for each addition to MSI route
table, which is not efficient if we are adding lots of routes in some cases.

This patch lets callers invoke the kvm_irqchip_commit_routes(), so the
callers can decide how to optimize.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00967.html

Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20220222141116.2091-3-longpeng2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:26:20 +01:00
Peter Maydell 5df022cf2e osdep: Move memalign-related functions to their own header
Move the various memalign-related functions out of osdep.h and into
their own header, which we include only where they are used.
While we're doing this, add some brief documentation comments.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
2022-03-07 13:16:49 +00:00
Paolo Bonzini 1520f8bb67 KVM: x86: ignore interrupt_bitmap field of KVM_GET/SET_SREGS
This is unnecessary, because the interrupt would be retrieved and queued
anyway by KVM_GET_VCPU_EVENTS and KVM_SET_VCPU_EVENTS respectively,
and it makes the flow more similar to the one for KVM_GET/SET_SREGS2.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-12 14:09:06 +01:00
Maxim Levitsky 8f515d3869 KVM: use KVM_{GET|SET}_SREGS2 when supported.
This allows to make PDPTRs part of the migration
stream and thus not reload them after migration which
is against X86 spec.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211101132300.192584-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-12 14:09:06 +01:00
Philippe Mathieu-Daudé dc7d6cafce target/i386/kvm: Replace use of __u32 type
QEMU coding style mandates to not use Linux kernel internal
types for scalars types. Replace __u32 by uint32_t.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211116193955.2793171-1-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-12-17 10:40:51 +01:00
Maxim Levitsky cabf9862e4 KVM: SVM: add migration support for nested TSC scaling
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211101132300.192584-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-02 15:57:27 +01:00
Philippe Mathieu-Daudé deae846f94 target/i386/sev: Declare system-specific functions in 'sev.h'
"sysemu/sev.h" is only used from x86-specific files. Let's move it
to include/hw/i386, and merge it with target/i386/sev.h.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211007161716.453984-16-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-13 10:47:49 +02:00
Philippe Mathieu-Daudé 93777de365 target/i386/sev: Rename sev_i386.h -> sev.h
SEV is a x86 specific feature, and the "sev_i386.h" header
is already in target/i386/. Rename it as "sev.h" to simplify.

Patch created mechanically using:

  $ git mv target/i386/sev_i386.h target/i386/sev.h
  $ sed -i s/sev_i386.h/sev.h/ $(git grep -l sev_i386.h)

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20211007161716.453984-15-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-13 10:47:49 +02:00
Vitaly Kuznetsov af7228b88d i386: Make Hyper-V version id configurable
Currently, we hardcode Hyper-V version id (CPUID 0x40000002) to
WS2008R2 and it is known that certain tools in Windows check this. It
seems useful to provide some flexibility by making it possible to change
this info at will. CPUID information is defined in TLFS as:

EAX: Build Number
EBX Bits 31-16: Major Version
    Bits 15-0: Minor Version
ECX Service Pack
EDX Bits 31-24: Service Branch
    Bits 23-0: Service Number

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 19:04:45 +02:00
Vitaly Kuznetsov e1f9a8e8c9 i386: Implement pseudo 'hv-avic' ('hv-apicv') enlightenment
The enlightenment allows to use Hyper-V SynIC with hardware APICv/AVIC
enabled. Normally, Hyper-V SynIC disables these hardware features and
suggests the guest to use paravirtualized AutoEOI feature. Linux-4.15
gains support for conditional APICv/AVIC disablement, the feature
stays on until the guest tries to use AutoEOI feature with SynIC. With
'HV_DEPRECATING_AEOI_RECOMMENDED' bit exposed, modern enough Windows/
Hyper-V versions should follow the recommendation and not use the
(unwanted) feature.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 19:04:45 +02:00
Vitaly Kuznetsov 050716292a i386: Move HV_APIC_ACCESS_RECOMMENDED bit setting to hyperv_fill_cpuids()
In preparation to enabling Hyper-V + APICv/AVIC move
HV_APIC_ACCESS_RECOMMENDED setting out of kvm_hyperv_properties[]: the
'real' feature bit for the vAPIC features is HV_APIC_ACCESS_AVAILABLE,
HV_APIC_ACCESS_RECOMMENDED is a recommendation to use the feature which
we may not always want to give.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 19:04:45 +02:00
Vitaly Kuznetsov 70367f0917 i386: Support KVM_CAP_HYPERV_ENFORCE_CPUID
By default, KVM allows the guest to use all currently supported Hyper-V
enlightenments when Hyper-V CPUID interface was exposed, regardless of if
some features were not announced in guest visible CPUIDs. hv-enforce-cpuid
feature alters this behavior and only allows the guest to use exposed
Hyper-V enlightenments. The feature is supported by Linux >= 5.14 and is
not enabled by default in QEMU.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 19:04:45 +02:00
Vitaly Kuznetsov 988f7b8bfe i386: Support KVM_CAP_ENFORCE_PV_FEATURE_CPUID
By default, KVM allows the guest to use all currently supported PV features
even when they were not announced in guest visible CPUIDs. Introduce a new
"kvm-pv-enforce-cpuid" flag to limit the supported feature set to the
exposed features. The feature is supported by Linux >= 5.10 and is not
enabled by default in QEMU.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 19:04:45 +02:00
Peter Xu 142518bda5 memory: Name all the memory listeners
Provide a name field for all the memory listeners.  It can be used to identify
which memory listener is which.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210817013553.30584-2-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 15:30:24 +02:00
Sean Christopherson b9edbadefb i386: Propagate SGX CPUID sub-leafs to KVM
The SGX sub-leafs are enumerated at CPUID 0x12.  Indices 0 and 1 are
always present when SGX is supported, and enumerate SGX features and
capabilities.  Indices >=2 are directly correlated with the platform's
EPC sections.  Because the number of EPC sections is dynamic and user
defined, the number of SGX sub-leafs is "NULL" terminated.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-15-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:20 +02:00
Sean Christopherson c22f546785 i386: kvm: Add support for exposing PROVISIONKEY to guest
If the guest want to fully use SGX, the guest needs to be able to
access provisioning key. Add a new KVM_CAP_SGX_ATTRIBUTE to KVM to
support provisioning key to KVM guests.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-14-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:20 +02:00
Sean Christopherson a04835414b i386: Add feature control MSR dependency when SGX is enabled
SGX adds multiple flags to FEATURE_CONTROL to enable SGX and Flexible
Launch Control.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-12-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:20 +02:00
Sean Christopherson db88806523 i386: Add get/set/migrate support for SGX_LEPUBKEYHASH MSRs
On real hardware, on systems that supports SGX Launch Control, those
MSRs are initialized to digest of Intel's signing key; on systems that
don't support SGX Launch Control, those MSRs are not available but
hardware always uses digest of Intel's signing key in EINIT.

KVM advertises SGX LC via CPUID if and only if the MSRs are writable.
Unconditionally initialize those MSRs to digest of Intel's signing key
when CPU is realized and reset to reflect the fact. This avoids
potential bug in case kvm_arch_put_registers() is called before
kvm_arch_get_registers() is called, in which case guest's virtual
SGX_LEPUBKEYHASH MSRs will be set to 0, although KVM initializes those
to digest of Intel's signing key by default, since KVM allows those MSRs
to be updated by Qemu to support live migration.

Save/restore the SGX Launch Enclave Public Key Hash MSRs if SGX Launch
Control (LC) is exposed to the guest. Likewise, migrate the MSRs if they
are writable by the guest.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-11-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:20 +02:00
Markus Armbruster 436c831a28 migration: Unify failure check for migrate_add_blocker()
Most callers check the return value.  Some check whether it set an
error.  Functionally equivalent, but the former tends to be easier on
the eyes, so do that everywhere.

Prior art: commit c6ecec43b2 "qemu-option: Check return value instead
of @err where convenient".

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210720125408.387910-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-26 17:15:28 +02:00
Markus Armbruster a5c051b2cf i386: Never free migration blocker objects instead of sometimes
invtsc_mig_blocker has static storage duration.  When a CPU with
certain features is initialized, and invtsc_mig_blocker is still null,
we add a migration blocker and store it in invtsc_mig_blocker.

The object is freed when migrate_add_blocker() fails, leaving
invtsc_mig_blocker dangling.  It is not freed on later failures.

Same for hv_passthrough_mig_blocker and hv_no_nonarch_cs_mig_blocker.

All failures are actually fatal, so whether we free or not doesn't
really matter, except as bad examples to be copied / imitated.

Clean this up in a minimal way: never free these blocker objects.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210720125408.387910-7-armbru@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-26 17:15:28 +02:00
Vitaly Kuznetsov e4adb09f79 i386: assert 'cs->kvm_state' is not null
Coverity reports potential NULL pointer dereference in
get_supported_hv_cpuid_legacy() when 'cs->kvm_state' is NULL. While
'cs->kvm_state' can indeed be NULL in hv_cpuid_get_host(),
kvm_hyperv_expand_features() makes sure that it only happens when
KVM_CAP_SYS_HYPERV_CPUID is supported and KVM_CAP_SYS_HYPERV_CPUID
implies KVM_CAP_HYPERV_CPUID so get_supported_hv_cpuid_legacy() is
never really called. Add asserts to strengthen the protection against
broken KVM behavior.

Coverity: CID 1458243
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210716115852.418293-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-29 10:15:51 +02:00
Vitaly Kuznetsov cce087f628 i386: Hyper-V SynIC requires POST_MESSAGES/SIGNAL_EVENTS privileges
When Hyper-V SynIC is enabled, we may need to allow Windows guests to make
hypercalls (POST_MESSAGES/SIGNAL_EVENTS). No issue is currently observed
because KVM is very permissive, allowing these hypercalls regarding of
guest visible CPUid bits.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210608120817.1325125-9-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
Vitaly Kuznetsov b26f68c36b i386: HV_HYPERCALL_AVAILABLE privilege bit is always needed
According to TLFS, Hyper-V guest is supposed to check
HV_HYPERCALL_AVAILABLE privilege bit before accessing
HV_X64_MSR_GUEST_OS_ID/HV_X64_MSR_HYPERCALL MSRs but at least some
Windows versions ignore that. As KVM is very permissive and allows
accessing these MSRs unconditionally, no issue is observed. We may,
however, want to tighten the checks eventually. Conforming to the
spec is probably also a good idea.

Enable HV_HYPERCALL_AVAILABLE bit unconditionally.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210608120817.1325125-8-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
Vitaly Kuznetsov 5ce48fa354 i386: kill off hv_cpuid_check_and_set()
hv_cpuid_check_and_set() does too much:
- Checks if the feature is supported by KVM;
- Checks if all dependencies are enabled;
- Sets the feature bit in cpu->hyperv_features for 'passthrough' mode.

To reduce the complexity, move all the logic except for dependencies
check out of it. Also, in 'passthrough' mode we don't really need to
check dependencies because KVM is supposed to provide a consistent
set anyway.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210608120817.1325125-7-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
Vitaly Kuznetsov 071ce4b03b i386: expand Hyper-V features during CPU feature expansion time
To make Hyper-V features appear in e.g. QMP query-cpu-model-expansion we
need to expand and set the corresponding CPUID leaves early. Modify
x86_cpu_get_supported_feature_word() to call newly intoduced Hyper-V
specific kvm_hv_get_supported_cpuid() instead of
kvm_arch_get_supported_cpuid(). We can't use kvm_arch_get_supported_cpuid()
as Hyper-V specific CPUID leaves intersect with KVM's.

Note, early expansion will only happen when KVM supports system wide
KVM_GET_SUPPORTED_HV_CPUID ioctl (KVM_CAP_SYS_HYPERV_CPUID).

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210608120817.1325125-6-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
Vitaly Kuznetsov d7652b772f i386: make hyperv_expand_features() return bool
Return 'false' when hyperv_expand_features() sets an error.

No functional change intended.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210608120817.1325125-5-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
Vitaly Kuznetsov 07454e2ea8 i386: hardcode supported eVMCS version to '1'
Currently, the only eVMCS version, supported by KVM (and described in TLFS)
is '1'. When Enlightened VMCS feature is enabled, QEMU takes the supported
eVMCS version range (from KVM_CAP_HYPERV_ENLIGHTENED_VMCS enablement) and
puts it to guest visible CPUIDs. When (and if) eVMCS ver.2 appears a
problem on migration is expected: it doesn't seem to be possible to migrate
from a host supporting eVMCS ver.2 to a host, which only support eVMCS
ver.1.

Hardcode eVMCS ver.1 as the result of 'hv-evmcs' enablement for now. Newer
eVMCS versions will have to have their own enablement options (e.g.
'hv-evmcs=2').

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210608120817.1325125-4-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
David Edmondson fea4500841 target/i386: Populate x86_ext_save_areas offsets using cpuid where possible
Rather than relying on the X86XSaveArea structure definition,
determine the offset of XSAVE state areas using CPUID leaf 0xd where
possible (KVM and HVF).

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210705104632.2902400-8-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-06 08:33:48 +02:00
David Edmondson c0198c5f87 target/i386: Pass buffer and length to XSAVE helper
In preparation for removing assumptions about XSAVE area offsets, pass
a buffer pointer and buffer length to the XSAVE helper functions.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210705104632.2902400-5-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-06 07:54:53 +02:00
David Edmondson 436463b84b target/i386: Consolidate the X86XSaveArea offset checks
Rather than having similar but different checks in cpu.h and kvm.c,
move them all to cpu.h.
Message-Id: <20210705104632.2902400-3-david.edmondson@oracle.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-06 07:54:53 +02:00
Paolo Bonzini 9ce8af4d92 target/i386: kvm: add support for TSC scaling
Linux 5.14 will add support for nested TSC scaling.  Add the
corresponding feature in QEMU; to keep support for existing kernels,
do not add it to any processor yet.

The handling of the VMCS enumeration MSR is ugly; once we have more than
one case, we may want to add a table to check VMX features against.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25 10:53:46 +02:00
Chenyi Qiang 035d1ef265 i386: Add ratelimit for bus locks acquired in guest
A bus lock is acquired through either split locked access to writeback
(WB) memory or any locked access to non-WB memory. It is typically >1000
cycles slower than an atomic operation within a cache and can also
disrupts performance on other cores.

Virtual Machines can exploit bus locks to degrade the performance of
system. To address this kind of performance DOS attack coming from the
VMs, bus lock VM exit is introduced in KVM and it can report the bus
locks detected in guest. If enabled in KVM, it would exit to the
userspace to let the user enforce throttling policies once bus locks
acquired in VMs.

The availability of bus lock VM exit can be detected through the
KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential
policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in
bitmap is the only supported strategy at present. It indicates that KVM
will exit to userspace to handle the bus locks.

This patch adds a ratelimit on the bus locks acquired in guest as a
mitigation policy.

Introduce a new field "bus_lock_ratelimit" to record the limited speed
of bus locks in the target VM. The user can specify it through the
"bus-lock-ratelimit" as a machine property. In current implementation,
the default value of the speed is 0 per second, which means no
restrictions on the bus locks.

As for ratelimit on detected bus locks, simply set the ratelimit
interval to 1s and restrict the quota of bus lock occurence to the value
of "bus_lock_ratelimit". A potential alternative is to introduce the
time slice as a property which can help the user achieve more precise
control.

The detail of bus lock VM exit can be found in spec:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210521043820.29678-1-chenyi.qiang@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-06-17 14:11:06 -04:00
Vitaly Kuznetsov 5aa9ef5e4b i386: use global kvm_state in hyperv_enabled() check
There is no need to use vCPU-specific kvm state in hyperv_enabled() check
and we need to do that when feature expansion happens early, before vCPU
specific KVM state is created.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210422161130.652779-15-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-05-31 15:53:03 -04:00