Commit Graph

121 Commits

Author SHA1 Message Date
David Hildenbrand
25459eb762 exec: Get rid of phys_mem_set_alloc()
As the last user is gone, we can get rid of phys_mem_set_alloc() and
simplify.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210303130916.22553-3-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2021-03-15 11:01:23 +01:00
Tom Lendacky
92a5199b29 sev/i386: Don't allow a system reset under an SEV-ES guest
An SEV-ES guest does not allow register state to be altered once it has
been measured. When an SEV-ES guest issues a reboot command, Qemu will
reset the vCPU state and resume the guest. This will cause failures under
SEV-ES. Prevent that from occuring by introducing an arch-specific
callback that returns a boolean indicating whether vCPUs are resettable.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: David Hildenbrand <david@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <1ac39c441b9a3e970e9556e1cc29d0a0814de6fd.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-16 17:15:39 +01:00
David Gibson
aacdb84413 sev: Remove false abstraction of flash encryption
When AMD's SEV memory encryption is in use, flash memory banks (which are
initialed by pc_system_flash_map()) need to be encrypted with the guest's
key, so that the guest can read them.

That's abstracted via the kvm_memcrypt_encrypt_data() callback in the KVM
state.. except, that it doesn't really abstract much at all.

For starters, the only call site is in code specific to the 'pc'
family of machine types, so it's obviously specific to those and to
x86 to begin with.  But it makes a bunch of further assumptions that
need not be true about an arbitrary confidential guest system based on
memory encryption, let alone one based on other mechanisms:

 * it assumes that the flash memory is defined to be encrypted with the
   guest key, rather than being shared with hypervisor
 * it assumes that that hypervisor has some mechanism to encrypt data into
   the guest, even though it can't decrypt it out, since that's the whole
   point
 * the interface assumes that this encrypt can be done in place, which
   implies that the hypervisor can write into a confidential guests's
   memory, even if what it writes isn't meaningful

So really, this "abstraction" is actually pretty specific to the way SEV
works.  So, this patch removes it and instead has the PC flash
initialization code call into a SEV specific callback.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2021-02-08 16:57:38 +11:00
Claudio Fontana
940e43aa30 accel: extend AccelState and AccelClass to user-mode
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

[claudio: rebased on Richard's splitwx work]

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210204163931.7358-17-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-02-05 10:24:15 -10:00
Eduardo Habkost
31c707fb4d kvm: Correct documentation of kvm_irqchip_*()
When split irqchip support was introduced, the meaning of
kvm_irqchip_in_kernel() changed: now it only means the LAPIC is
in kernel.  The PIC, IOAPIC, and PIT might be in userspace if
irqchip=split was set.  Update the doc comment to reflect that.

While at it, remove the "the user asked us" part in
kvm_irqchip_is_split() doc comment.  That macro has nothing to do
with existence of explicit user-provided options.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200922203612.2178370-1-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-10-14 15:28:54 -04:00
Claudio Fontana
e0715f6abc kvm: remove kvm specific functions from global includes
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-05 16:41:22 +02:00
Claudio Fontana
57038a92bb cpus: extract out kvm-specific code to accel/kvm
register a "CpusAccel" interface for KVM as well.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[added const]
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-05 16:41:22 +02:00
Eduardo Habkost
8110fa1d94 Use DECLARE_*CHECKER* macros
Generated using:

 $ ./scripts/codeconverter/converter.py -i \
   --pattern=TypeCheckMacro $(git grep -l '' -- '*.[ch]')

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-12-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-13-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-14-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-09 09:27:09 -04:00
Eduardo Habkost
db1015e92e Move QOM typedefs and add missing includes
Some typedefs and macros are defined after the type check macros.
This makes it difficult to automatically replace their
definitions with OBJECT_DECLARE_TYPE.

Patch generated using:

 $ ./scripts/codeconverter/converter.py -i \
   --pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]')

which will split "typdef struct { ... } TypedefName"
declarations.

Followed by:

 $ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \
    $(git grep -l '' -- '*.[ch]')

which will:
- move the typedefs and #defines above the type check macros
- add missing #include "qom/object.h" lines if necessary

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-9-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-10-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-11-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-09 09:26:43 -04:00
Eduardo Habkost
97e622ded7 kvm: Move QOM macros to kvm.h
Move QOM macros close to the KVMState typedef.

This will make future conversion to OBJECT_DECLARE* easier.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-By: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20200825192110.3528606-42-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-08-27 14:04:55 -04:00
Peter Xu
c82d9d43ed KVM: Kick resamplefd for split kernel irqchip
This is majorly only for X86 because that's the only one that supports
split irqchip for now.

When the irqchip is split, we face a dilemma that KVM irqfd will be
enabled, however the slow irqchip is still running in the userspace.
It means that the resamplefd in the kernel irqfds won't take any
effect and it will miss to ack INTx interrupts on EOIs.

One example is split irqchip with VFIO INTx, which will break if we
use the VFIO INTx fast path.

This patch can potentially supports the VFIO fast path again for INTx,
that the IRQ delivery will still use the fast path, while we don't
need to trap MMIOs in QEMU for the device to emulate the EIOs (see the
callers of vfio_eoi() hook).  However the EOI of the INTx will still
need to be done from the userspace by caching all the resamplefds in
QEMU and kick properly for IOAPIC EOI broadcast.

This is tricky because in this case the userspace ioapic irr &
remote-irr will be bypassed.  However such a change will greatly boost
performance for assigned devices using INTx irqs (TCP_RR boosts 46%
after this patch applied).

When the userspace is responsible for the resamplefd kickup, don't
register it on the kvm_irqfd anymore, because on newer kernels (after
commit 654f1f13ea56, 5.2+) the KVM_IRQFD will fail if with both split
irqchip and resamplefd.  This will make sure that the fast path will
work for all supported kernels.

https://patchwork.kernel.org/patch/10738541/#22609933

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200318145204.74483-5-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10 12:10:33 -04:00
Dongjiu Geng
e24fd076a5 target-arm: kvm64: handle SIGBUS signal from kernel or KVM
Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
translates the host VA delivered by host to guest PA, then fills this PA
to guest APEI GHES memory, then notifies guest according to the SIGBUS
type.

When guest accesses the poisoned memory, it will generate a Synchronous
External Abort(SEA). Then host kernel gets an APEI notification and calls
memory_failure() to unmapped the affected page in stage 2, finally
returns to guest.

Guest continues to access the PG_hwpoison page, it will trap to KVM as
stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
Qemu, Qemu records this error address into guest APEI GHES memory and
notifes guest using Synchronous-External-Abort(SEA).

In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
in which we can setup the type of exception and the syndrome information.
When switching to guest, the target vcpu will jump to the synchronous
external abort vector table entry.

The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
not valid and hold an UNKNOWN value. These values will be set to KVM
register structures through KVM_SET_ONE_REG IOCTL.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Xiang Zheng <zhengxiang9@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20200512030609.19593-10-gengdongjiu@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-05-14 15:03:09 +01:00
Philippe Mathieu-Daudé
64c033badd hw/i386: De-duplicate gsi_handler() to remove kvm_pc_gsi_handler()
Both gsi_handler() and kvm_pc_gsi_handler() have the same content,
except one comment. Move the comment, and de-duplicate the code.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-17 19:33:51 +01:00
Paolo Bonzini
4376c40ded kvm: introduce kvm_kernel_irqchip_* functions
The KVMState struct is opaque, so provide accessors for the fields
that will be moved from current_machine to the accelerator.  For now
they just forward to the machine object, but this will change.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-17 19:32:45 +01:00
David Gibson
3607715a30 kvm: Introduce KVM irqchip change notifier
Awareness of an in kernel irqchip is usually local to the machine and its
top-level interrupt controller.  However, in a few cases other things need
to know about it.  In particular vfio devices need this in order to
accelerate interrupt delivery.

If interrupt routing is changed, such devices may need to readjust their
connection to the KVM irqchip.  pci_bus_fire_intx_routing_notifier() exists
to do just this.

However, for the pseries machine type we have a situation where the routing
remains constant but the top-level irq chip itself is changed.  This occurs
because of PAPR feature negotiation which allows the guest to decide
between the older XICS and newer XIVE irq chip models (both of which are
paravirtualized).

To allow devices like vfio to adjust to this change, introduce a new
notifier for the purpose kvm_irqchip_change_notify().

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-26 10:11:30 +11:00
Paolo Bonzini
ede146c2e7 target/i386: expand feature words to 64 bits
VMX requires 64-bit feature words for the IA32_VMX_EPT_VPID_CAP
and IA32_VMX_BASIC MSRs.  (The VMX control MSRs are 64-bit wide but
actually have only 32 bits of information).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04 18:49:19 +02:00
Greg Kurz
84b6ea05ea kvm: Fix typo in header of kvm_device_access()
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156829664683.2070256.13400788010568373502.stgit@bahia.tls.ibm.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-09-19 14:42:31 +02:00
Markus Armbruster
2e5b09fd0e hw/core: Move cpu.c, cpu.h from qom/ to hw/core/
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190709152053.16670-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Rebased onto merge commit 95a9457fd44; missed instances of qom/cpu.h
in comments replaced]
2019-08-21 13:24:01 +02:00
Markus Armbruster
64552b6be4 Include hw/irq.h a lot less
In my "build everything" tree, changing hw/irq.h triggers a recompile
of some 5400 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).

hw/hw.h supposedly includes it for convenience.  Several other headers
include it just to get qemu_irq and.or qemu_irq_handler.

Move the qemu_irq and qemu_irq_handler typedefs from hw/irq.h to
qemu/typedefs.h, and then include hw/irq.h only where it's still
needed.  Touching it now recompiles only some 500 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-13-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Liran Alon
ebbfef2f34 target/i386: kvm: Add support for save and restore nested state
Kernel commit 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
introduced new IOCTLs to extract and restore vCPU state related to
Intel VMX & AMD SVM.

Utilize these IOCTLs to add support for migration of VMs which are
running nested hypervisors.

Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Maran Wilson <maran.wilson@oracle.com>
Tested-by: Maran Wilson <maran.wilson@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Message-Id: <20190619162140.133674-9-liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-21 13:23:47 +02:00
Liran Alon
b1115c9991 KVM: Introduce kvm_arch_destroy_vcpu()
Simiar to how kvm_init_vcpu() calls kvm_arch_init_vcpu() to perform
arch-dependent initialisation, introduce kvm_arch_destroy_vcpu()
to be called from kvm_destroy_vcpu() to perform arch-dependent
destruction.

This was added because some architectures (Such as i386)
currently do not free memory that it have allocated in
kvm_arch_init_vcpu().

Suggested-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Maran Wilson <maran.wilson@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Message-Id: <20190619162140.133674-3-liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-21 02:29:39 +02:00
Paolo Bonzini
b58deb344d qemu/queue.h: leave head structs anonymous unless necessary
Most list head structs need not be given a name.  In most cases the
name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV
or reverse iteration, but this does not apply to lists of other kinds,
and even for QTAILQ in practice this is only rarely needed.  In addition,
we will soon reimplement those macros completely so that they do not
need a name for the head struct.  So clean up everything, not giving a
name except in the rare case where it is necessary.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:55 +01:00
Robert Hoo
f57bceb6ab kvm: Add support to KVM_GET_MSR_FEATURE_INDEX_LIST and KVM_GET_MSRS system ioctl
Add kvm_get_supported_feature_msrs() to get supported MSR feature index list.
Add kvm_arch_get_supported_msr_feature() to get each MSR features value.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <1539578845-37944-2-git-send-email-robert.hu@linux.intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-10-30 21:14:43 -03:00
Michael S. Tsirkin
1814eab673 x86/cpu: use standard-headers/asm-x86.kvm_para.h
Switch to the header we imported from Linux,
this allows us to drop a hack in kvm_i386.h.
More code will be dropped in the next patch.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23 03:14:41 +03:00
Brijesh Singh
54e8953967 kvm: introduce memory encryption APIs
Inorder to integerate the Secure Encryption Virtualization (SEV) support
add few high-level memory encryption APIs which can be used for encrypting
the guest memory region.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-13 12:04:03 +01:00
Brijesh Singh
b20e37801f kvm: add memory encryption context
Split from a patch by Brijesh Singh (brijesh.singh@amd.com).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
2018-03-13 12:04:03 +01:00
Marcel Apfelbaum
06329ccecf mem: add share parameter to memory-backend-ram
Currently only file backed memory backend can
be created with a "share" flag in order to allow
sharing guest RAM with other processes in the host.

Add the "share" flag also to RAM Memory Backend
in order to allow remapping parts of the guest RAM
to different host virtual addresses. This is needed
by the RDMA devices in order to remap non-contiguous
QEMU virtual addresses to a contiguous virtual address range.

Moved the "share" flag to the Host Memory base class,
modified phys_mem_alloc to include the new parameter
and a new interface memory_region_init_ram_shared_nomigrate.

There are no functional changes if the new flag is not used.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
2018-02-19 13:03:24 +02:00
Greg Kurz
62dd4edaaf kvm: check KVM_CAP_SYNC_MMU with kvm_vm_check_extension()
On a server-class ppc host, this capability depends on the KVM type,
ie, HV or PR. If both KVM are present in the kernel, we will always
get the HV specific value, even if we explicitely requested PR on
the command line.

This can have an impact if we're using hugepages or a balloon device.

Since we've already created the VM at the time any user calls
kvm_has_sync_mmu(), switching to kvm_vm_check_extension() is
enough to fix any potential issue.

It is okay for the other archs that also implement KVM_CAP_SYNC_MMU,
ie, mips, s390, x86 and arm, because they don't depend on the VM being
created or not.

While here, let's cache the state of this extension in a bool variable,
since it has several users in the code, as suggested by Thomas Huth.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <150600965332.30533.14702405809647835716.stgit@bahia.lan>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-02 14:38:06 +02:00
Greg Kurz
1b7ac7cab6 kvm: workaround build break on gcc-7.1.1 / fedora26
Building QEMU on fedora26 with the latest gcc package fails:

  CC      ppc64-softmmu/target/ppc/kvm.o
In file included from include/sysemu/hw_accel.h:16:0,
                 from target/ppc/kvm.c:31:
target/ppc/kvm.c: In function ‘kvmppc_booke_watchdog_enable’:
include/sysemu/kvm.h:449:35: error: ‘args_tmp[i]’ may be used uninitialized
 in this function [-Werror=maybe-uninitialized]
             cap.args[i] = args_tmp[i];                               \
                                   ^
target/ppc/kvm.c: In function ‘kvmppc_set_papr’:
include/sysemu/kvm.h:449:35: error: ‘args_tmp[i]’ may be used uninitialized
 in this function [-Werror=maybe-uninitialized]
cc1: all warnings being treated as errors

$ rpm -q gcc
gcc-7.1.1-3.fc26.ppc64le

The compiler should obviously optimize this code away when no extra
agument is passed to kvm_vm_enable_cap() and kvm_vcpu_enable_cap(),
but it doesn't. This bug should be fixed one day in gcc, but we can
also change our code pattern so that we don't hit the issue anymore.
We workaround this, by using memcpy() instead of open-coding the copy.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <150210580404.1343.7325713896658799315.stgit@bahia.lan>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-08 10:40:20 +02:00
Alexander Graf
5d721b785f ARM: KVM: Enable in-kernel timers with user space gic
When running with KVM enabled, you can choose between emulating the
gic in kernel or user space. If the kernel supports in-kernel virtualization
of the interrupt controller, it will default to that. If not, if will
default to user space emulation.

Unfortunately when running in user mode gic emulation, we miss out on
interrupt events which are only available from kernel space, such as the timer.
This patch leverages the new kernel/user space pending line synchronization for
timer events. It does not handle PMU events yet.

Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1498577737-130264-1-git-send-email-agraf@suse.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-07-11 11:21:26 +01:00
Thomas Huth
cbca3722a3 include/exec/poison: Mark CONFIG_KVM as poisoned, too
CONFIG_KVM is only defined for target-specific code, so nobody should
use it by accident in common code. To avoid such subtle bugs,
CONFIG_KVM is now marked as poisoned in common code. The header
include/sysemu/kvm.h is somewhat special since it is included
all over the place from common code, too, so we need some extra
logic via "#ifdef NEED_CPU_H" here to make sure that we can
compile all files without problems.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-4-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-04 14:30:03 +02:00
Thomas Huth
2099935dbf Move CONFIG_KVM related definitions to kvm_i386.h
pc.h and sysemu/kvm.h are also included from common code (where
CONFIG_KVM is not available), so the #defines that depend on CONFIG_KVM
should not be declared here to avoid that anybody is using them in a
wrong way. Since we're also going to poison CONFIG_KVM for common code,
let's move them to kvm_i386.h instead. Most of the dummy definitions
from sysemu/kvm.h are also unused since the code that uses them is
only compiled for CONFIG_KVM (e.g. target/i386/kvm.c), so the unused
defines are also simply dropped here instead of being moved.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-3-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-04 14:30:03 +02:00
Eric Auger
556969e938 kvm-all: Pass an error object to kvm_device_access
In some circumstances, we don't want to abort if the
kvm_device_access fails. This will be the case during ITS
migration, in case the ITS table save/restore fails because
the guest did not program the vITS correctly. So let's pass an
error object to the function and return the ioctl value. New
callers will be able to make a decision upon this returned
value.

Existing callers pass &error_abort which will cause the
function to abort on failure.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-id: 1497023553-18411-2-git-send-email-eric.auger@redhat.com
[PMM: wrapped long line]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-13 14:57:00 +01:00
David Gibson
75e972dab5 migration: Mark CPU states dirty before incoming migration/loadvm
As a rule, CPU internal state should never be updated when
!cpu->kvm_vcpu_dirty (or the HAX equivalent).  If that is done, then
subsequent calls to cpu_synchronize_state() - usually safe and idempotent -
will clobber state.

However, we routinely do this during a loadvm or incoming migration.
Usually this is called shortly after a reset, which will clear all the cpu
dirty flags with cpu_synchronize_all_post_reset().  Nothing is expected
to set the dirty flags again before the cpu state is loaded from the
incoming stream.

This means that it isn't safe to call cpu_synchronize_state() from a
post_load handler, which is non-obvious and potentially inconvenient.

We could cpu_synchronize_all_state() before the loadvm, but that would be
overkill since a) we expect the state to already be synchronized from the
reset and b) we expect to completely rewrite the state with a call to
cpu_synchronize_all_post_init() at the end of qemu_loadvm_state().

To clear this up, this patch introduces cpu_synchronize_pre_loadvm() and
associated helpers, which simply marks the cpu state as dirty without
actually changing anything.  i.e. it says we want to discard any existing
KVM (or HAX) state and replace it with what we're going to load.

Cc: Juan Quintela <quintela@redhat.com>
Cc: Dave Gilbert <dgilbert@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
2017-06-06 08:53:24 +10:00
Sam Bobroff
c64abd1f9c spapr: Add ibm,processor-radix-AP-encodings to the device tree
Use the new ioctl, KVM_PPC_GET_RMMU_INFO, to fetch radix MMU
information from KVM and present the page encodings in the device tree
under ibm,processor-radix-AP-encodings. This provides page size
information to the guest which is necessary for it to use radix mode.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Compile fix for 32-bit targets, style nit fix]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Paolo Bonzini
18268b6016 KVM: move SIG_IPI handling to kvm-all.c
This lets us remove a bunch of CONFIG_LINUX defines.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-03 16:40:02 +01:00
Paolo Bonzini
2ae41db262 KVM: do not use sigtimedwait to catch SIGBUS
Call kvm_on_sigbus_vcpu asynchronously from the VCPU thread.
Information for the SIGBUS can be stored in thread-local variables
and processed later in kvm_cpu_exec.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-03 16:40:02 +01:00
Paolo Bonzini
4d39892cca KVM: remove kvm_arch_on_sigbus
Build it on kvm_arch_on_sigbus_vcpu instead.  They do the same
for "action optional" SIGBUSes, and the main thread should never get
"action required" SIGBUSes because it blocks the signal.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-03 16:40:02 +01:00
Vincent Palatin
b39466269b kvm: move cpu synchronization code
Move the generic cpu_synchronize_ functions to the common hw_accel.h header,
in order to prepare for the addition of a second hardware accelerator.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Message-Id: <f5c3cffe8d520011df1c2e5437bb814989b48332.1484045952.git.vpalatin@chromium.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-19 22:07:46 +01:00
Pavel Fedin
767a554a0c kvm-all: Pass requester ID to MSI routing functions
Introduce global kvm_msi_use_devid flag plus associated
kvm_msi_devid_required() macro. Passes the device ID,
if needed, while building the MSI route entry. Device IDs are
required by the ARM GICv3 ITS (IRQ remapping function is based on
this information).

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1474616617-366-5-git-send-email-eric.auger@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-04 13:28:09 +01:00
Ladi Prosek
d4b84d564e Remove unused function declarations
Unused function declarations were found using a simple gcc plugin and
manually verified by grepping the sources.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-09-15 15:32:22 +03:00
Dr. David Alan Gilbert
78d6a05d2f x86/lapic: Load LAPIC state at post_load
Load the LAPIC state during post_load (rather than when the CPU
starts).

This allows an interrupt to be delivered from the ioapic to
the lapic prior to cpu loading, in particular the RTC that starts
ticking as soon as we load it's state.

Fixes a case where Windows hangs after migration due to RTC interrupts
disappearing.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13 19:09:44 +02:00
Cao jin
c2cd627ddb kvm-all: drop kvm_setup_guest_memory
kvm_setup_guest_memory only does "madvise to QEMU_MADV_DONTFORK" and
is only called by ram_block_add, which actually is duplicate code.
Bonus: add simple comment for kvm_has_sync_mmu to make life easier.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-Id: <1473662096-32598-1-git-send-email-caoj.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13 19:09:43 +02:00
Peter Xu
3f1fea0fb5 kvm-irqchip: do explicit commit when update irq
In the past, we are doing gsi route commit for each irqchip route
update. This is not efficient if we are updating lots of routes in the
same time. This patch removes the committing phase in
kvm_irqchip_update_msi_route(). Instead, we do explicit commit after all
routes updated.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:44:19 +03:00
Peter Xu
38d87493f3 kvm-irqchip: i386: add hook for add/remove virq
Adding two hooks to be notified when adding/removing msi routes. There
are two kinds of MSI routes:

- in kvm_irqchip_add_irq_route(): before assigning IRQFD. Used by
  vhost, vfio, etc.

- in kvm_irqchip_send_msi(): when sending direct MSI message, if
  direct MSI not allowed, we will first create one MSI route entry
  in the kernel, then trigger it.

This patch only hooks the first one (irqfd case). We do not need to
take care for the 2nd one, since it's only used by QEMU userspace
(kvm-apic) and the messages will always do in-time translation when
triggered. While we need to note them down for the 1st one, so that we
can notify the kernel when cache invalidation happens.

Also, we do not hook IOAPIC msi routes (we have explicit notifier for
IOAPIC to keep its cache updated). We only need to care about irqfd
users.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:44:19 +03:00
Peter Xu
d1f6af6a17 kvm-irqchip: simplify kvm_irqchip_add_msi_route
Changing the original MSIMessage parameter in kvm_irqchip_add_msi_route
into the vector number. Vector index provides more information than the
MSIMessage, we can retrieve the MSIMessage using the vector easily. This
will avoid fetching MSIMessage every time before adding MSI routes.

Meanwhile, the vector info will be used in the coming patches to further
enable gsi route update notifications.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:44:18 +03:00
Bharata B Rao
44f2e6c10e kvm: API to obtain max supported mem slots
Introduce kvm_get_max_memslots() API that can be used to obtain the
maximum number of memslots supported by KVM.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-07 10:17:45 +10:00
Gu Zheng
4c055ab54f cpu: Reclaim vCPU objects
In order to deal well with the kvm vcpus (which can not be removed without any
protection), we do not close KVM vcpu fd, just record and mark it as stopped
into a list, so that we can reuse it for the appending cpu hot-add request if
possible. It is also the approach that kvm guys suggested:
https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
               [- Explicit CPU_REMOVE() from qemu_kvm/tcg_destroy_vcpu()
                  isn't needed as it is done from cpu_exec_exit()
                - Use iothread mutex instead of global mutex during
                  destroy
                - Don't cleanup vCPU object from vCPU thread context
                  but leave it to the callers (device_add/device_del)]
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 14:03:59 +10:00
Greg Kurz
41264b385c PPC/KVM: early validation of vcpu id
The KVM API restricts vcpu ids to be < KVM_CAP_MAX_VCPUS. On PowerPC
targets, depending on the number of threads per core in the host and
in the guest, some topologies do generate higher vcpu ids actually.
When this happens, QEMU bails out with the following error:

kvm_init_vcpu failed: Invalid argument

The KVM_CREATE_VCPU ioctl has several EINVAL return paths, so it is
not possible to fully disambiguate.

This patch adds a check in the code that computes vcpu ids, so that
we can detect the error earlier, and print a friendlier message instead
of calling KVM_CREATE_VCPU with an obviously bogus vcpu id.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Paolo Bonzini
33c11879fd qemu-common: push cpu.h inclusion out of qemu-common.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00