Commit Graph

91 Commits

Author SHA1 Message Date
Avi Kivity acee3c04e8 KVM: Allocate guest memory as MAP_PRIVATE, not MAP_SHARED
There is no reason to share internal memory slots with fork()ed instances.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Avi Kivity f4bbd9aaaa KVM: Load real mode segments correctly
Real mode segments to not reference the GDT or LDT; they simply compute
base = selector * 16.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Harvey Harrison ee032c993e KVM: make irq ack notifier functions static
sparse says:

arch/x86/kvm/x86.c:107:32: warning: symbol 'kvm_find_assigned_dev' was not declared. Should it be static?
arch/x86/kvm/i8254.c:225:6: warning: symbol 'kvm_pit_ack_irq' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Amit Shah 29c8fa32c5 KVM: Use kvm_set_irq to inject interrupts
... instead of using the pic and ioapic variants

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Amit Shah 6762b7299a KVM: Device assignment: Check for privileges before assigning irq
Even though we don't share irqs at the moment, we should ensure
regular user processes don't try to allocate system resources.

We check for capability to access IO devices (CAP_SYS_RAWIO) before
we request_irq on behalf of the guest.

Noticed by Avi.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Marcelo Tosatti 29415c37f0 KVM: set debug registers after "schedulable" section
The vcpu thread can be preempted after the guest_debug_pre() callback,
resulting in invalid debug registers on the new vcpu.

Move it inside the non-preemptable section.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Dave Hansen b772ff362e KVM: Reduce stack usage in kvm_arch_vcpu_ioctl()
[sheng: fix KVM_GET_LAPIC using wrong size]

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Dave Hansen f0d662759a KVM: Reduce kvm stack usage in kvm_arch_vm_ioctl()
On my machine with gcc 3.4, kvm uses ~2k of stack in a few
select functions.  This is mostly because gcc fails to
notice that the different case: statements could have their
stack usage combined.  It overflows very nicely if interrupts
happen during one of these large uses.

This patch uses two methods for reducing stack usage.
1. dynamically allocate large objects instead of putting
   on the stack.
2. Use a union{} member for all of the case variables. This
   tricks gcc into combining them all into a single stack
   allocation. (There's also a comment on this)

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Ben-Ami Yassour 4d5c5d0fe8 KVM: pci device assignment
Based on a patch from: Amit Shah <amit.shah@qumranet.com>

This patch adds support for handling PCI devices that are assigned to
the guest.

The device to be assigned to the guest is registered in the host kernel
and interrupt delivery is handled.  If a device is already assigned, or
the device driver for it is still loaded on the host, the device
assignment is failed by conveying a -EBUSY reply to the userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest.
The VT-d extension is required to enable the device to perform DMA.
Another alternative is PVDMA.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Alexander Graf b5e2fec0eb KVM: Ignore DEBUGCTL MSRs with no effect
Netware writes to DEBUGCTL and reads from the DEBUGCTL and LAST*IP MSRs
without further checks and is really confused to receive a #GP during that.
To make it happy we should just make them stubs, which is exactly what SVM
already does.

Writes to DEBUGCTL that are vendor-specific are resembled to behave as if the
virtual CPU does not know them.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Avi Kivity 26eef70c3e KVM: Clear exception queue before emulating an instruction
If we're emulating an instruction, either it will succeed, in which case
any previously queued exception will be spurious, or we will requeue the
same exception.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Marcelo Tosatti 5fdbf9765b KVM: x86: accessors for guest registers
As suggested by Avi, introduce accessors to read/write guest registers.
This simplifies the ->cache_regs/->decache_regs interface, and improves
register caching which is important for VMX, where the cost of
vmcs_read/vmcs_write is significant.

[avi: fix warnings]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:13:57 +02:00
Avi Kivity ed84862433 KVM: Advertise synchronized mmu support to userspace
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-29 12:34:02 +03:00
Andrea Arcangeli 604b38ac03 KVM: Allow browsing memslots with mmu_lock
This allows reading memslots with only the mmu_lock hold for mmu
notifiers that runs in atomic context and with mmu_lock held.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-29 12:33:50 +03:00
Andrea Arcangeli a1708ce8a3 KVM: Allow reading aliases with mmu_lock
This allows the mmu notifier code to run unalias_gfn with only the
mmu_lock held.  Only alias writes need the mmu_lock held. Readers will
either take the slots_lock in read mode or the mmu_lock.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-29 12:33:40 +03:00
Marcelo Tosatti c93cd3a588 KVM: task switch: translate guest segment limit to virt-extension byte granular field
If 'g' is one then limit is 4kb granular.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-27 11:34:10 +03:00
Marcelo Tosatti 34198bf842 KVM: task switch: use seg regs provided by subarch instead of reading from GDT
There is no guarantee that the old TSS descriptor in the GDT contains
the proper base address. This is the case for Windows installation's
reboot-via-triplefault.

Use guest registers instead. Also translate the address properly.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-27 11:34:09 +03:00
Marcelo Tosatti 98899aa0e0 KVM: task switch: segment base is linear address
The segment base is always a linear address, so translate before
accessing guest memory.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-27 11:34:09 +03:00
Marcelo Tosatti 34d4cb8fca KVM: MMU: nuke shadowed pgtable pages and ptes on memslot destruction
Flush the shadow mmu before removing regions to avoid stale entries.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:40 +03:00
Avi Kivity d6e88aec07 KVM: Prefix some x86 low level function with kvm_, to avoid namespace issues
Fixes compilation with CONFIG_VMI enabled.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:39 +03:00
Avi Kivity ac9f6dc0db KVM: Apply the kernel sigmask to vcpus blocked due to being uninitialized
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:38 +03:00
Marcelo Tosatti f8b78fa3d4 KVM: move slots_lock acquision down to vapic_exit
There is no need to grab slots_lock if the vapic_page will not
be touched.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:36 +03:00
Avi Kivity 7a5b56dfd3 KVM: x86 emulator: lazily evaluate segment registers
Instead of prefetching all segment bases before emulation, read them at the
last moment.  Since most of them are unneeded, we save some cycles on
Intel machines where this is a bit expensive.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:35 +03:00
Avi Kivity f76c710d75 KVM: Use printk_rlimit() instead of reporting emulation failures just once
Emulation failure reports are useful, so allow more than one per the lifetime
of the module.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:32 +03:00
Glauber Costa 25be46080f KVM: Do not calculate linear rip in emulation failure report
If we're not gonna do anything (case in which failure is already
reported), we do not need to even bother with calculating the linear rip.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:32 +03:00
Laurent Vivier 542472b53e KVM: Add coalesced MMIO support (x86 part)
This patch enables coalesced MMIO for x86 architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:31 +03:00
Laurent Vivier 92760499d0 KVM: kvm_io_device: extend in_range() to manage len and write attribute
Modify member in_range() of structure kvm_io_device to pass length and the type
of the I/O (write or read).

This modification allows to use kvm_io_device with coalesced MMIO.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:30 +03:00
Guillaume Thouvenin 3e6e0aab1b KVM: Prefixes segment functions that will be exported with "kvm_"
Prefixes functions that will be exported with kvm_.
We also prefixed set_segment() even if it still static
to be coherent.

signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:27 +03:00
Avi Kivity 9ba075a664 KVM: MTRR support
Add emulation for the memory type range registers, needed by VMware esx 3.5,
and by pci device assignment.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:26 +03:00
Sheng Yang f08864b42a KVM: VMX: Enable NMI with in-kernel irqchip
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:26 +03:00
Sheng Yang 3419ffc8e4 KVM: IOAPIC/LAPIC: Enable NMI support
[avi: fix ia64 build breakage]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:25 +03:00
Avi Kivity 50d40d7fb9 KVM: Remove unnecessary ->decache_regs() call
Since we aren't modifying any register, there's no need to decache
the register state.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:25 +03:00
Avi Kivity 7cc8883074 KVM: Remove decache_vcpus_on_cpu() and related callbacks
Obsoleted by the vmx-specific per-cpu list.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:25 +03:00
Avi Kivity 543e424366 KVM: VMX: Add list of potentially locally cached vcpus
VMX hardware can cache the contents of a vcpu's vmcs.  This cache needs
to be flushed when migrating a vcpu to another cpu, or (which is the case
that interests us here) when disabling hardware virtualization on a cpu.

The current implementation of decaching iterates over the list of all vcpus,
picks the ones that are potentially cached on the cpu that is being offlined,
and flushes the cache.  The problem is that it uses mutex_trylock() to gain
exclusive access to the vcpu, which fires off a (benign) warning about using
the mutex in an interrupt context.

To avoid this, and to make things generally nicer, add a new per-cpu list
of potentially cached vcus.  This makes the decaching code much simpler.  The
list is vmx-specific since other hardware doesn't have this issue.

[andrea: fix crash on suspend/resume]

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:24 +03:00
Joerg Roedel 54e445ca84 KVM: add missing kvmtrace bits
This patch adds some kvmtrace bits to the generic x86 code
where it is instrumented from SVM.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:40:48 +03:00
Harvey Harrison 8b2cf73cc1 KVM: add statics were possible, function definition in lapic.h
Noticed by sparse:
arch/x86/kvm/vmx.c:1583:6: warning: symbol 'vmx_disable_intercept_for_msr' was not declared. Should it be static?
arch/x86/kvm/x86.c:3406:5: warning: symbol 'kvm_task_switch_16' was not declared. Should it be static?
arch/x86/kvm/x86.c:3429:5: warning: symbol 'kvm_task_switch_32' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1968:6: warning: symbol 'kvm_mmu_remove_one_alloc_mmu_page' was not declared. Should it be static?
arch/x86/kvm/mmu.c:2014:6: warning: symbol 'mmu_destroy_caches' was not declared. Should it be static?
arch/x86/kvm/lapic.c:862:5: warning: symbol 'kvm_lapic_get_base' was not declared. Should it be static?
arch/x86/kvm/i8254.c:94:5: warning: symbol 'pit_get_gate' was not declared. Should it be static?
arch/x86/kvm/i8254.c:196:5: warning: symbol '__pit_timer_fn' was not declared. Should it be static?
arch/x86/kvm/i8254.c:561:6: warning: symbol '__inject_pit_timer_intr' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:40:46 +03:00
Jens Axboe 8691e5a8f6 smp_call_function: get rid of the unused nonatomic/retry argument
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:35 +02:00
Gerd Hoffmann 50d0a0f987 KVM: Make kvm host use the paravirt clocksource structs
This patch updates the kvm host code to use the pvclock structs.
It also makes the paravirt clock compatible with Xen.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 21:02:32 +03:00
Marcelo Tosatti 06e0564566 KVM: close timer injection race window in __vcpu_run
If a timer fires after kvm_inject_pending_timer_irqs() but before
local_irq_disable() the code will enter guest mode and only inject such
timer interrupt the next time an unrelated event causes an exit.

It would be simpler if the timer->pending irq conversion could be done
with IRQ's disabled, so that the above problem cannot happen.

For now introduce a new vcpu requests bit to cancel guest entry.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 12:16:59 +03:00
Marcelo Tosatti d4acf7e7ab KVM: Fix race between timer migration and vcpu migration
A guest vcpu instance can be scheduled to a different physical CPU
between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable().

If that happens, the timer will only be migrated to the current pCPU on
the next exit, meaning that guest LAPIC timer event can be delayed until
a host interrupt is triggered.

Fix it by cancelling guest entry if any vcpu request is pending.  This
has the side effect of nicely consolidating vcpu->requests checks.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 12:16:52 +03:00
Marcelo Tosatti 2f5997140f KVM: migrate PIT timer
Migrate the PIT timer to the physical CPU which vcpu0 is scheduled on,
similarly to what is done for the LAPIC timers, otherwise PIT interrupts
will be delayed until an unrelated event causes an exit.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:25:51 +03:00
Andrea Arcangeli bc1a34f1bf KVM: avoid fx_init() schedule in atomic
This make sure not to schedule in atomic during fx_init. I also
changed the name of fpu_init to fx_finit to avoid duplicating the name
with fpu_init that is already used in the kernel, this makes grep
simpler if nothing else.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:48 +03:00
Jan Kiszka b4f14abd95 KVM: Avoid spurious execeptions after setting registers
Clear pending exceptions when setting new register values. This avoids
spurious exceptions after restoring a vcpu state or after
reset-on-triple-fault.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:47 +03:00
Izik Eidus 3fe913e7c5 KVM: x86: task switch: fix wrong bit setting for the busy flag
The busy bit is bit 1 of the type field, not bit 8.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:43 +03:00
Sheng Yang b7ebfb0509 KVM: VMX: Prepare an identity page table for EPT in real mode
[aliguory: plug leak]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:41 +03:00
Sheng Yang 7b52345e2c KVM: MMU: Add EPT support
Enable kvm_set_spte() to generate EPT entries.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:38 +03:00
Marcelo Tosatti e9571ed54b KVM: fix kvm_vcpu_kick vs __vcpu_run race
There is a window open between testing of pending IRQ's
and assignment of guest_mode in __vcpu_run.

Injection of IRQ's can race with __vcpu_run as follows:

CPU0                                CPU1
kvm_x86_ops->run()
vcpu->guest_mode = 0                SET_IRQ_LINE ioctl
..
kvm_x86_ops->inject_pending_irq
kvm_cpu_has_interrupt()

                                    apic_test_and_set_irr()
                                    kvm_vcpu_kick
                                    if (vcpu->guest_mode)
                                        send_ipi()

vcpu->guest_mode = 1

So move guest_mode=1 assignment before ->inject_pending_irq, and make
sure that it won't reorder after it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:32 +03:00
Marcelo Tosatti 62d9f0dbc9 KVM: add ioctls to save/store mpstate
So userspace can save/restore the mpstate during migration.

[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:16 +03:00
Avi Kivity a45352908b KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
We wish to export it to userspace, so move it into the kvm namespace.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:04:13 +03:00
Feng (Eric) Liu 2714d1d3d6 KVM: Add trace markers
Trace markers allow userspace to trace execution of a virtual machine
in order to monitor its performance.

Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:19 +03:00