Commit Graph

547305 Commits

Author SHA1 Message Date
Wanpeng Li 5c614b3583 KVM: nVMX: nested VPID emulation
VPID is used to tag address space and avoid a TLB flush. Currently L0 use
the same VPID to run L1 and all its guests. KVM flushes VPID when switching
between L1 and L2.

This patch advertises VPID to the L1 hypervisor, then address space of L1
and L2 can be separately treated and avoid TLB flush when swithing between
L1 and L2. For each nested vmentry, if vpid12 is changed, reuse shadow vpid
w/ an invvpid.

Performance:

run lmbench on L2 w/ 3.5 kernel.

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
kernel    Linux 3.5.0-1 1.2200 1.3700 1.4500 4.7800 2.3300 5.60000 2.88000  nested VPID
kernel    Linux 3.5.0-1 1.2600 1.4300 1.5600   12.7   12.9 3.49000 7.46000  vanilla

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:30:35 +02:00
Wanpeng Li 99b83ac893 KVM: nVMX: emulate the INVVPID instruction
Add the INVVPID instruction emulation.

Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:30:24 +02:00
Wanpeng Li dd5f5341a3 KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid
Introduce __vmx_flush_tlb() to handle specific vpid.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:09 +02:00
Wanpeng Li 991e7a0eed KVM: VMX: adjust interface to allocate/free_vpid
Adjust allocate/free_vid so that they can be reused for the nested vpid.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:09 +02:00
Kosuke Tatsukawa 6003a42010 kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
async_pf_execute() seems to be missing a memory barrier which might
cause the waker to not notice the waiter and miss sending a wake_up as
in the following figure.

        async_pf_execute                    kvm_vcpu_block
------------------------------------------------------------------------
spin_lock(&vcpu->async_pf.lock);
if (waitqueue_active(&vcpu->wq))
/* The CPU might reorder the test for
   the waitqueue up here, before
   prior writes complete */
                                    prepare_to_wait(&vcpu->wq, &wait,
                                      TASK_INTERRUPTIBLE);
                                    /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                     /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                      ...
                                      return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                        !vcpu->arch.apf.halted)
                                        || !list_empty_careful(&vcpu->async_pf.done)
                                     ...
                                     return 0;
list_add_tail(&apf->link,
  &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                    waited = true;
                                    schedule();
------------------------------------------------------------------------

The attached patch adds the missing memory barrier.

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c  (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00
Radim Krčmář 13db77347d KVM: x86: don't notify userspace IOAPIC on edge EOI
On real hardware, edge-triggered interrupts don't set a bit in TMR,
which means that IOAPIC isn't notified on EOI.  Do the same here.

Staying in guest/kernel mode after edge EOI is what we want for most
devices.  If some bugs could be nicely worked around with edge EOI
notifications, we should invest in a better interface.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00
Radim Krčmář db2bdcbbbd KVM: x86: fix edge EOI and IOAPIC reconfig race
KVM uses eoi_exit_bitmap to track vectors that need an action on EOI.
The problem is that IOAPIC can be reconfigured while an interrupt with
old configuration is pending and eoi_exit_bitmap only remembers the
newest configuration;  thus EOI from the pending interrupt is not
recognized.

(Reconfiguration is not a problem for level interrupts, because IOAPIC
 sends interrupt with the new configuration.)

For an edge interrupt with ACK notifiers, like i8254 timer; things can
happen in this order
 1) IOAPIC inject a vector from i8254
 2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
    on original VCPU gets cleared
 3) guest's handler for the vector does EOI
 4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
    not in that VCPU's eoi_exit_bitmap
 5) i8254 stops working

A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the
vector is in PIR/IRR/ISR.

This creates an unwanted situation if the vector is reused by a
non-IOAPIC source, but I think it is so rare that we don't want to make
the solution more sophisticated.  The simple solution also doesn't work
if we are reconfiguring the vector.  (Shouldn't happen in the wild and
I'd rather fix users of ACK notifiers instead of working around that.)

The are no races because ioapic injection and reconfig are locked.

Fixes: b053b2aef2 ("KVM: x86: Add EOI exit bitmap inference")
[Before b053b2aef2, this bug happened only with APICv.]
Fixes: c7c9c56ca2 ("x86, apicv: add virtual interrupt delivery support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00
Radim Krčmář c77f3fab44 kvm: x86: set KVM_REQ_EVENT when updating IRR
After moving PIR to IRR, the interrupt needs to be delivered manually.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00
Paolo Bonzini bff98d3b01 Merge branch 'kvm-master' into HEAD
Merge more important SMM fixes.
2015-10-14 16:40:46 +02:00
Paolo Bonzini b10d92a54d KVM: x86: fix RSM into 64-bit protected mode
In order to get into 64-bit protected mode, you need to enable
paging while EFER.LMA=1.  For this to work, CS.L must be 0.
Currently, we load the segments before CR0 and CR4, which means
that if RSM returns into 64-bit protected mode CS.L is already 1
and everything breaks.

Luckily, CS.L=0 is always the case when executing RSM, because it
is forbidden to execute RSM from 64-bit protected mode.  Hence it
is enough to load CR0 and CR4 first, and only then the segments.

Fixes: 660a5d517a
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:39:52 +02:00
Paolo Bonzini 25188b9986 KVM: x86: fix previous commit for 32-bit
Unfortunately I only noticed this after pushing.

Fixes: f0d648bdf0
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:39:25 +02:00
Paolo Bonzini 58f800d5ac Merge branch 'kvm-master' into HEAD
This merge brings in a couple important SMM fixes, which makes it
easier to test latest KVM with unrestricted_guest=0 and to test
the in-progress work on SMM support in the firmware.

Conflicts:
	arch/x86/kvm/x86.c
2015-10-13 21:32:50 +02:00
Paolo Bonzini 7391773933 KVM: x86: fix SMI to halted VCPU
An SMI to a halted VCPU must wake it up, hence a VCPU with a pending
SMI must be considered runnable.

Fixes: 64d6067057
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:29:41 +02:00
Paolo Bonzini 5d9bc648b9 KVM: x86: clean up kvm_arch_vcpu_runnable
Split the huge conditional in two functions.

Fixes: 64d6067057
Cc: stable@vger.kernel.org
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:28:59 +02:00
Paolo Bonzini f0d648bdf0 KVM: x86: map/unmap private slots in __x86_set_memory_region
Otherwise, two copies (one of them never populated and thus bogus)
are allocated for the regular and SMM address spaces.  This breaks
SMM with EPT but without unrestricted guest support, because the
SMM copy of the identity page map is all zeros.

By moving the allocation to the caller we also remove the last
vestiges of kernel-allocated memory regions (not accessible anymore
in userspace since commit b74a07beed, "KVM: Remove kernel-allocated
memory regions", 2010-06-21); that is a nice bonus.

Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
Cc: stable@vger.kernel.org
Fixes: 9da0e4d5ac
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:28:58 +02:00
Paolo Bonzini 1d8007bdee KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region
The next patch will make x86_set_memory_region fill the
userspace_addr.  Since the struct is not used untouched
anymore, it makes sense to build it in x86_set_memory_region
directly; it also simplifies the callers.

Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
Cc: stable@vger.kernel.org
Fixes: 9da0e4d5ac
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:28:46 +02:00
Paolo Bonzini 1330a0170a KVM: s390: Fixes for 4.4
A bunch of fixes and optimizations for interrupt and time
 handling. No fix is important enough to qualify for 4.3 or
 stable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJWHQ0hAAoJEBF7vIC1phx8pmMP/jx+wYMA5U4Gi5OT4HzyFUoh
 nNoQUuwLSykg82vTZgajJFP5Wo4cIDXs2pLUOsuJFpufUY1S2fRgdqjNLcnIKarz
 BY/a6t9nXYQqEDxeHXIkS1sqTXcpEv6yHfitpZCyAz2D+oDmOXQLyJ7tEpp3JGUh
 WwTA1b4cTOpdASWCB2ldcgKDiKMA70dm7e+3ejGlib6v/aoEWDI0/n/0/UZbH8gm
 Q9hKcdxhwqTqVMlMSHcCkcKqKMJpY/8eNNWtlTgVc7gd0kFaLc+T5JToJKUTmE5G
 lCCkBO3TjOGKnoccIRYc7DW+vHVR5er5IaNIRpxnCf/g3FF9R1jbfm9DixYT29IG
 H3GJSZwQMo0glWNfzuBlgmBAgMTGMka9+0zXvXUw+TIOFmjgjIx0w5H0rYmUdE6j
 tZYLYGa96DqdDur1lLN6RJGaO2O08bI2J6TJXJ5h1x8qY6V2YKLRGOabXxLEUut2
 LvanLczT4ou27fgW2kOpcLgCYKT1l2nlH22WzilpITKpBQFSq1flFMQfB32jqQJI
 v41aNBwIEqE/9dR1Zrwad6m//t9u8PAv3fna8cdchYolq/ZZF30R8BJW9lYxeify
 htPKhITzL30JdN3bw5ItVFA/p4YIqVswwq6u+pc9vpWeI4xG71Vq2DybhmuOQ6yd
 kuojcihXhEzkk2vit3Cc
 =LJJQ
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-20151013' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes for 4.4

A bunch of fixes and optimizations for interrupt and time
handling. No fix is important enough to qualify for 4.3 or
stable.
2015-10-13 16:44:51 +02:00
David Hildenbrand 60417fcc2b KVM: s390: factor out reading of the guest TOD clock
Let's factor this out and always use get_tod_clock_fast() when
reading the guest TOD.

STORE CLOCK FAST does not do serialization and, therefore, might
result in some fuzziness between different processors in a way
that subsequent calls on different CPUs might have time stamps that
are earlier. This semantics is fine though for all KVM use cases.
To make it obvious that the new function has STORE CLOCK FAST
semantics we name it kvm_s390_get_tod_clock_fast.

With this patch, we only have a handful of places were we
have to care about STP sync (using preempt_disable() logic).

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:35 +02:00
David Hildenbrand 25ed167596 KVM: s390: factor out and fix setting of guest TOD clock
Let's move that whole logic into one function. We now always use unsigned
values when calculating the epoch (to avoid over/underflow defined).
Also, we always have to get all VCPUs out of SIE before doing the update
to avoid running differing VCPUs with different TODs.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:35 +02:00
David Hildenbrand 5a3d883a59 KVM: s390: switch to get_tod_clock() and fix STP sync races
Nobody except early.c makes use of store_tod_clock() to handle the
cc. So if we would get a cc != 0, we would be in more trouble.

Let's replace all users with get_tod_clock(). Returning a cc
on an ioctl sounded strange either way.

We can now also easily move the get_tod_clock() call into the
preempt_disable() section. This is in fact necessary to make the
STP sync work as expected. Otherwise the host TOD could change
and we would end up with a wrong epoch calculation.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:34 +02:00
David Hildenbrand 238293b14d KVM: s390: correctly handle injection of pgm irqs and per events
PER events can always co-exist with other program interrupts.

For now, we always overwrite all program interrupt parameters when
injecting any type of program interrupt.

Let's handle that correctly by only overwriting the relevant portion of
the program interrupt parameters. Therefore we can now inject PER events
and ordinary program interrupts concurrently, resulting in no loss of
program interrupts. This will especially by helpful when manually detecting
PER events later - as both types might be triggered during one SIE exit.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:34 +02:00
David Hildenbrand 66933b78e3 KVM: s390: simplify in-kernel program irq injection
The main reason to keep program injection in kernel separated until now
was that we were able to do some checking, if really only the owning
thread injects program interrupts (via waitqueue_active(li->wq)).

This BUG_ON was never triggered and the chances of really hitting it, if
another thread injected a program irq to another vcpu, were very small.

Let's drop this check and turn kvm_s390_inject_program_int() and
kvm_s390_inject_prog_irq() into simple inline functions that makes use of
kvm_s390_inject_vcpu().

__must_check can be dropped as they are implicitely given by
kvm_s390_inject_vcpu(), to avoid ugly long function prototypes.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:34 +02:00
David Hildenbrand 4d32ad6bec KVM: s390: drop out early in kvm_s390_has_irq()
Let's get rid of the local variable and exit directly if we found
any pending interrupt. This is not only faster, but also better
readable.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:33 +02:00
David Hildenbrand 118b862b15 KVM: s390: kvm_arch_vcpu_runnable already cares about timer interrupts
We can remove that double check.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:33 +02:00
David Hildenbrand 5f94c58ed0 KVM: s390: set interception requests for all floating irqs
No need to separate pending and floating irqs when setting interception
requests. Let's do it for all equally.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:33 +02:00
David Hildenbrand fee0e0fdb2 KVM: s390: disabled wait cares about machine checks, not PER
We don't care about program event recording irqs (synchronous
program irqs) but asynchronous irqs when checking for disabled
wait. Machine checks were missing.

Let's directly switch to the functions we have for that purpose
instead of testing once again for magic bits.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:32 +02:00
Christian Borntraeger f59922b47e KVM: s390: remove unused variable in __inject_vm
the float int structure is no longer used in __inject_vm.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:27 +02:00
Feng Wu b7d2063177 iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
Enable VT-d Posted-Interrtups and add a command line
parameter for it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:54 +02:00
Feng Wu bf9f6ac8d7 KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu <feng.wu@intel.com>
[Concentrate invocation of pre/post-block hooks to vcpu_block. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:53 +02:00
Feng Wu 28b835d60f KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress furture non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR

Signed-off-by: Feng Wu <feng.wu@intel.com>
[Include asm/cpu.h to fix !CONFIG_SMP compilation. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:53 +02:00
Feng Wu 8727688006 KVM: x86: select IRQ_BYPASS_MANAGER
Select IRQ_BYPASS_MANAGER for x86 when CONFIG_KVM is set

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:52 +02:00
Feng Wu efc644048e KVM: x86: Update IRTE for posted-interrupts
This patch adds the routine to update IRTE for posted-interrupts
when guest changes the interrupt configuration.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[Squashed in automatically generated patch from the build robot
 "KVM: x86: vcpu_to_pi_desc() can be static" - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:51 +02:00
Feng Wu 6d7425f109 vfio: Register/unregister irq_bypass_producer
This patch adds the registration/unregistration of an
irq_bypass_producer for MSI/MSIx on vfio pci devices.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:50 +02:00
Feng Wu d84f1e0755 KVM: make kvm_set_msi_irq() public
Make kvm_set_msi_irq() public, we can use this function outside.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:50 +02:00
Feng Wu 8feb4a04dc KVM: Define a new interface kvm_intr_is_single_vcpu()
This patch defines a new interface kvm_intr_is_single_vcpu(),
which can returns whether the interrupt is for single-CPU or not.

It is used by VT-d PI, since now we only support single-CPU
interrupts, For lowest-priority interrupts, if user configures
it via /proc/irq or uses irqbalance to make it single-CPU, we
can use PI to deliver the interrupts to it. Full functionality
of lowest-priority support will be added later.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:49 +02:00
Feng Wu ebbfc76536 KVM: Add some helper functions for Posted-Interrupts
This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
[Make the new functions inline. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:48 +02:00
Feng Wu 6ef1522f7e KVM: Extend struct pi_desc for VT-d Posted-Interrupts
Extend struct pi_desc for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:48 +02:00
Feng Wu f70c20aaf1 KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
This patch adds an arch specific hooks 'arch_update' in
'struct kvm_kernel_irqfd'. On Intel side, it is used to
update the IRTE when VT-d posted-interrupts is used.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:47 +02:00
Eric Auger 9016cfb577 KVM: eventfd: add irq bypass consumer management
This patch adds the registration/unregistration of an
irq_bypass_consumer on irqfd assignment/deassignment.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:46 +02:00
Eric Auger 1a02b27035 KVM: introduce kvm_arch functions for IRQ bypass
This patch introduces
- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop
- kvm_arch_irq_bypass_start

They make possible to specialize the KVM IRQ bypass consumer in
case CONFIG_KVM_HAVE_IRQ_BYPASS is set.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
[Add weak implementations of the callbacks. - Feng]
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:45 +02:00
Eric Auger 166c9775f1 KVM: create kvm_irqfd.h
Move _irqfd_resampler and _irqfd struct declarations in a new
public header: kvm_irqfd.h. They are respectively renamed into
kvm_kernel_irqfd_resampler and kvm_kernel_irqfd. Those datatypes
will be used by architecture specific code, in the context of
IRQ bypass manager integration.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:44 +02:00
Feng Wu 37d9fe4783 virt: Add virt directory to the top Makefile
We need to build files in virt/lib/, which are now used by
KVM and VFIO, so add virt directory to the top Makefile.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:44 +02:00
Alex Williamson f73f817312 virt: IRQ bypass manager
When a physical I/O device is assigned to a virtual machine through
facilities like VFIO and KVM, the interrupt for the device generally
bounces through the host system before being injected into the VM.
However, hardware technologies exist that often allow the host to be
bypassed for some of these scenarios.  Intel Posted Interrupts allow
the specified physical edge interrupts to be directly injected into a
guest when delivered to a physical processor while the vCPU is
running.  ARM IRQ Forwarding allows forwarded physical interrupts to
be directly deactivated by the guest.

The IRQ bypass manager here is meant to provide the shim to connect
interrupt producers, generally the host physical device driver, with
interrupt consumers, generally the hypervisor, in order to configure
these bypass mechanism.  To do this, we base the connection on a
shared, opaque token.  For KVM-VFIO this is expected to be an
eventfd_ctx since this is the connection we already use to connect an
eventfd to an irqfd on the in-kernel path.  When a producer and
consumer with matching tokens is found, callbacks via both registered
participants allow the bypass facilities to be automatically enabled.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:43 +02:00
Paolo Bonzini 18cd52c4d9 irq_remapping: move structs outside #ifdef
This is friendlier to clients of the code, who are going to prepare
vcpu_data structs unconditionally, even if CONFIG_IRQ_REMAP is not
defined.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:42 +02:00
Radim Krčmář 72c930dcfc x86: kvmclock: abolish PVCLOCK_COUNTS_FROM_ZERO
Newer KVM won't be exposing PVCLOCK_COUNTS_FROM_ZERO anymore.
The purpose of that flags was to start counting system time from 0 when
the KVM clock has been initialized.
We can achieve the same by selecting one read as the initial point.

A simple subtraction will work unless the KVM clock count overflows
earlier (has smaller width) than scheduler's cycle count.  We should be
safe till x86_128.

Because PVCLOCK_COUNTS_FROM_ZERO was enabled only on new hypervisors,
setting sched clock as stable based on PVCLOCK_TSC_STABLE_BIT might
regress on older ones.

I presume we don't need to change kvm_clock_read instead of introducing
kvm_sched_clock_read.  A problem could arise in case sched_clock is
expected to return the same value as get_cycles, but we should have
merged those clocks in that case.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:42 +02:00
Xiao Guangrong 1cea0ce68e KVM: VMX: drop rdtscp_enabled field
Check cpuid bit instead of it

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:41 +02:00
Xiao Guangrong 7ec362964d KVM: VMX: clean up bit operation on SECONDARY_VM_EXEC_CONTROL
Use vmcs_set_bits() and vmcs_clear_bits() to clean up the code

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:40 +02:00
Xiao Guangrong feda805fe7 KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update
Unify the update in vmx_cpuid_update()

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Rewrite to use vmcs_set_secondary_exec_control. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:39 +02:00
Paolo Bonzini 8b97265a15 KVM: VMX: align vmx->nested.nested_vmx_secondary_ctls_high to vmx->rdtscp_enabled
The SECONDARY_EXEC_RDTSCP must be available iff RDTSCP is enabled in the
guest.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:38 +02:00
Xiao Guangrong 29541bb8f4 KVM: VMX: simplify invpcid handling in vmx_cpuid_update()
If vmx_invpcid_supported() is true, second execution control
filed must be supported and SECONDARY_EXEC_ENABLE_INVPCID
must have already been set in current vmcs by
vmx_secondary_exec_control()

If vmx_invpcid_supported() is false, no need to clear
SECONDARY_EXEC_ENABLE_INVPCID

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:38 +02:00