Commit Graph

494880 Commits

Author SHA1 Message Date
Kai Huang 3b0f1d01e5 KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty
We don't have to write protect guest memory for dirty logging if architecture
supports hardware dirty logging, such as PML on VMX, so rename it to be more
generic.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29 15:30:38 +01:00
Tiejun Chen b0165f1b41 kvm: update_memslots: clean flags for invalid memslots
Indeed, any invalid memslots should be new->npages = 0,
new->base_gfn = 0 and new->flags = 0 at the same time.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-27 21:31:44 +01:00
Joerg Roedel 128ca093cc kvm: iommu: Add cond_resched to legacy device assignment code
When assigning devices to large memory guests (>=128GB guest
memory in the failure case) the functions to create the
IOMMU page-tables for the whole guest might run for a very
long time. On non-preemptible kernels this might cause
Soft-Lockup warnings. Fix these by adding a cond_resched()
to the mapping and unmapping loops.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-27 21:31:12 +01:00
Nadav Amit 82268083fa KVM: x86: Emulation of call may use incorrect stack size
On long-mode, when far call that changes cs.l takes place, the stack size is
determined by the new mode.  For instance, if we go from 32-bit mode to 64-bit
mode, the stack-size if 64.  KVM uses the old stack size.

Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:17:34 +01:00
Nadav Amit bac155310b KVM: x86: 32-bit wraparound read/write not emulated correctly
If we got a wraparound of 32-bit operand, and the limit is 0xffffffff, read and
writes should be successful. It just needs to be done in two segments.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:15:18 +01:00
Nadav Amit 2b42fce695 KVM: x86: Fix defines in emulator.c
Unnecassary define was left after commit 7d882ffa81 ("KVM: x86: Revert
NoBigReal patch in the emulator").

Commit 39f062ff51 ("KVM: x86: Generate #UD when memory operand is required")
was missing undef.

Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:15:03 +01:00
Nadav Amit 2276b5116e KVM: x86: ARPL emulation can cause spurious exceptions
ARPL and MOVSXD are encoded the same and their execution depends on the
execution mode.  The operand sizes of each instruction are different.
Currently, ARPL is detected too late, after the decoding was already done, and
therefore may result in spurious exception (instead of failed emulation).

Introduce a group to the emulator to handle instructions according to execution
mode (32/64 bits). Note: in order not to make changes that may affect
performance, the new ModeDual can only be applied to instructions with ModRM.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:49 +01:00
Nadav Amit 801806d956 KVM: x86: IRET emulation does not clear NMI masking
The IRET instruction should clear NMI masking, but the current implementation
does not do so.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:42 +01:00
Nadav Amit 16794aaaab KVM: x86: Wrong operand size for far ret
Indeed, Intel SDM specifically states that for the RET instruction "In 64-bit
mode, the default operation size of this instruction is the stack-address size,
i.e. 64 bits."

However, experiments show this is not the case. Here is for example objdump of
small 64-bit asm:

  4004f1:	ca 14 00             	lret   $0x14
  4004f4:	48 cb                	lretq
  4004f6:	48 ca 14 00          	lretq  $0x14

Therefore, remove the Stack flag from far-ret instructions.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:25 +01:00
Nadav Amit 2fcf5c8ae2 KVM: x86: Dirty the dest op page on cmpxchg emulation
Intel SDM says for CMPXCHG: "To simplify the interface to the processor’s bus,
the destination operand receives a write cycle without regard to the result of
the comparison.". This means the destination page should be dirtied.

Fix it to by writing back the original value if cmpxchg failed.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:18 +01:00
Paolo Bonzini 8fff5e374a KVM: s390: fixes and features for kvm/next (3.20)
1. Generic
 - sparse warning (make function static)
 - optimize locking
 - bugfixes for interrupt injection
 - fix MVPG addressing modes
 
 2. hrtimer/wakeup fun
 A recent change can cause KVM hangs if adjtime is used in the host.
 The hrtimer might wake up too early or too late. Too early is fatal
 as vcpu_block will see that the wakeup condition is not met and
 sleep again. This CPU might never wake up again.
 This series addresses this problem. adjclock slowing down the host
 clock will result in too late wakeups. This will require more work.
 In addition to that we also change the hrtimer from REALTIME to
 MONOTONIC to avoid similar problems with timedatectl set-time.
 
 3. sigp rework
 We will move all "slow" sigps to QEMU (protected with a capability that
 can be enabled) to avoid several races between concurrent SIGP orders.
 
 4. Optimize the shadow page table
 Provide an interface to announce the maximum guest size. The kernel
 will use that to make the pagetable 2,3,4 (or theoretically) 5 levels.
 
 5. Provide an interface to set the guest TOD
 We now use two vm attributes instead of two oneregs, as oneregs are
 vcpu ioctl and we don't want to call them from other threads.
 
 6. Protected key functions
 The real HMC allows to enable/disable protected key CPACF functions.
 Lets provide an implementation + an interface for QEMU to activate
 this the protected key instructions.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJUwj60AAoJEBF7vIC1phx8iV0QAKq1LZRTmgTLS2fd0oyWKZeN
 ShWUIUiB+7IUiuogYXZMfqOm61oogxwc95Ti+3tpSWYwkzUWagpS/RJQze7E1HOc
 3pHpXwrR01ueUT6uVV4xc/vmVIlQAIl/ScRDDPahlAT2crCleWcKVC9l0zBs/Kut
 IrfzN9pJcrkmXD178CDP8/VwXsn02ptLQEpidGibGHCd03YVFjp3X0wfwNdQxMbU
 qOwNYCz3SLfDm5gsybO2DG+aVY3AbM2ZOJt/qLv2j4Phz4XB4t4W9iJnAefSz7JA
 W4677wbMQpfZlUQYhI78H/Cl9SfWAuLug1xk83O/+lbEiR5u+8zLxB69dkFTiBaH
 442OY957T6TQZ/V9d0jDo2XxFrcaU9OONbVLsfBQ56Vwv5cAg9/7zqG8eqH7Nq9R
 gU3fQesgD4N0Kpa77T9k45TT/hBRnUEtsGixAPT6QYKyE6cK4AJATHKSjMSLbdfj
 ELbt0p2mVtKhuCcANfEx54U2CxOrg5ElBmPz8hRw0OkXdwpqh1sGKmt0govcHP1I
 BGSzE9G4mswwI1bQ7cqcyTk/lwL8g3+KQmRJoOcgCveQlnY12X5zGD5DhuPMPiIT
 VENqbcTzjlxdu+4t7Enml+rXl7ySsewT9L231SSrbLsTQVgCudD1B9m72WLu5ZUT
 9/Z6znv6tkeKV5rM9DYE
 =zLjR
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-20150122' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

KVM: s390: fixes and features for kvm/next (3.20)

1. Generic
- sparse warning (make function static)
- optimize locking
- bugfixes for interrupt injection
- fix MVPG addressing modes

2. hrtimer/wakeup fun
A recent change can cause KVM hangs if adjtime is used in the host.
The hrtimer might wake up too early or too late. Too early is fatal
as vcpu_block will see that the wakeup condition is not met and
sleep again. This CPU might never wake up again.
This series addresses this problem. adjclock slowing down the host
clock will result in too late wakeups. This will require more work.
In addition to that we also change the hrtimer from REALTIME to
MONOTONIC to avoid similar problems with timedatectl set-time.

3. sigp rework
We will move all "slow" sigps to QEMU (protected with a capability that
can be enabled) to avoid several races between concurrent SIGP orders.

4. Optimize the shadow page table
Provide an interface to announce the maximum guest size. The kernel
will use that to make the pagetable 2,3,4 (or theoretically) 5 levels.

5. Provide an interface to set the guest TOD
We now use two vm attributes instead of two oneregs, as oneregs are
vcpu ioctl and we don't want to call them from other threads.

6. Protected key functions
The real HMC allows to enable/disable protected key CPACF functions.
Lets provide an implementation + an interface for QEMU to activate
this the protected key instructions.
2015-01-23 14:33:36 +01:00
Paolo Bonzini 1c6007d59a KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added
trace symbols, and adding an explicit VGIC init device control IOCTL.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUwhsKAAoJEEtpOizt6ddyuSEH/ia2uf07N0i+C1dPKYiqhKEd
 nFqBvgrhAMVztWLmy1Wq4SnO9YNd+CrPYATrfCiYsYQ9aKc09+qDq+uo06bVpZXz
 KsHjVGUsdyJ4qRqjDixkPvZviGIXa6C//+hcwg1XH2nit1uHmXVupzB9dDz3ZM2l
 GCwApdRdaaUVDt5Ud2ljqIWZa18Qf/5/HD8MdPXpmotDOKucL6pBr/1R1XWueCU/
 ejRs/qy3EFyMWdEdfGFAMCa0ZvHbPmsJmvB/EgkyUnuJj77ptA0jNo1jtzSfEyis
 53x4ffWnIsPl9yqhk0oKerIALVUvV4A7/me2ya6tsQ5fiBX7lJ3+qwggvCkWQzw=
 =fMS2
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next

KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added
trace symbols, and adding an explicit VGIC init device control IOCTL.

Conflicts:
	arch/arm64/include/asm/kvm_arm.h
	arch/arm64/kvm/handle_exit.c
2015-01-23 13:39:51 +01:00
Jens Freimann 0eb135ff9f KVM: s390: remove redundant setting of interrupt type
Setting inti->type again is unnecessary here, so let's
remove this.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:42 +01:00
Jens Freimann 94d1f564ad KVM: s390: fix bug in interrupt parameter check
When we convert interrupt data from struct kvm_s390_interrupt to
struct kvm_s390_irq we need to check the data in the input parameter
not the output parameter.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:42 +01:00
David Hildenbrand 428d53be5e KVM: s390: avoid memory leaks if __inject_vm() fails
We have to delete the allocated interrupt info if __inject_vm() fails.

Otherwise user space can keep flooding kvm with floating interrupts and
provoke more and more memory leaks.

Reported-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:41 +01:00
Tony Krowiak a374e892c3 KVM: s390/cpacf: Enable/disable protected key functions for kvm guest
Created new KVM device attributes for indicating whether the AES and
DES/TDES protected key functions are available for programs running
on the KVM guest.  The attributes are used to set up the controls in
the guest SIE block that specify whether programs running on the
guest will be given access to the protected key functions available
on the s390 hardware.

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split MSA4/protected key into two patches]
2015-01-23 13:25:40 +01:00
Jason J. Herne 72f250206f KVM: s390: Provide guest TOD Clock Get/Set Controls
Provide controls for setting/getting the guest TOD clock based on the VM
attribute interface.

Provide TOD and TOD_HIGH vm attributes on s390 for managing guest Time Of
Day clock value.

TOD_HIGH is presently always set to 0. In the future it will contain a high
order expansion of the tod clock value after it overflows the 64-bits of
the TOD.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:40 +01:00
Jens Freimann 556cc0dab1 KVM: s390: trace correct values for set prefix and machine checks
When injecting SIGP set prefix or a machine check, we trace
the values in our per-vcpu local_int data structure instead
of the parameters passed to the function.

Fix this by changing the trace statement to use the correct values.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:39 +01:00
Jens Freimann 49538d1238 KVM: s390: fix bug in sigp emergency signal injection
Currently we are always setting the wrong bit in the
bitmap for pending emergency signals. Instead of using
emerg.code from the passed in irq parameter, we use the
value in our per-vcpu local_int structure, which is always zero.
That means all emergency signals will have address 0 as parameter.
If two CPUs send a SIGP to the same target, one might be lost.

Let's fix this by using the value from the parameter and
also trace the correct value.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:39 +01:00
Thomas Huth 3cfad02380 KVM: s390: Take addressing mode into account for MVPG interception
The handler for MVPG partial execution interception does not take
the current CPU addressing mode into account yet, so addresses are
always treated as 64-bit addresses. For correct behaviour, we should
properly handle 24-bit and 31-bit addresses, too.

Since MVPG is defined to work with logical addresses, we can simply
use guest_translate_address() to achieve the required behaviour
(since DAT is disabled here, guest_translate_address() skips the MMU
translation and only translates the address via kvm_s390_logical_to_effective()
and kvm_s390_real_to_abs(), which is exactly what we want here).

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:38 +01:00
Christian Borntraeger 69a8d45626 KVM: s390: no need to hold the kvm->mutex for floating interrupts
The kvm mutex was (probably) used to protect against cpu hotplug.
The current code no longer needs to protect against that, as we only
rely on CPU data structures that are guaranteed to be available
if we can access the CPU. (e.g. vcpu_create will put the cpu
in the array AFTER the cpu is ready).

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
2015-01-23 13:25:37 +01:00
David Hildenbrand 2444b352c3 KVM: s390: forward most SIGP orders to user space
Most SIGP orders are handled partially in kernel and partially in
user space. In order to:
- Get a correct SIGP SET PREFIX handler that informs user space
- Avoid race conditions between concurrently executed SIGP orders
- Serialize SIGP orders per VCPU

We need to handle all "slow" SIGP orders in user space. The remaining
ones to be handled completely in kernel are:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL
According to the PoP, they have to be fast. They can be executed
without conflicting to the actions of other pending/concurrently
executing orders (e.g. STOP vs. START).

This patch introduces a new capability that will - when enabled -
forward all but the mentioned SIGP orders to user space. The
instruction counters in the kernel are still updated.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:37 +01:00
David Hildenbrand 9fbd80828c KVM: s390: clear the pfault queue if user space sets the invalid token
We need a way to clear the async pfault queue from user space (e.g.
for resets and SIGP SET ARCHITECTURE).

This patch simply clears the queue as soon as user space sets the
invalid pfault token. The definition of the invalid token is moved
to uapi.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:36 +01:00
David Hildenbrand ea5f496925 KVM: s390: only one external call may be pending at a time
Only one external call may be pending at a vcpu at a time. For this
reason, we have to detect whether the SIGP externcal call interpretation
facility is available. If so, all external calls have to be injected
using this mechanism.

SIGP EXTERNAL CALL orders have to return whether another external
call is already pending. This check was missing until now.

SIGP SENSE hasn't returned yet in all conditions whether an external
call was pending.

If a SIGP EXTERNAL CALL irq is to be injected and one is already
pending, -EBUSY is returned.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:36 +01:00
David Hildenbrand d614be05c8 s390/sclp: introduce check for the SIGP Interpretation Facility
This patch introduces the infrastructure to check whether the SIGP
Interpretation Facility is installed on all VCPUs in the configuration.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:35 +01:00
David Hildenbrand a3a9c59a68 KVM: s390: SIGP SET PREFIX cleanup
This patch cleanes up the the SIGP SET PREFIX code.

A SIGP SET PREFIX irq may only be injected if the target vcpu is
stopped. Let's move the checking code into the injection code and
return -EBUSY if the target vcpu is not stopped.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:34 +01:00
David Hildenbrand 9a022067ad KVM: s390: a VCPU may only stop when no interrupts are left pending
As a SIGP STOP is an interrupt with the least priority, it may only result
in stop of the vcpu when no other interrupts are left pending.

To detect whether a non-stop irq is pending, we need a way to mask out
stop irqs from the general kvm_cpu_has_interrupt() function. For this
reason, the existing function (with an outdated name) is replaced by
kvm_s390_vcpu_has_irq() which allows to mask out pending stop irqs.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:34 +01:00
David Hildenbrand 6cddd432e3 KVM: s390: handle stop irqs without action_bits
This patch removes the famous action_bits and moves the handling of
SIGP STOP AND STORE STATUS directly into the SIGP STOP interrupt.

The new local interrupt infrastructure is used to track pending stop
requests.

STOP irqs are the only irqs that don't get actively delivered. They
remain pending until the stop function is executed (=stop intercept).

If another STOP irq is already pending, -EBUSY will now be returned
(needed for the SIGP handling code).

Migration of pending SIGP STOP (AND STORE STATUS) orders should now
be supported out of the box.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:33 +01:00
David Hildenbrand 2822545f9f KVM: s390: new parameter for SIGP STOP irqs
In order to get rid of the action_flags and to properly migrate pending SIGP
STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember
whether to store the status when stopping.

For this reason, a new parameter (flags) for the SIGP STOP irq is introduced.
These flags further define details of the requested STOP and can be easily
migrated.

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:33 +01:00
David Hildenbrand 2d00f75942 KVM: s390: forward hrtimer if guest ckc not pending yet
Patch 0759d0681c ("KVM: s390: cleanup handle_wait by reusing
kvm_vcpu_block") changed the way pending guest clock comparator
interrupts are detected. It was assumed that as soon as the hrtimer
wakes up, the condition for the guest ckc is satisfied.

This is however only true as long as adjclock() doesn't speed
up the monotonic clock. Reason is that the hrtimer is based on
CLOCK_MONOTONIC, the guest clock comparator detection is based
on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the
TOD clock, the hrtimer wakes the target VCPU up too early and
the target VCPU will not detect any pending interrupts, therefore
going back to sleep. It will never be woken up again because the
hrtimer has finished. The VCPU is stuck.

As a quick fix, we have to forward the hrtimer until the guest
clock comparator is really due, to guarantee properly timed wake
ups.

As the hrtimer callback might be triggered on another cpu, we
have to make sure that the timer is really stopped and not currently
executing the callback on another cpu. This can happen if the vcpu
thread is scheduled onto another physical cpu, but the timer base
is not migrated. So lets use hrtimer_cancel instead of try_to_cancel.

A proper fix might be to introduce a RAW based hrtimer.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:32 +01:00
David Hildenbrand 0ac96caf0f KVM: s390: base hrtimer on a monotonic clock
The hrtimer that handles the wait with enabled timer interrupts
should not be disturbed by changes of the host time.

This patch changes our hrtimer to be based on a monotonic clock.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:32 +01:00
David Hildenbrand bda343ef14 KVM: s390: prevent sleep duration underflows in handle_wait()
We sometimes get an underflow for the sleep duration, which most
likely won't result in the short sleep time we wanted.

So let's check for sleep duration underflows and directly continue
to run the guest if we get one.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:31 +01:00
Dominik Dingel 8c0a7ce606 KVM: s390: Allow userspace to limit guest memory size
With commit c6c956b80b ("KVM: s390/mm: support gmap page tables with less
than 5 levels") we are able to define a limit for the guest memory size.

As we round up the guest size in respect to the levels of page tables
we get to guest limits of: 2048 MB, 4096 GB, 8192 TB and 16384 PB.
We currently limit the guest size to 16 TB, which means we end up
creating a page table structure supporting guest sizes up to 8192 TB.

This patch introduces an interface that allows userspace to tune
this limit. This may bring performance improvements for small guests.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:30 +01:00
Dominik Dingel dafd032a15 KVM: s390: move vcpu specific initalization to a later point
As we will allow in a later patch to recreate gmaps with new limits,
we need to make sure that vcpus get their reference for that gmap
after they increased the online_vcpu counter, so there is no possible race.

While we are doing this, we also can simplify the vcpu_init function, by
moving ucontrol specifics to an own function.
That way we also start now setting the kvm_valid_regs for the ucontrol path.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:30 +01:00
Christian Borntraeger 0675d92dcf KVM: s390: make local function static
sparse rightfully complains about
warning: symbol '__inject_extcall' was not declared. Should it be static?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:29 +01:00
Dominik Dingel 31928aa586 KVM: remove unneeded return value of vcpu_postcreate
The return value of kvm_arch_vcpu_postcreate is not checked in its
caller.  This is okay, because only x86 provides vcpu_postcreate right
now and it could only fail if vcpu_load failed.  But that is not
possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so
just get rid of the unchecked return value.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:24:52 +01:00
Paolo Bonzini c6156df9d3 Merge branch 'arm64/common-esr-macros' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux into kvm-next
ESR_ELx definitions clean-up from Mark Rutland.

* 'arm64/common-esr-macros' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux:
  arm64: kvm: decode ESR_ELx.EC when reporting exceptions
  arm64: kvm: remove ESR_EL2_* macros
  arm64: remove ESR_EL1_* macros
  arm64: kvm: move to ESR_ELx macros
  arm64: decode ESR_ELx.EC when reporting exceptions
  arm64: move to ESR_ELx macros
  arm64: introduce common ESR_ELx_* definitions

This is required by the patch "arm/arm64: KVM: add tracing support for
arm64 exit handler" in Christoffer's pull request.
2015-01-23 13:09:15 +01:00
Christoffer Dall 4b99058995 KVM: Remove unused config symbol
The dirty patch logging series introduced both
HAVE_KVM_ARCH_DIRTY_LOG_PROTECT and KVM_GENERIC_DIRTYLOG_READ_PROTECT
config symbols, but only KVM_GENERIC_DIRTYLOG_READ_PROTECT is used.
Just remove the unused one.

(The config symbol was renamed during the development of the patch
series and the old name just creeped in by accident.()

Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-23 10:52:03 +01:00
Christoffer Dall 227ea818f2 arm/arm64: KVM: Fixup incorrect config symbol in comment
A comment in the dirty page logging patch series mentioned incorrectly
spelled config symbols, just fix them up to match the real thing.

Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-23 10:51:58 +01:00
Borislav Petkov cfaa790a3f kvm: Fix CR3_PCID_INVD type on 32-bit
arch/x86/kvm/emulate.c: In function ‘check_cr_write’:
arch/x86/kvm/emulate.c:3552:4: warning: left shift count >= width of type
    rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD;

happens because sizeof(UL) on 32-bit is 4 bytes but we shift it 63 bits
to the left.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-21 15:59:09 +01:00
Marcelo Tosatti 54750f2cf0 KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
and rdtsc is larger than a given threshold:

 * If we get more than the below threshold into the future, we rerequest
 * the real time from the host again which has only little offset then
 * that we need to adjust using the TSC.
 *
 * For now that threshold is 1/5th of a jiffie. That should be good
 * enough accuracy for completely broken systems, but also give us swing
 * to not call out to the host all the time.
 */
#define PVCLOCK_DELTA_MAX ((1000000000ULL / HZ) / 5)

Disable masterclock support (which increases said delta) in case the
boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.

Upstreams kernels which support pvclock vsyscalls (and therefore make
use of PVCLOCK_STABLE_BIT) use MSR_KVM_SYSTEM_TIME_NEW.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-20 20:38:39 +01:00
Fengguang Wu 69b0049a89 KVM: fix "Should it be static?" warnings from sparse
arch/x86/kvm/x86.c:495:5: sparse: symbol 'kvm_read_nested_guest_page' was not declared. Should it be static?
arch/x86/kvm/x86.c:646:5: sparse: symbol '__kvm_set_xcr' was not declared. Should it be static?
arch/x86/kvm/x86.c:1183:15: sparse: symbol 'max_tsc_khz' was not declared. Should it be static?
arch/x86/kvm/x86.c:1237:6: sparse: symbol 'kvm_track_tsc_matching' was not declared. Should it be static?

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-20 20:38:35 +01:00
Andre Przywara 4fa96afd94 arm/arm64: KVM: force alignment of VGIC dist/CPU/redist addresses
Although the GIC architecture requires us to map the MMIO regions
only at page aligned addresses, we currently do not enforce this from
the kernel side.
Restrict any vGICv2 regions to be 4K aligned and any GICv3 regions
to be 64K aligned. Document this requirement.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20 18:25:33 +01:00
Andre Przywara ac3d373564 arm/arm64: KVM: allow userland to request a virtual GICv3
With all of the GICv3 code in place now we allow userland to ask the
kernel for using a virtual GICv3 in the guest.
Also we provide the necessary support for guests setting the memory
addresses for the virtual distributor and redistributors.
This requires some userland code to make use of that feature and
explicitly ask for a virtual GICv3.
Document that KVM_CREATE_IRQCHIP only works for GICv2, but is
considered legacy and using KVM_CREATE_DEVICE is preferred.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20 18:25:33 +01:00
Andre Przywara b5d84ff600 arm/arm64: KVM: enable kernel side of GICv3 emulation
With all the necessary GICv3 emulation code in place, we can now
connect the code to the GICv3 backend in the kernel.
The LR register handling is different depending on the emulated GIC
model, so provide different implementations for each.
Also allow non-v2-compatible GICv3 implementations (which don't
provide MMIO regions for the virtual CPU interface in the DT), but
restrict those hosts to support GICv3 guests only.
If the device tree provides a GICv2 compatible GICV resource entry,
but that one is faulty, just disable the GICv2 emulation and let the
user use at least the GICv3 emulation for guests.
To provide proper support for the legacy KVM_CREATE_IRQCHIP ioctl,
note virtual GICv2 compatibility in struct vgic_params and use it
on creating a VGICv2.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20 18:25:32 +01:00
Andre Przywara 6d52f35af1 arm64: KVM: add SGI generation register emulation
While the generation of a (virtual) inter-processor interrupt (SGI)
on a GICv2 works by writing to a MMIO register, GICv3 uses the system
register ICC_SGI1R_EL1 to trigger them.
Add a trap handler function that calls the new SGI register handler
in the GICv3 code. As ICC_SRE_EL1.SRE at this point is still always 0,
this will not trap yet, but will only be used later when all the data
structures have been initialized properly.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20 18:25:32 +01:00
Andre Przywara 7e5802781c arm64: GICv3: introduce symbolic names for GICv3 ICC_SGI1R_EL1 fields
The gic_send_sgi() function used hardcoded bit shift values to
generate the ICC_SGI1R_EL1 register value.
Replace this with symbolic names to allow reusing them later.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20 18:25:31 +01:00
Andre Przywara a0675c25d6 arm/arm64: KVM: add virtual GICv3 distributor emulation
With everything separated and prepared, we implement a model of a
GICv3 distributor and redistributors by using the existing framework
to provide handler functions for each register group.

Currently we limit the emulation to a model enforcing a single
security state, with SRE==1 (forcing system register access) and
ARE==1 (allowing more than 8 VCPUs).

We share some of the functions provided for GICv2 emulation, but take
the different ways of addressing (v)CPUs into account.
Save and restore is currently not implemented.

Similar to the split-off of the GICv2 specific code, the new emulation
code goes into a new file (vgic-v3-emul.c).

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20 18:25:31 +01:00
Andre Przywara 9fedf14677 arm/arm64: KVM: add opaque private pointer to MMIO data
For a GICv2 there is always only one (v)CPU involved: the one that
does the access. On a GICv3 the access to a CPU redistributor is
memory-mapped, but not banked, so the (v)CPU affected is determined by
looking at the MMIO address region being accessed.
To allow passing the affected CPU into the accessors later, extend
struct kvm_exit_mmio to add an opaque private pointer parameter.
The current GICv2 emulation just does not use it.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20 18:25:30 +01:00
Andre Przywara 1d916229e3 arm/arm64: KVM: split GICv2 specific emulation code from vgic.c
vgic.c is currently a mixture of generic vGIC emulation code and
functions specific to emulating a GICv2. To ease the addition of
GICv3, split off strictly v2 specific parts into a new file
vgic-v2-emul.c.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

-------
As the diff isn't always obvious here (and to aid eventual rebases),
here is a list of high-level changes done to the code:
* added new file to respective arm/arm64 Makefiles
* moved GICv2 specific functions to vgic-v2-emul.c:
  - handle_mmio_misc()
  - handle_mmio_set_enable_reg()
  - handle_mmio_clear_enable_reg()
  - handle_mmio_set_pending_reg()
  - handle_mmio_clear_pending_reg()
  - handle_mmio_priority_reg()
  - vgic_get_target_reg()
  - vgic_set_target_reg()
  - handle_mmio_target_reg()
  - handle_mmio_cfg_reg()
  - handle_mmio_sgi_reg()
  - vgic_v2_unqueue_sgi()
  - read_set_clear_sgi_pend_reg()
  - write_set_clear_sgi_pend_reg()
  - handle_mmio_sgi_set()
  - handle_mmio_sgi_clear()
  - vgic_v2_handle_mmio()
  - vgic_get_sgi_sources()
  - vgic_dispatch_sgi()
  - vgic_v2_queue_sgi()
  - vgic_v2_map_resources()
  - vgic_v2_init()
  - vgic_v2_add_sgi_source()
  - vgic_v2_init_model()
  - vgic_v2_init_emulation()
  - handle_cpu_mmio_misc()
  - handle_mmio_abpr()
  - handle_cpu_mmio_ident()
  - vgic_attr_regs_access()
  - vgic_create() (renamed to vgic_v2_create())
  - vgic_destroy() (renamed to vgic_v2_destroy())
  - vgic_has_attr() (renamed to vgic_v2_has_attr())
  - vgic_set_attr() (renamed to vgic_v2_set_attr())
  - vgic_get_attr() (renamed to vgic_v2_get_attr())
  - struct kvm_mmio_range vgic_dist_ranges[]
  - struct kvm_mmio_range vgic_cpu_ranges[]
  - struct kvm_device_ops kvm_arm_vgic_v2_ops {}

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20 18:25:30 +01:00