Book3S HV only (debugging aids, minor performance improvements and some
cleanups). But there are also bug fixes and small cleanups for ARM,
x86 and s390.
The task_migration_notifier revert and real fix is still pending review,
but I'll send it as soon as possible after -rc1.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJVOONLAAoJEL/70l94x66DbsMIAIpZPsaqgXOC1sDEiZuYay+6
rD4n4id7j8hIAzcf3AlZdyf5XgLlr6I1Zyt62s1WcoRq/CCnL7k9EljzSmw31WFX
P2y7/J0iBdkn0et+PpoNThfL2GsgTqNRCLOOQlKgEQwMP9Dlw5fnUbtC1UchOzTg
eAMeBIpYwufkWkXhdMw4PAD4lJ9WxUZ1eXHEBRzJb0o0ZxAATJ1tPZGrFJzoUOSM
WsVNTuBsNd7upT02kQdvA1TUo/OPjseTOEoksHHwfcORt6bc5qvpctL3jYfcr7sk
/L6sIhYGVNkjkuredjlKGLfT2DDJjSEdJb1k2pWrDRsY76dmottQubAE9J9cDTk=
=OAi2
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull second batch of KVM changes from Paolo Bonzini:
"This mostly includes the PPC changes for 4.1, which this time cover
Book3S HV only (debugging aids, minor performance improvements and
some cleanups). But there are also bug fixes and small cleanups for
ARM, x86 and s390.
The task_migration_notifier revert and real fix is still pending
review, but I'll send it as soon as possible after -rc1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits)
KVM: arm/arm64: check IRQ number on userland injection
KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi
KVM: VMX: Preserve host CR4.MCE value while in guest mode.
KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C
KVM: PPC: Book3S HV: Streamline guest entry and exit
KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
KVM: PPC: Book3S HV: Use decrementer to wake napping threads
KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
KVM: PPC: Book3S HV: Minor cleanups
KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
KVM: PPC: Book3S HV: Add ICP real mode counters
KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
KVM: PPC: Book3S HV: Add guest->host real mode completion counters
KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte
...
commit b273921356df ("KVM: s390: enable more features that need
no hypervisor changes") also enabled RRBM. Turns out that this
instruction does need some KVM code, so lets disable that bit
again.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Fixes: b273921356df ("KVM: s390: enable more features that need no hypervisor changes")
Message-Id: <1429093624-49611-2-git-send-email-borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.
Tracepoints have helper functions for the TP_printk() called
__print_symbolic() and __print_flags() that lets a numeric number be
displayed as a a human comprehensible text. What is placed in the
TP_printk() is also shown in the tracepoint format file such that
user space tools like perf and trace-cmd can parse the binary data
and express the values too. Unfortunately, the way the TRACE_EVENT()
macro works, anything placed in the TP_printk() will be shown pretty
much exactly as is. The problem arises when enums are used. That's
because unlike macros, enums will not be changed into their values
by the C pre-processor. Thus, the enum string is exported to the
format file, and this makes it useless for user space tools.
The TRACE_DEFINE_ENUM() solves this by converting the enum strings
in the TP_printk() format into their number, and that is what is
shown to user space. For example, the tracepoint tlb_flush currently
has this in its format file:
__print_symbolic(REC->reason,
{ TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
{ TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
{ TLB_LOCAL_SHOOTDOWN, "local shootdown" },
{ TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
After adding:
TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
Its format file will contain this:
__print_symbolic(REC->reason,
{ 0, "flush on task switch" },
{ 1, "remote shootdown" },
{ 2, "local shootdown" },
{ 3, "local mm shootdown" })
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVLBTuAAoJEEjnJuOKh9ldjHMIALdRS755TXCZGOf0r7O2akOR
wMPeum7C+ae1mH+jCsJKUC0/jUfQKaMt/UxoHlipDgcGg8kD2jtGnGCw4Xlwvdsr
y4rFmcTRSl1mo0zDSsg6ujoupHlVYN0+JPjrd7S3cv/llJoY49zcanNLF7S2XLeM
dZCtWRLWYpBiWO68ai6AqJTnE/eGFIqBI048qb5Eg8dbK243SSeSIf9Ywhb+VsA+
aq6F7cWI/H6j4tbeza8tAN19dcwenDro5EfCDY8ARQHJu1f6Y3+DLf2imjkd6Aiu
JVAoGIjHIpI+djwCZC1u4gi4urjfOqYartrM3Q54tb3YWYqHeNqP2ASI2a4EpYk=
=Ixwt
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Some clean ups and small fixes, but the biggest change is the addition
of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.
Tracepoints have helper functions for the TP_printk() called
__print_symbolic() and __print_flags() that lets a numeric number be
displayed as a a human comprehensible text. What is placed in the
TP_printk() is also shown in the tracepoint format file such that user
space tools like perf and trace-cmd can parse the binary data and
express the values too. Unfortunately, the way the TRACE_EVENT()
macro works, anything placed in the TP_printk() will be shown pretty
much exactly as is. The problem arises when enums are used. That's
because unlike macros, enums will not be changed into their values by
the C pre-processor. Thus, the enum string is exported to the format
file, and this makes it useless for user space tools.
The TRACE_DEFINE_ENUM() solves this by converting the enum strings in
the TP_printk() format into their number, and that is what is shown to
user space. For example, the tracepoint tlb_flush currently has this
in its format file:
__print_symbolic(REC->reason,
{ TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
{ TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
{ TLB_LOCAL_SHOOTDOWN, "local shootdown" },
{ TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
After adding:
TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
Its format file will contain this:
__print_symbolic(REC->reason,
{ 0, "flush on task switch" },
{ 1, "remote shootdown" },
{ 2, "local shootdown" },
{ 3, "local mm shootdown" })"
* tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
tracing: Add enum_map file to show enums that have been mapped
writeback: Export enums used by tracepoint to user space
v4l: Export enums used by tracepoints to user space
SUNRPC: Export enums in tracepoints to user space
mm: tracing: Export enums in tracepoints to user space
irq/tracing: Export enums in tracepoints to user space
f2fs: Export the enums in the tracepoints to userspace
net/9p/tracing: Export enums in tracepoints to userspace
x86/tlb/trace: Export enums in used by tlb_flush tracepoint
tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
tracing: Allow for modules to convert their enums to values
tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
tracing: Give system name a pointer
brcmsmac: Move each system tracepoints to their own header
iwlwifi: Move each system tracepoints to their own header
mac80211: Move message tracepoints to their own header
tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
tracing: Add TRACE_SYSTEM_VAR to kvm-s390
tracing: Add TRACE_SYSTEM_VAR to intel-sst
...
ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling
vhost, too), page aging
s390: interrupt handling rework, allowing to inject all local interrupts
via new ioctl and to get/set the full local irq state for migration
and introspection. New ioctls to access memory by virtual address,
and to get/set the guest storage keys. SIMD support.
MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches
from Ralf Baechle's MIPS tree.
x86: bugfixes (notably for pvclock, the others are small) and cleanups.
Another small latency improvement for the TSC deadline timer.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJVJ9vmAAoJEL/70l94x66DoMEH/R3rh8IMf4jTiWRkcqohOMPX
k1+NaSY/lCKayaSgggJ2hcQenMbQoXEOdslvaA/H0oC+VfJGK+lmU6E63eMyyhjQ
Y+Px6L85NENIzDzaVu/TIWWuhil5PvIRr3VO8cvntExRoCjuekTUmNdOgCvN2ObW
wswN2qRdPIeEj2kkulbnye+9IV4G0Ne9bvsmUdOdfSSdi6ZcV43JcvrpOZT++mKj
RrKB+3gTMZYGJXMMLBwMkdl8mK1ozriD+q0mbomT04LUyGlPwYLl4pVRDBqyksD7
KsSSybaK2E4i5R80WEljgDMkNqrCgNfg6VZe4n9Y+CfAAOToNnkMJaFEi+yuqbs=
=yu2b
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"First batch of KVM changes for 4.1
The most interesting bit here is irqfd/ioeventfd support for ARM and
ARM64.
Summary:
ARM/ARM64:
fixes for live migration, irqfd and ioeventfd support (enabling
vhost, too), page aging
s390:
interrupt handling rework, allowing to inject all local interrupts
via new ioctl and to get/set the full local irq state for migration
and introspection. New ioctls to access memory by virtual address,
and to get/set the guest storage keys. SIMD support.
MIPS:
FPU and MIPS SIMD Architecture (MSA) support. Includes some
patches from Ralf Baechle's MIPS tree.
x86:
bugfixes (notably for pvclock, the others are small) and cleanups.
Another small latency improvement for the TSC deadline timer"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
KVM: use slowpath for cross page cached accesses
kvm: mmu: lazy collapse small sptes into large sptes
KVM: x86: Clear CR2 on VCPU reset
KVM: x86: DR0-DR3 are not clear on reset
KVM: x86: BSP in MSR_IA32_APICBASE is writable
KVM: x86: simplify kvm_apic_map
KVM: x86: avoid logical_map when it is invalid
KVM: x86: fix mixed APIC mode broadcast
KVM: x86: use MDA for interrupt matching
kvm/ppc/mpic: drop unused IRQ_testbit
KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
KVM: vmx: pass error code with internal error #2
x86: vdso: fix pvclock races with task migration
KVM: remove kvm_read_hva and kvm_read_hva_atomic
KVM: x86: optimize delivery of TSC deadline timer interrupt
KVM: x86: extract blocking logic from __vcpu_run
kvm: x86: fix x86 eflags fixed bit
KVM: s390: migrate vcpu interrupt state
...
New code will require TRACE_SYSTEM to be a valid C variable name,
but some tracepoints have TRACE_SYSTEM with '-' and not '_', so
it can not be used. Instead, add a TRACE_SYSTEM_VAR that can
give the tracing infrastructure a unique name for the trace system.
Link: http://lkml.kernel.org/r/20150402111500.5e52c1ed.cornelia.huck@de.ibm.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1. Assorted changes
1.1 allow more feature bits for the guest
1.2 Store breaking event address on program interrupts
2. Interrupt handling rework
2.1 Fix copy_to_user while holding a spinlock (cc stable)
2.2 Rework floating interrupts to follow the priorities
2.3 Allow to inject all local interrupts via new ioctl
2.4 allow to get/set the full local irq state, e.g. for migration
and introspection
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJVGvEUAAoJEBF7vIC1phx82tEP/3KrwsDRs+buiBqyv9k+qCFV
v+R94gReBB5ggfbGfUYgBJMR2/4XQ+0jcZ55jfBCC4osOq6Juw/8HIj2nSgbQHmz
F9Go0n8IqJ3DnqPTc0KYdFZ7kqDvMV5ME3XJrFiAHv1TUL9H/KpZArkcVIwD2NOo
w01AVrCDY4bTajYqKShzGFymQl1K5vTGGvgxhh4kAHct4Nt5N5HFmyROm0RrsFZx
Sycx4t177O7zhCN2tv5Zy8iWaEvzHAESoXkhZ2cJ6t+FXii2Eov5IgyyfYRXBfbm
YACyvlFD087UdFGTt85ggPVS/S/5hn9xXmVHuIimHeyZU7CXCN5vYPcn+ZyksYr5
uA8+/2OPAgcaeDa2f7nCjl8jmcLR3hkQ0n/urA+pPYAZANJoFDfiGOr/kVk6aKff
JTGSFUjNK891/IGEsdrSk2p64U5xMd8LFa3Il++kZT91gc2nrZOHNz5FGlXlkLdJ
sADeNFWhoprEt/2P4aX6W2j26L8G874XkldDSjrS41U8L55+IiEm09r8oAWgfc5A
pryeDaN4nSjFC+HOtlPkcVkAcsswiI6nHIm3+/XFetCq+v4pnVKFMHWsTeEjiQgQ
H5aV9mfEKTJaCPrAJMsj8ZsKq0usG+BeRNqpIvxPAQB8fyl3jw9iu+RHeY1xWsTg
BRHB/+CGYIxDu4XdRexv
=Rrx5
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-20150331' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
Features and fixes for 4.1 (kvm/next)
1. Assorted changes
1.1 allow more feature bits for the guest
1.2 Store breaking event address on program interrupts
2. Interrupt handling rework
2.1 Fix copy_to_user while holding a spinlock (cc stable)
2.2 Rework floating interrupts to follow the priorities
2.3 Allow to inject all local interrupts via new ioctl
2.4 allow to get/set the full local irq state, e.g. for migration
and introspection
This patch adds support to migrate vcpu interrupts. Two new vcpu ioctls
are added which get/set the complete status of pending interrupts in one
go. The ioctls are marked as available with the new capability
KVM_CAP_S390_IRQ_STATE.
We can not use a ONEREG, as the number of pending local interrupts is not
constant and depends on the number of CPUs.
To retrieve the interrupt state we add an ioctl KVM_S390_GET_IRQ_STATE.
Its input parameter is a pointer to a struct kvm_s390_irq_state which
has a buffer and length. For all currently pending interrupts, we copy
a struct kvm_s390_irq into the buffer and pass it to userspace.
To store interrupt state into a buffer provided by userspace, we add an
ioctl KVM_S390_SET_IRQ_STATE. It passes a struct kvm_s390_irq_state into
the kernel and injects all interrupts contained in the buffer.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Let's provide a version of kvm_s390_inject_vcpu() that
does not acquire the local-interrupt lock and skips
waking up the vcpu.
To be used in a later patch for vcpu-local interrupt migration,
where we are already holding the lock.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
We have introduced struct kvm_s390_irq a while ago which allows to
inject all kinds of interrupts as defined in the Principles of
Operation.
Add ioctl to inject interrupts with the extended struct kvm_s390_irq
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
We now have a mechanism for delivering interrupts according to their priority.
Let's inject them using our new infrastructure (instead of letting only hardware
handle them), so we can be sure that the irq priorities are satisfied.
For s390, the cpu timer and the clock comparator are to be checked for common
code kvm_cpu_has_pending_timer(), although the cpu timer is only stepped when
the guest is being executed.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This patch makes interrupt handling compliant to the z/Architecture
Principles of Operation with regard to interrupt priorities.
Add a bitmap for pending floating interrupts. Each bit relates to a
interrupt type and its list. A turned on bit indicates that a list
contains items (interrupts) which need to be delivered. When delivering
interrupts on a cpu we can merge the existing bitmap for cpu-local
interrupts and floating interrupts and have a single mechanism for
delivery.
Currently we have one list for all kinds of floating interrupts and a
corresponding spin lock. This patch adds a separate list per
interrupt type. An exception to this are service signal and machine check
interrupts, as there can be only one pending interrupt at a time.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This fixes a bug introduced with commit c05c4186bb ("KVM: s390:
add floating irq controller").
get_all_floating_irqs() does copy_to_user() while holding
a spin lock. Let's fix this by filling a temporary buffer
first and copy it to userspace after giving up the lock.
Cc: <stable@vger.kernel.org> # 3.18+: 69a8d45626 KVM: s390: no need to hold...
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
After some review about what these facilities do, the following
facilities will work under KVM and can, therefore, be reported
to the guest if the cpu model and the host cpu provide this bit.
There are plans underway to make the whole bit thing more readable,
but its not yet finished. So here are some last bit changes and
we enhance the KVM mask with:
9 The sense-running-status facility is installed in the
z/Architecture architectural mode.
---> handled by SIE or KVM
10 The conditional-SSKE facility is installed in the
z/Architecture architectural mode.
---> handled by SIE. KVM will retry SIE
13 The IPTE-range facility is installed in the
z/Architecture architectural mode.
---> handled by SIE. KVM will retry SIE
36 The enhanced-monitor facility is installed in the
z/Architecture architectural mode.
---> handled by SIE
47 The CMPSC-enhancement facility is installed in the
z/Architecture architectural mode.
---> handled by SIE
48 The decimal-floating-point zoned-conversion facility
is installed in the z/Architecture architectural mode.
---> handled by SIE
49 The execution-hint, load-and-trap, miscellaneous-
instruction-extensions and processor-assist
---> handled by SIE
51 The local-TLB-clearing facility is installed in the
z/Architecture architectural mode.
---> handled by SIE
52 The interlocked-access facility 2 is installed.
---> handled by SIE
53 The load/store-on-condition facility 2 and load-and-
zero-rightmost-byte facility are installed in the
z/Architecture architectural mode.
---> handled by SIE
57 The message-security-assist-extension-5 facility is
installed in the z/Architecture architectural mode.
---> handled by SIE
66 The reset-reference-bits-multiple facility is installed
in the z/Architecture architectural mode.
---> handled by SIE. KVM will retry SIE
80 The decimal-floating-point packed-conversion
facility is installed in the z/Architecture architectural
mode.
---> handled by SIE
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
If the PER-3 facility is installed, the breaking-event address is to be
stored in the low core.
There is no facility bit for PER-3 in stfl(e) and Linux always uses the
value at address 272 no matter if PER-3 is available or not.
We can't hide its existence from the guest. All program interrupts
injected via the SIE automatically store this information if the PER-3
facility is available in the hypervisor. Also the itdb contains the
address automatically.
As there is no switch to turn this mechanism off, let's simply make it
consistent and also store the breaking event address in case of manual
program interrupt injection.
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This is needed in e.g. ARM vGIC emulation, where the MMIO handling
depends on the VCPU that does the access.
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The patch represents capability KVM_CAP_S390_VECTOR_REGISTERS by means
of the SIMD facility bit. This allows to a) disable the use of SIMD when
used in conjunction with a not-SIMD-aware QEMU, b) to enable SIMD when
used with a SIMD-aware version of QEMU and c) finally by means of a QEMU
version using the future cpu model ioctls.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Tested-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Setting the SIMD bit in the KVM mask is an issue because it makes the
facility visible but not usable to the guest, thus it needs to be
removed again.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Provide the KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS ioctl which can be used
to get/set guest storage keys. This functionality is needed for live migration
of s390 guests that use storage keys.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The Store System Information (STSI) instruction currently collects all
information it relays to the caller in the kernel. Some information,
however, is only available in user space. An example of this is the
guest name: The kernel always sets "KVMGuest", but user space knows the
actual guest name.
This patch introduces a new exit, KVM_EXIT_S390_STSI, guarded by a
capability that can be enabled by user space if it wants to be able to
insert such data. User space will be provided with the target buffer
and the requested STSI function code.
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
On s390, we've got to make sure to hold the IPTE lock while accessing
logical memory. So let's add an ioctl for reading and writing logical
memory to provide this feature for userspace, too.
The maximum transfer size of this call is limited to 64kB to prevent
that the guest can trigger huge copy_from/to_user transfers. QEMU
currently only requests up to one or two pages so far, so 16*4kB seems
to be a reasonable limit here.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Access register mode is one of the modes that control dynamic address
translation. In this mode the address space is specified by values of
the access registers. The effective address-space-control element is
obtained from the result of the access register translation. See
the "Access-Register Introduction" section of the chapter 5 "Program
Execution" in "Principles of Operations" for more details.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
During dynamic address translation the get_vcpu_asce()
function can be invoked several times. It's ok for usual modes, but will
be slow if CPUs are in AR mode. Let's call the get_vcpu_asce() once and
pass the result to the called functions.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
In access register mode, the write_guest() read_guest() and other
functions will invoke the access register translation, which
requires an ar, designated by one of the instruction fields.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The kvm_s390_check_low_addr_protection() function is used only with real
addresses. According to the POP (the "Low-Address Protection"
paragraph in chapter 3), if the effective address is real or absolute,
the low-address protection procedure should raise a PROTECTION exception
only when the low-address protection is enabled in the control register
0 and the address is low.
This patch removes ASCE checks from the function and renames it to
better reflect its behavior.
Cc: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
As all cleanup functions can handle their respective NULL case
there is no need to have more than one error jump label.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
POWER supports irqfds but forgot to advertise them. Some userspace does
not check for the capability, but others check it---thus they work on
x86 and s390 but not POWER.
To avoid that other architectures in the future make the same mistake, let
common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE.
Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 297e21053a
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We finally have all the pieces in place, so let's include the
vector facility bit in the mask of available hardware facilities
for the guest to recognize. Also, enable the vector functionality
in the guest control blocks, to avoid a possible vector data
exception that would otherwise occur when a vector instruction
is issued by the guest operating system.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Store additional status in the machine check handler, in order to
collect status (such as vector registers) that is not defined by
store status.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The new SIGP order Store Additional Status at Address is totally
handled by user space, but we should still record the occurrence
of this order in the kernel code.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
A new exception type for vector instructions is introduced with
the new processor, but is handled exactly like a Data Exception
which is already handled by the system.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Define and allocate space for both the host and guest views of
the vector registers for a given vcpu. The 32 vector registers
occupy 128 bits each (512 bytes total), but architecturally are
paired with 512 additional bytes of reserved space for future
expansion.
The kvm_sync_regs structs containing the registers are union'ed
with 1024 bytes of padding in the common kvm_run struct. The
addition of 1024 bytes of new register information clearly exceeds
the existing union, so an expansion of that padding is required.
When changing environments, we need to appropriately save and
restore the vector registers viewed by both the host and guest,
into and out of the sync_regs space.
The floating point registers overlay the upper half of vector
registers 0-15, so there's a bit of data duplication here that
needs to be carefully avoided.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The guest debug functions work on absolute addresses and should use the
read_guest_abs() function rather than general read_guest() that
works with logical addresses.
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The function kvm_s390_vcpu_setup_model() now performs all cpu model realated
setup tasks for a vcpu. Besides cpuid and ibc initialization, facility list
assignment takes place during the setup step as well. The model setup has been
pulled to the begin of vcpu setup to allow kvm facility tests.
There is no need to protect the cpu model setup with a lock since the attributes
can't be changed anymore as soon the first vcpu is online.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The common s390 function insn_length() results in slightly smaller
(and thus hopefully faster) code than the calculation of the
instruction length via a lookup-table. So let's use that function
in the interrupt delivery code, too.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When the SIE exited by a DAT access exceptions which we can not
resolve, the guest tried to access a page which is out of bounds
and can not be paged-in. In this case we have to signal the bad
access by injecting an address exception. However, address exceptions
are either suppressing or terminating, i.e. the PSW has to point to
the next instruction when the exception is delivered. Since the
originating DAT access exception is nullifying, the PSW still
points to the offending instruction instead, so we've got to forward
the PSW to the next instruction.
Having fixed this issue, we can now also enable the TPROT
interpretation facility again which had been disabled because
of this problem.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When certain program exceptions (e.g. DAT access exceptions) occur,
the current instruction has to be nullified, i.e. the old PSW that
gets written into the low-core has to point to the beginning of the
instruction again, and not to the beginning of the next instruction.
Thus we have to rewind the PSW before writing it into the low-core.
The list of nullifying exceptions can be found in the POP, chapter 6,
figure 6-1 ("Interruption Action").
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The reinjection of an I/O interrupt can fail if the list is at the limit
and between the dequeue and the reinjection, another I/O interrupt is
injected (e.g. if user space floods kvm with I/O interrupts).
This patch avoids this memory leak and returns -EFAULT in this special
case. This error is not recoverable, so let's fail hard. This can later
be avoided by not dequeuing the interrupt but working directly on the
locked list.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
If the I/O interrupt could not be written to the guest provided
area (e.g. access exception), a program exception was injected into the
guest but "inti" wasn't freed, therefore resulting in a memory leak.
In addition, the I/O interrupt wasn't reinjected. Therefore the dequeued
interrupt is lost.
This patch fixes the problem while cleaning up the function and making the
cc and rc logic easier to handle.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
s390 documentation requires words 0 and 10-15 to be reserved and stored as
zeros. As we fill out all other fields, we can memset the full structure.
Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
With patch "include guest facilities in kvm facility test" it is no
longer necessary to have special handling for the non-LPAR case.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Most facility related decisions in KVM have to take into account:
- the facilities offered by the underlying run container (LPAR/VM)
- the facilities supported by the KVM code itself
- the facilities requested by a guest VM
This patch adds the KVM driver requested facilities to the test routine.
It additionally renames struct s390_model_fac to kvm_s390_fac and its field
names to be more meaningful.
The semantics of the facilities stored in the KVM architecture structure
is changed. The address arch.model.fac->list now points to the guest
facility list and arch.model.fac->mask points to the KVM facility mask.
This patch fixes the behaviour of KVM for some facilities for guests
that ignore the guest visible facility bits, e.g. guests could use
transactional memory intructions on hosts supporting them even if the
chosen cpu model would not offer them.
The userspace interface is not affected by this change.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The facility lists were not fully copied.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Under z/VM PQAP might trigger an operation exception if no crypto cards
are defined via APVIRTUAL or APDEDICATED.
[ 386.098666] Kernel BUG at 0000000000135c56 [verbose debug info unavailable]
[ 386.098693] illegal operation: 0001 ilc:2 [#1] SMP
[...]
[ 386.098751] Krnl PSW : 0704c00180000000 0000000000135c56 (kvm_s390_apxa_installed+0x46/0x98)
[...]
[ 386.098804] [<000000000013627c>] kvm_arch_init_vm+0x29c/0x358
[ 386.098806] [<000000000012d008>] kvm_dev_ioctl+0xc0/0x460
[ 386.098809] [<00000000002c639a>] do_vfs_ioctl+0x332/0x508
[ 386.098811] [<00000000002c660e>] SyS_ioctl+0x9e/0xb0
[ 386.098814] [<000000000070476a>] system_call+0xd6/0x258
[ 386.098815] [<000003fffc7400a2>] 0x3fffc7400a2
Lets add an extable entry and provide a zeroed config in that case.
Reported-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Tested-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
z/VM and LPAR enable key wrapping by default, lets do the same on KVM.
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Common: Optional support for adding a small amount of polling on each HLT
instruction executed in the guest (or equivalent for other architectures).
This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes
or TCP_RR netperf tests). This also has to be enabled manually for now,
but the plan is to auto-tune this in the future.
ARM/ARM64: the highlights are support for GICv3 emulation and dirty page
tracking
s390: several optimizations and bugfixes. Also a first: a feature
exposed by KVM (UUID and long guest name in /proc/sysinfo) before
it is available in IBM's hypervisor! :)
MIPS: Bugfixes.
x86: Support for PML (page modification logging, a new feature in
Broadwell Xeons that speeds up dirty page tracking), nested virtualization
improvements (nested APICv---a nice optimization), usual round of emulation
fixes. There is also a new option to reduce latency of the TSC deadline
timer in the guest; this needs to be tuned manually.
Some commits are common between this pull and Catalin's; I see you
have already included his tree.
ARM has other conflicts where functions are added in the same place
by 3.19-rc and 3.20 patches. These are not large though, and entirely
within KVM.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJU28rkAAoJEL/70l94x66DXqQH/1TDOfJIjW7P2kb0Sw7Fy1wi
cEX1KO/VFxAqc8R0E/0Wb55CXyPjQJM6xBXuFr5cUDaIjQ8ULSktL4pEwXyyv/s5
DBDkN65mriry2w5VuEaRLVcuX9Wy+tqLQXWNkEySfyb4uhZChWWHvKEcgw5SqCyg
NlpeHurYESIoNyov3jWqvBjr4OmaQENyv7t2c6q5ErIgG02V+iCux5QGbphM2IC9
LFtPKxoqhfeB2xFxTOIt8HJiXrZNwflsTejIlCl/NSEiDVLLxxHCxK2tWK/tUXMn
JfLD9ytXBWtNMwInvtFm4fPmDouv2VDyR0xnK2db+/axsJZnbxqjGu1um4Dqbak=
=7gdx
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM update from Paolo Bonzini:
"Fairly small update, but there are some interesting new features.
Common:
Optional support for adding a small amount of polling on each HLT
instruction executed in the guest (or equivalent for other
architectures). This can improve latency up to 50% on some
scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This
also has to be enabled manually for now, but the plan is to
auto-tune this in the future.
ARM/ARM64:
The highlights are support for GICv3 emulation and dirty page
tracking
s390:
Several optimizations and bugfixes. Also a first: a feature
exposed by KVM (UUID and long guest name in /proc/sysinfo) before
it is available in IBM's hypervisor! :)
MIPS:
Bugfixes.
x86:
Support for PML (page modification logging, a new feature in
Broadwell Xeons that speeds up dirty page tracking), nested
virtualization improvements (nested APICv---a nice optimization),
usual round of emulation fixes.
There is also a new option to reduce latency of the TSC deadline
timer in the guest; this needs to be tuned manually.
Some commits are common between this pull and Catalin's; I see you
have already included his tree.
Powerpc:
Nothing yet.
The KVM/PPC changes will come in through the PPC maintainers,
because I haven't received them yet and I might end up being
offline for some part of next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
KVM: ia64: drop kvm.h from installed user headers
KVM: x86: fix build with !CONFIG_SMP
KVM: x86: emulate: correct page fault error code for NoWrite instructions
KVM: Disable compat ioctl for s390
KVM: s390: add cpu model support
KVM: s390: use facilities and cpu_id per KVM
KVM: s390/CPACF: Choose crypto control block format
s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
KVM: s390: reenable LPP facility
KVM: s390: floating irqs: fix user triggerable endless loop
kvm: add halt_poll_ns module parameter
kvm: remove KVM_MMIO_SIZE
KVM: MIPS: Don't leak FPU/DSP to guest
KVM: MIPS: Disable HTW while in guest
KVM: nVMX: Enable nested posted interrupt processing
KVM: nVMX: Enable nested virtual interrupt delivery
KVM: nVMX: Enable nested apic register virtualization
KVM: nVMX: Make nested control MSRs per-cpu
KVM: nVMX: Enable nested virtualize x2apic mode
KVM: nVMX: Prepare for using hardware MSR bitmap
...
This patch enables cpu model support in kvm/s390 via the vm attribute
interface.
During KVM initialization, the host properties cpuid, IBC value and the
facility list are stored in the architecture specific cpu model structure.
During vcpu setup, these properties are taken to initialize the related SIE
state. This mechanism allows to adjust the properties from user space and thus
to implement different selectable cpu models.
This patch uses the IBC functionality to block instructions that have not
been implemented at the requested CPU type and GA level compared to the
full host capability.
Userspace has to initialize the cpu model before vcpu creation. A cpu model
change of running vcpus is not possible.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The patch introduces facilities and cpu_ids per virtual machine.
Different virtual machines may want to expose different facilities and
cpu ids to the guest, so let's make them per-vm instead of global.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>