Commit Graph

22284 Commits

Author SHA1 Message Date
Linus Torvalds 0e1dbccd8f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two families of fixes:

   - Fix an FPU context related boot crash on newer x86 hardware with
     larger context sizes than what most people test.  To fix this
     without ugly kludges or extensive reverts we had to touch core task
     allocator, to allow x86 to determine the task size dynamically, at
     boot time.

     I've tested it on a number of x86 platforms, and I cross-built it
     to a handful of architectures:

                                        (warns)               (warns)
       testing     x86-64:  -git:  pass (    0),  -tip:  pass (    0)
       testing     x86-32:  -git:  pass (    0),  -tip:  pass (    0)
       testing        arm:  -git:  pass ( 1359),  -tip:  pass ( 1359)
       testing       cris:  -git:  pass ( 1031),  -tip:  pass ( 1031)
       testing       m32r:  -git:  pass ( 1135),  -tip:  pass ( 1135)
       testing       m68k:  -git:  pass ( 1471),  -tip:  pass ( 1471)
       testing       mips:  -git:  pass ( 1162),  -tip:  pass ( 1162)
       testing    mn10300:  -git:  pass ( 1058),  -tip:  pass ( 1058)
       testing     parisc:  -git:  pass ( 1846),  -tip:  pass ( 1846)
       testing      sparc:  -git:  pass ( 1185),  -tip:  pass ( 1185)

     ... so I hope the cross-arch impact 'none', as intended.

     (by Dave Hansen)

   - Fix various NMI handling related bugs unearthed by the big asm code
     rewrite and generally make the NMI code more robust and more
     maintainable while at it.  These changes are a bit late in the
     cycle, I hope they are still acceptable.

     (by Andy Lutomirski)"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
  x86/fpu, sched: Dynamically allocate 'struct fpu'
  x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
  x86/nmi/64: Make the "NMI executing" variable more consistent
  x86/nmi/64: Minor asm simplification
  x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
  x86/nmi/64: Reorder nested NMI checks
  x86/nmi/64: Improve nested NMI comments
  x86/nmi/64: Switch stacks on userspace NMI entry
  x86/nmi/64: Remove asm code that saves CR2
  x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
2015-07-18 10:49:57 -07:00
Linus Torvalds f79a17bf26 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, plus a static key fix fixing /sys/devices/cpu/rdpmc"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Really allow to specify custom CC, AR or LD
  perf auxtrace: Fix misplaced check for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
  perf hists browser: Take the --comm, --dsos, etc filters into account
  perf symbols: Store if there is a filter in place
  x86, perf: Fix static_key bug in load_mm_cr4()
  tools: Copy lib/hweight.c from the kernel sources
  perf tools: Fix the detached tarball wrt rbtree copy
  perf thread_map: Fix the sizeof() calculation for map entries
  tools lib: Improve clean target
  perf stat: Fix shadow declaration of close
  perf tools: Fix lockup using 32-bit compat vdso
2015-07-18 10:44:21 -07:00
Linus Torvalds 59ee762156 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
 "Misc irq fixes:

   - two driver fixes
   - a Xen regression fix
   - a nested irq thread crash fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gicv3-its: Fix mapping of LPIs to collections
  genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD
  genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now
  gpio/davinci: Fix race in installing chained irq handler
2015-07-18 10:27:12 -07:00
Ingo Molnar 5aaeb5c01c x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.

Also optimize the x86 code a bit by caching task_struct_size.

Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18 03:42:51 +02:00
Dave Hansen 0c8c0f03e3 x86/fpu, sched: Dynamically allocate 'struct fpu'
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).

Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already.  This saves from doing an extra slab
allocation at fork().

The only real downside here is that we have to stick everything
and the end of the task_struct.  But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18 03:42:35 +02:00
Laurent Dufour f2abeef9fd mm: clean up per architecture MM hook header files
Commit 2ae416b142 ("mm: new mm hook framework") introduced an empty
header file (mm-arch-hooks.h) for every architecture, even those which
doesn't need to define mm hooks.

As suggested by Geert Uytterhoeven, this could be cleaned through the use
of a generic header file included via each per architecture
asm/include/Kbuild file.

The PowerPC architecture is not impacted here since this architecture has
to defined the arch_remap MM hook.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-17 16:39:53 -07:00
Andy Lutomirski a97439aa1a x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
It turns out to be rather tedious to test the NMI nesting code.
Make it easier: add a new CONFIG_DEBUG_ENTRY option that causes
the NMI handler to pre-emptively unmask NMIs.

With this option set, errors in the repeat_nmi logic or failures
to detect that we're in a nested NMI will result in quick panics
under perf (especially if multiple counters are running at high
frequency) instead of requiring an unusual workload that
generates page faults or breakpoints inside NMIs.

I called it CONFIG_DEBUG_ENTRY instead of CONFIG_DEBUG_NMI_ENTRY
because I want to add new non-NMI checks elsewhere in the entry
code in the future, and I'd rather not add too many new config
options or add this option and then immediately rename it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17 12:50:14 +02:00
Andy Lutomirski 36f1a77b3a x86/nmi/64: Make the "NMI executing" variable more consistent
Currently, "NMI executing" is one the first time an outermost
NMI hits repeat_nmi and zero thereafter.  Change it to be zero
each time for consistency.

This is intended to help NMI handling fail harder if it's buggy.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17 12:50:13 +02:00
Andy Lutomirski 23a781e987 x86/nmi/64: Minor asm simplification
Replace LEA; MOV with an equivalent SUB.  This saves one
instruction.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17 12:50:13 +02:00
Andy Lutomirski 810bc075f7 x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
We have a tricky bug in the nested NMI code: if we see RSP
pointing to the NMI stack on NMI entry from kernel mode, we
assume that we are executing a nested NMI.

This isn't quite true.  A malicious userspace program can point
RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to
happen while RSP is still pointing at the NMI stack.

Fix it with a sneaky trick.  Set DF in the region of code that
the RSP check is intended to detect.  IRET will clear DF
atomically.

( Note: other than paravirt, there's little need for all this
  complexity. We could check RIP instead of RSP. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17 12:50:12 +02:00
Andy Lutomirski a27507ca2d x86/nmi/64: Reorder nested NMI checks
Check the repeat_nmi .. end_repeat_nmi special case first.  The
next patch will rework the RSP check and, as a side effect, the
RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so
we'll need this ordering of the checks.

Note: this is more subtle than it appears.  The check for
repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code
instead of adjusting the "iret" frame to force a repeat.  This
is necessary, because the code between repeat_nmi and
end_repeat_nmi sets "NMI executing" and then writes to the
"iret" frame itself.  If a nested NMI comes in and modifies the
"iret" frame while repeat_nmi is also modifying it, we'll end up
with garbage.  The old code got this right, as does the new
code, but the new code is a bit more explicit.

If we were to move the check right after the "NMI executing"
check, then we'd get it wrong and have random crashes.

( Because the "NMI executing" check would jump to the code that would
  modify the "iret" frame without checking if the interrupted NMI was
  currently modifying it. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17 12:50:12 +02:00
Andy Lutomirski 0b22930eba x86/nmi/64: Improve nested NMI comments
I found the nested NMI documentation to be difficult to follow.
Improve the comments.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17 12:50:11 +02:00
Andy Lutomirski 9b6e6a8334 x86/nmi/64: Switch stacks on userspace NMI entry
Returning to userspace is tricky: IRET can fail, and ESPFIX can
rearrange the stack prior to IRET.

The NMI nesting fixup relies on a precise stack layout and
atomic IRET.  Rather than trying to teach the NMI nesting fixup
to handle ESPFIX and failed IRET, punt: run NMIs that came from
user mode on the normal kernel stack.

This will make some nested NMIs visible to C code, but the C
code is okay with that.

As a side effect, this should speed up perf: it eliminates an
RDMSR when NMIs come from user mode.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17 12:50:11 +02:00
Andy Lutomirski 0e181bb581 x86/nmi/64: Remove asm code that saves CR2
Now that do_nmi saves CR2, we don't need to save it in asm.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17 12:50:11 +02:00
Andy Lutomirski 9d05041679 x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
32-bit kernels handle nested NMIs in C.  Enable the exact same
handling on 64-bit kernels as well.  This isn't currently
necessary, but it will become necessary once the asm code starts
allowing limited nesting.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17 12:50:10 +02:00
Linus Torvalds 3e87ee06d0 platform-drivers-x86 for 4.2-3
Fix SMBIOS call handling and hwswitch state coherency in the dell-laptop driver.
 Cleanups for intel_*_ipc drivers.
 
 dell-laptop:
  - Do not cache hwswitch state
  - Check return value of each SMBIOS call
  - Clear buffer before each SMBIOS call
 
 intel_scu_ipc:
  - Move local memory initialization out of a mutex
 
 intel_pmc_ipc:
  - Update kerneldoc formatting
  - Fix compiler casting warnings
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVqCrgAAoJEKbMaAwKp364aBcIALo/ZB6JFFd3oFDBbZR9bzvp
 senrgC2QSWboFlyJ2aHB09n98m6tR5x8HTE6BijT64bUyPSLTPgDZoeC9ezIu1H0
 rXKJZM7GduxYVOvVgOPVKqt/yUopI55jDhpgvFmxpXgp9zaz4our2y+93VCCBkIm
 9nJMHXIvK+Rg4Rg0MuEkaghLRFivJAYFuyFu6vgWQOGap1QXruPIylK6agZs2E9x
 KhJAlLNjoAAfqFFkWdk7PxMO8QIgV9pLU8RlOQmUdRSe8F+CI3AAJjdn+FdPoXFN
 EBirxMm8NAd9+/JlfU95fUBwPnofY+D3Q8jUyKBBxnZbDQMIA6gWtzGaA/BY/zI=
 =hpkC
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.2-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull x86 platform driver fixes from Darren Hart:
 "Fix SMBIOS call handling and hwswitch state coherency in the
  dell-laptop driver.  Cleanups for intel_*_ipc drivers.  Details:

  dell-laptop:
   - Do not cache hwswitch state
   - Check return value of each SMBIOS call
   - Clear buffer before each SMBIOS call

  intel_scu_ipc:
   - Move local memory initialization out of a mutex

  intel_pmc_ipc:
   - Update kerneldoc formatting
   - Fix compiler casting warnings"

* tag 'platform-drivers-x86-v4.2-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  intel_scu_ipc: move local memory initialization out of a mutex
  intel_pmc_ipc: Update kerneldoc formatting
  dell-laptop: Do not cache hwswitch state
  dell-laptop: Check return value of each SMBIOS call
  dell-laptop: Clear buffer before each SMBIOS call
  intel_pmc_ipc: Fix compiler casting warnings
2015-07-16 20:57:25 -07:00
Linus Torvalds df14a68d63 * Fix FPU refactoring ("kvm: x86: fix load xsave feature warning")
* Fix eager FPU mode (Cc stable).
 * AMD bits of MTRR virtualization.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJVpOA0AAoJEL/70l94x66D59QH/R68oIbcC9eOrs3LgxFwFS/g
 uPY2owxr1MFAwI39S3zpSXXuLdB63E9G6EzP9lQO6UnSXgcNitnpEINEXpCpH7hS
 5DSw/0gPPjXlwyio7wM0EiXC+YO4kgzhLaBmeK5qPQmfvCz5bvLgelrY39T4TH/u
 B7gZN/uuaMfU8xNchCiU9Bx+KzaQhgpUTSOH/j8n2Obe9J/AyUqsSFeXzsN9hlVp
 5YcBOyzthDjphbeyfxas1nTNinO+tfvO5fDNKbqvRgOnm+/wubhXAikDs4u60eXp
 j6TUVKYJRJe9yh5jFB/HOA/0h6DDyoK+t7LSYRPRjONVFoxrnogvA+OQCIX/57Y=
 =vDV4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:

 - Fix FPU refactoring ("kvm: x86: fix load xsave feature warning")

 - Fix eager FPU mode (Cc stable)

 - AMD bits of MTRR virtualization

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: fix load xsave feature warning
  KVM: x86: apply guest MTRR virtualization on host reserved pages
  KVM: SVM: Sync g_pat with guest-written PAT value
  KVM: SVM: use NPT page attributes
  KVM: count number of assigned devices
  KVM: VMX: fix vmwrite to invalid VMCS
  KVM: x86: reintroduce kvm_is_mmio_pfn
  x86: hyperv: add CPUID bit for crash handlers
2015-07-15 13:30:12 -07:00
Thomas Gleixner ce0d3c0a6f genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now
Boris reported that the sparse_irq protection around __cpu_up() in the
generic code causes a regression on Xen. Xen allocates interrupts and
some more in the xen_cpu_up() function, so it deadlocks on the
sparse_irq_lock.

There is no simple fix for this and we really should have the
protection for all architectures, but for now the only solution is to
move it to x86 where actual wreckage due to the lack of protection has
been observed.

Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Fixes: a899418167 'hotplug: Prevent alloc/free of irq descriptors during cpu up/down'
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>
2015-07-15 10:39:17 +02:00
Linus Torvalds 1daa1cfb7a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - the high latency PIT detection fix, which slipped through the cracks
   for rc1

 - a regression fix for the early printk mechanism

 - the x86 part to plug irq/vector related hotplug races

 - move the allocation of the espfix pages on cpu hotplug to non atomic
   context.  The current code triggers a might_sleep() warning.

 - a series of KASAN fixes addressing boot crashes and usability

 - a trivial typo fix for Kconfig help text

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kconfig: Fix typo in the CONFIG_CMDLINE_BOOL help text
  x86/irq: Retrieve irq data after locking irq_desc
  x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()
  x86/irq: Plug irq vector hotplug race
  x86/earlyprintk: Allow early_printk() to use console style parameters like '115200n8'
  x86/espfix: Init espfix on the boot CPU side
  x86/espfix: Add 'cpu' parameter to init_espfix_ap()
  x86/kasan: Move KASAN_SHADOW_OFFSET to the arch Kconfig
  x86/kasan: Add message about KASAN being initialized
  x86/kasan: Fix boot crash on AMD processors
  x86/kasan: Flush TLBs after switching CR3
  x86/kasan: Fix KASAN shadow region page tables
  x86/init: Clear 'init_level4_pgt' earlier
  x86/tsc: Let high latency PIT fail fast in quick_pit_calibrate()
2015-07-12 10:02:38 -07:00
Wanpeng Li ee4100da16 kvm: x86: fix load xsave feature warning
[   68.196974] WARNING: CPU: 1 PID: 2140 at arch/x86/kvm/x86.c:3161 kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]()
[   68.196975] Modules linked in: snd_hda_codec_hdmi i915 rfcomm bnep bluetooth i2c_algo_bit rfkill nfsd drm_kms_helper nfs_acl nfs drm lockd grace sunrpc fscache snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_dummy snd_seq_oss x86_pkg_temp_thermal snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi kvm snd_seq ghash_clmulni_intel fuse snd_timer aesni_intel parport_pc ablk_helper snd_seq_device cryptd ppdev snd lp parport lrw dcdbas gf128mul i2c_core glue_helper lpc_ich video shpchp mfd_core soundcore serio_raw acpi_cpufreq ext4 mbcache jbd2 sd_mod crc32c_intel ahci libahci libata e1000e ptp pps_core
[   68.197005] CPU: 1 PID: 2140 Comm: qemu-system-x86 Not tainted 4.2.0-rc1+ #2
[   68.197006] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
[   68.197007]  ffffffffa03b0657 ffff8800d984bca8 ffffffff815915a2 0000000000000000
[   68.197009]  0000000000000000 ffff8800d984bce8 ffffffff81057c0a 00007ff6d0001000
[   68.197010]  0000000000000002 ffff880211c1a000 0000000000000004 ffff8800ce0288c0
[   68.197012] Call Trace:
[   68.197017]  [<ffffffff815915a2>] dump_stack+0x45/0x57
[   68.197020]  [<ffffffff81057c0a>] warn_slowpath_common+0x8a/0xc0
[   68.197022]  [<ffffffff81057cfa>] warn_slowpath_null+0x1a/0x20
[   68.197029]  [<ffffffffa037bed8>] kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]
[   68.197035]  [<ffffffffa037aede>] ? kvm_arch_vcpu_load+0x4e/0x1c0 [kvm]
[   68.197040]  [<ffffffffa03696a6>] kvm_vcpu_ioctl+0xc6/0x5c0 [kvm]
[   68.197043]  [<ffffffff811252d2>] ? perf_pmu_enable+0x22/0x30
[   68.197044]  [<ffffffff8112663e>] ? perf_event_context_sched_in+0x7e/0xb0
[   68.197048]  [<ffffffff811a6882>] do_vfs_ioctl+0x2c2/0x4a0
[   68.197050]  [<ffffffff8107bf33>] ? finish_task_switch+0x173/0x220
[   68.197053]  [<ffffffff8123307f>] ? selinux_file_ioctl+0x4f/0xd0
[   68.197055]  [<ffffffff8122cac3>] ? security_file_ioctl+0x43/0x60
[   68.197057]  [<ffffffff811a6ad9>] SyS_ioctl+0x79/0x90
[   68.197060]  [<ffffffff81597e57>] entry_SYSCALL_64_fastpath+0x12/0x6a
[   68.197061] ---[ end trace 558a5ebf9445fc80 ]---

After commit (0c4109bec0 'x86/fpu/xstate: Fix up bad get_xsave_addr()
assumptions'), there is no assumption an xsave bit is present in the
hardware (pcntxt_mask) that it is always present in a given xsave buffer.
An enabled state to be present on 'pcntxt_mask', but *not* in 'xstate_bv'
could happen when the last 'xsave' did not request that this feature be
saved (unlikely) or because the "init optimization" caused it to not be
saved. This patch kill the assumption.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-10 13:26:45 +02:00
Paolo Bonzini fd717f1101 KVM: x86: apply guest MTRR virtualization on host reserved pages
Currently guest MTRR is avoided if kvm_is_reserved_pfn returns true.
However, the guest could prefer a different page type than UC for
such pages. A good example is that pass-throughed VGA frame buffer is
not always UC as host expected.

This patch enables full use of virtual guest MTRRs.

Suggested-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de> (on AMD)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-10 13:25:27 +02:00
Jan Kiszka e098223b78 KVM: SVM: Sync g_pat with guest-written PAT value
When hardware supports the g_pat VMCB field, we can use it for emulating
the PAT configuration that the guest configures by writing to the
corresponding MSR.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-10 13:25:27 +02:00
Paolo Bonzini 3c2e7f7de3 KVM: SVM: use NPT page attributes
Right now, NPT page attributes are not used, and the final page
attribute depends solely on gPAT (which however is not synced
correctly), the guest MTRRs and the guest page attributes.

However, we can do better by mimicking what is done for VMX.
In the absence of PCI passthrough, the guest PAT can be ignored
and the page attributes can be just WB.  If passthrough is being
used, instead, keep respecting the guest PAT, and emulate the guest
MTRRs through the PAT field of the nested page tables.

The only snag is that WP memory cannot be emulated correctly,
because Linux's default PAT setting only includes the other types.

Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-10 13:25:26 +02:00
Paolo Bonzini 5544eb9b81 KVM: count number of assigned devices
If there are no assigned devices, the guest PAT are not providing
any useful information and can be overridden to writeback; VMX
always does this because it has the "IPAT" bit in its extended
page table entries, but SVM does not have anything similar.
Hook into VFIO and legacy device assignment so that they
provide this information to KVM.

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-10 13:25:26 +02:00
Radim Krčmář 370777daab KVM: VMX: fix vmwrite to invalid VMCS
fpu_activate is called outside of vcpu_load(), which means it should not
touch VMCS, but fpu_activate needs to.  Avoid the call by moving it to a
point where we know that the guest needs eager FPU and VMCS is loaded.

This will get rid of the following trace

 vmwrite error: reg 6800 value 0 (err 1)
  [<ffffffff8162035b>] dump_stack+0x19/0x1b
  [<ffffffffa046c701>] vmwrite_error+0x2c/0x2e [kvm_intel]
  [<ffffffffa045f26f>] vmcs_writel+0x1f/0x30 [kvm_intel]
  [<ffffffffa04617e5>] vmx_fpu_activate.part.61+0x45/0xb0 [kvm_intel]
  [<ffffffffa0461865>] vmx_fpu_activate+0x15/0x20 [kvm_intel]
  [<ffffffffa0560b91>] kvm_arch_vcpu_create+0x51/0x70 [kvm]
  [<ffffffffa0548011>] kvm_vm_ioctl+0x1c1/0x760 [kvm]
  [<ffffffff8118b55a>] ? handle_mm_fault+0x49a/0xec0
  [<ffffffff811e47d5>] do_vfs_ioctl+0x2e5/0x4c0
  [<ffffffff8127abbe>] ? file_has_perm+0xae/0xc0
  [<ffffffff811e4a51>] SyS_ioctl+0xa1/0xc0
  [<ffffffff81630949>] system_call_fastpath+0x16/0x1b

(Note: we also unconditionally activate FPU in vmx_vcpu_reset(), so the
 removed code added nothing.)

Fixes: c447e76b4c ("kvm/fpu: Enable eager restore kvm FPU for MPX")
Cc: <stable@vger.kernel.org>
Reported-by: Vlastimil Holer <vlastimil.holer@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-10 13:25:25 +02:00
Paolo Bonzini d1fe921955 KVM: x86: reintroduce kvm_is_mmio_pfn
The call to get_mt_mask was really using kvm_is_reserved_pfn to
detect an MMIO-backed page.  In this case, we want "false" to be
returned for the zero page.

Reintroduce a separate kvm_is_mmio_pfn predicate for this use
only.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-10 13:25:24 +02:00
Paolo Bonzini 5d75a74759 x86: hyperv: add CPUID bit for crash handlers
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-10 13:25:19 +02:00
Peter Zijlstra a833581e37 x86, perf: Fix static_key bug in load_mm_cr4()
Mikulas reported his K6-3 not booting. This is because the
static_key API confusion struck and bit Andy, this wants to be
static_key_false().

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Vince Weaver <vince@deater.net>
Cc: hillf.zj <hillf.zj@alibaba-inc.com>
Fixes: a66734297f ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks")
Link: http://lkml.kernel.org/r/20150709172338.GC19282@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-10 10:24:38 +02:00
qipeng.zha 02941007f5 intel_pmc_ipc: Update kerneldoc formatting
Update kerneldoc formatting per Documentation/kernel-dec-nano-HOWTO.txt.

Signed-off-by: qipeng.zha <qipeng.zha@intel.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2015-07-09 11:23:15 -07:00
Sébastien Hinderer 69711ca19b x86/kconfig: Fix typo in the CONFIG_CMDLINE_BOOL help text
Signed-off-by: Sébastien Hinderer <Sebastien.Hinderer@ens-lyon.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Thibault <Samuel.Thibault@ens-lyon.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-08 11:10:56 +02:00
Thomas Gleixner 09cf92b784 x86/irq: Retrieve irq data after locking irq_desc
irq_data is protected by irq_desc->lock, so retrieving the irq chip
from irq_data outside the lock is racy vs. an concurrent update. Move
it into the lock held region.

While at it add a comment why the vector walk does not require
vector_lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.331320612@linutronix.de
2015-07-07 11:54:04 +02:00
Thomas Gleixner cbb24dc761 x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()
It's unsafe to examine fields in the irq descriptor w/o holding the
descriptor lock. Add proper locking.

While at it add a comment why the vector check can run lock less

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.236544164@linutronix.de
2015-07-07 11:54:04 +02:00
Thomas Gleixner 5a3f75e3f0 x86/irq: Plug irq vector hotplug race
Jin debugged a nasty cpu hotplug race which results in leaking a irq
vector on the newly hotplugged cpu.

cpu N				cpu M
native_cpu_up                   device_shutdown
  do_boot_cpu			  free_msi_irqs
  start_secondary                   arch_teardown_msi_irqs
    smp_callin                        default_teardown_msi_irqs
       setup_vector_irq                  arch_teardown_msi_irq
        __setup_vector_irq		   native_teardown_msi_irq
          lock(vector_lock)		     destroy_irq 
          install vectors
          unlock(vector_lock)
					       lock(vector_lock)
--->                                  	       __clear_irq_vector
                                    	       unlock(vector_lock)
    lock(vector_lock)
    set_cpu_online
    unlock(vector_lock)

This leaves the irq vector(s) which are torn down on CPU M stale in
the vector array of CPU N, because CPU M does not see CPU N online
yet. There is a similar issue with concurrent newly setup interrupts.

The alloc/free protection of irq descriptors does not prevent the
above race, because it merily prevents interrupt descriptors from
going away or changing concurrently.

Prevent this by moving the call to setup_vector_irq() into the
vector_lock held region which protects set_cpu_online():

cpu N				cpu M
native_cpu_up                   device_shutdown
  do_boot_cpu			  free_msi_irqs
  start_secondary                   arch_teardown_msi_irqs
    smp_callin                        default_teardown_msi_irqs
       lock(vector_lock)                arch_teardown_msi_irq
       setup_vector_irq()
        __setup_vector_irq		   native_teardown_msi_irq
          install vectors		     destroy_irq 
       set_cpu_online
       unlock(vector_lock)
					       lock(vector_lock)
                                  	       __clear_irq_vector
                                    	       unlock(vector_lock)

So cpu M either sees the cpu N online before clearing the vector or
cpu N installs the vectors after cpu M has cleared it.

Reported-by: xiao jin <jin.xiao@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.141898931@linutronix.de
2015-07-07 11:54:04 +02:00
Steven Rostedt 827a82ff39 x86/earlyprintk: Allow early_printk() to use console style parameters like '115200n8'
When I enable early_printk on a kernel, I cut and paste the
console= input and add to earlyprintk parameter. But I notice
recently that ktest has not been detecting triple faults. The
way it detects it, is by seeing the kernel banner "Linux version
.." with a different kernel version pop up. Then I noticed that
early printk was no longer working on my console, which was why
ktest was not seeing it.

I bisected it down and it was added to 4.0 with this commit:

  ea9e9d8029 ("Specify PCI based UART for earlyprintk")

because it converted the simple_strtoul() that converts the baud
number into a kstrtoul(). The problem with this is, I had as my
baud rate, 115200n8 (acceptable for console=ttyS0), but because
of the "n8", the kstrtoul() doesn't parse the baud rate and
returns an error, which sets the baud rate to the default 9600.
This explains the garbage on my screen.

Now, earlyprintk= kernel parameter does not say it accepts that
format. Thus, one answer would simply be me changing my kernel
parameters to remove the "n8" since it isn't parsed anyway. But
I wonder if other people run into this, and it seems strange
that the two consoles for serial accepts different input.

I could also extend this to have earlyprintk do something with
that "n8" or whatever it has and have it match the console
parsing (which, BTW, still uses simple_strtoul(), as I guess it
has to).

This patch just makes my old kernel parameter parsing work like
it use to.

Although, simple_strtoull() is considered obsolete, it is the
only standard string parsing function that parses a number that
is attached to text. Ironically, commit ea9e9d8029 also added
several calls to simple_strtoul()!

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stuart R. Anderson <stuart.r.anderson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150706101434.5f6a351b@gandalf.local.home
[ Cleaned it up a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 17:33:47 +02:00
Zhu Guihua 20d5e4a9cd x86/espfix: Init espfix on the boot CPU side
As we alloc pages with GFP_KERNEL in init_espfix_ap() which is
called before we enable local irqs, so the lockdep sub-system
would (correctly) trigger a warning about the potentially
blocking API.

So we allocate them on the boot CPU side when the secondary CPU is
brought up by the boot CPU, and hand them over to the secondary
CPU.

And we use alloc_pages_node() with the secondary CPU's node, to
make sure the espfix stack is NUMA-local to the CPU that is
going to use it.

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Cc: <bp@alien8.de>
Cc: <luto@amacapital.net>
Cc: <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c97add2670e9abebb90095369f0cfc172373ac94.1435824469.git.zhugh.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:00:34 +02:00
Zhu Guihua 1db875631f x86/espfix: Add 'cpu' parameter to init_espfix_ap()
Add a CPU index parameter to init_espfix_ap(), so that the
parameter could be propagated to the function for espfix
page allocation.

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Cc: <bp@alien8.de>
Cc: <luto@amacapital.net>
Cc: <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cde3fcf1b3211f3f03feb1a995bce3fee850f0fc.1435824469.git.zhugh.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:00:33 +02:00
Andrey Ryabinin d6f2d75a7a x86/kasan: Move KASAN_SHADOW_OFFSET to the arch Kconfig
KASAN_SHADOW_OFFSET is purely arch specific setting,
so it should be in arch's Kconfig file.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435828178-10975-7-git-send-email-a.ryabinin@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 14:53:15 +02:00
Andrey Ryabinin 8515522949 x86/kasan: Add message about KASAN being initialized
Print informational message to tell user that kernel
runs with KASAN enabled.

Add a "kasan: " prefix to all messages in kasan_init_64.c.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435828178-10975-6-git-send-email-a.ryabinin@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 14:53:14 +02:00
Andrey Ryabinin d4f86beacc x86/kasan: Fix boot crash on AMD processors
While populating zero shadow wrong bits in upper level page
tables used. __PAGE_KERNEL_RO that was used for pgd/pud/pmd has
_PAGE_BIT_GLOBAL set. Global bit is present only in the lowest
level of the page translation hierarchy (ptes), and it should be
zero in upper levels.

This bug seems doesn't cause any troubles on Intel cpus, while
on AMDs it cause kernel crash on boot.

Use _KERNPG_TABLE bits for pgds/puds/pmds to fix this.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: <stable@vger.kernel.org> # 4.0+
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435828178-10975-5-git-send-email-a.ryabinin@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 14:53:14 +02:00
Andrey Ryabinin 241d2c54c6 x86/kasan: Flush TLBs after switching CR3
load_cr3() doesn't cause tlb_flush if PGE enabled.

This may cause tons of false positive reports spamming the
kernel to death.

To fix this __flush_tlb_all() should be called explicitly
after CR3 changed.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: <stable@vger.kernel.org> # 4.0+
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435828178-10975-4-git-send-email-a.ryabinin@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 14:53:14 +02:00
Alexander Popov 5d5aa3cfca x86/kasan: Fix KASAN shadow region page tables
Currently KASAN shadow region page tables created without
respect of physical offset (phys_base). This causes kernel halt
when phys_base is not zero.

So let's initialize KASAN shadow region page tables in
kasan_early_init() using __pa_nodebug() which considers
phys_base.

This patch also separates x86_64_start_kernel() from KASAN low
level details by moving kasan_map_early_shadow(init_level4_pgt)
into kasan_early_init().

Remove the comment before clear_bss() which stopped bringing
much profit to the code readability. Otherwise describing all
the new order dependencies would be too verbose.

Signed-off-by: Alexander Popov <alpopov@ptsecurity.com>
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: <stable@vger.kernel.org> # 4.0+
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 14:53:13 +02:00
Andrey Ryabinin d0f77d4d04 x86/init: Clear 'init_level4_pgt' earlier
Currently x86_64_start_kernel() has two KASAN related
function calls. The first call maps shadow to early_level4_pgt,
the second maps shadow to init_level4_pgt.

If we move clear_page(init_level4_pgt) earlier, we could hide
KASAN low level detail from generic x86_64 initialization code.
The next patch will do it.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: <stable@vger.kernel.org> # 4.0+
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435828178-10975-2-git-send-email-a.ryabinin@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 14:53:13 +02:00
Yann Droneaud ebf2d2689d perf/x86: Fix copy_from_user_nmi() return if range is not ok
Commit 0a196848ca ("perf: Fix arch_perf_out_copy_user default"),
changes copy_from_user_nmi() to return the number of
remaining bytes so that it behave like copy_from_user().

Unfortunately, when the range is outside of the process
memory, the return value  is still the number of byte
copied, eg. 0, instead of the remaining bytes.

As all users of copy_from_user_nmi() were modified as
part of commit 0a196848ca, the function should be
fixed to return the total number of bytes if range is
not correct.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435001923-30986-1-git-send-email-ydroneaud@opteya.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 14:09:27 +02:00
Adrian Hunter 5aac644a99 x86/tsc: Let high latency PIT fail fast in quick_pit_calibrate()
If it takes longer than 12us to read the PIT counter lsb/msb,
then the error margin will never fall below 500ppm within 50ms,
and Fast TSC calibration will always fail.

This patch detects when that will happen and fails fast. Note
the failure message is not printed in that case because:
1. it will always happen on that class of hardware
2. the absence of the message is more informative than its
presence

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/556EB717.9070607@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-06 09:41:00 +02:00
Linus Torvalds a585d2b738 platform-drivers-x86 for 4.2-2
A new intel_pmc_ipc driver, a symmetrical allocation and free fix in
 dell-laptop, a couple minor fixes, and some updated documentation in the
 dell-laptop comments.
 
 intel_pmc_ipc:
  - Add Intel Apollo Lake PMC IPC driver
 
 tc1100-wmi:
  - Delete an unnecessary check before the function call "kfree"
 
 dell-laptop:
  - Fix allocating & freeing SMI buffer page
  - Show info about WiGig and UWB in debugfs
  - Update information about wireless control
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVmM8aAAoJEKbMaAwKp364iUkH/jihOduWkDTzzzxRP2Dv2nEh
 qyvE94Nc9A9dl87C2+II/Pi1s8h4CJOQpl70syYYPc4FdF70hpvP8TbHkgCWrY/d
 F8CoS9L9keviMtGOWlbEL9hBjfSDNwTMESTrDxrwhA04TSAwjDmXhhiUOF5FjFJm
 CX5+ZQ3iXEH6KsENR+Er54J9+6WKE6IuRcnnKCapnPQ8cEYeVn+WEPyzHCOy8Pg3
 xzzUar3/knS2VMIb5eIVpaKFvD9P9qBsC/gQ0pk1Y+686gwQZMVURDv8lw8hfXpx
 TJDOXk21P8WbSH1r+jwax5wLjLge7vJtYG2Deye6MUgvSgg+O2tSVCv9SMQR088=
 =WUgr
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull late x86 platform driver updates from Darren Hart:
 "The following came in a bit later and I wanted them to bake in next a
  few more days before submitting, thus the second pull.

  A new intel_pmc_ipc driver, a symmetrical allocation and free fix in
  dell-laptop, a couple minor fixes, and some updated documentation in
  the dell-laptop comments.

  intel_pmc_ipc:
   - Add Intel Apollo Lake PMC IPC driver

  tc1100-wmi:
   - Delete an unnecessary check before the function call "kfree"

  dell-laptop:
   - Fix allocating & freeing SMI buffer page
   - Show info about WiGig and UWB in debugfs
   - Update information about wireless control"

* tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver
  tc1100-wmi: Delete an unnecessary check before the function call "kfree"
  dell-laptop: Fix allocating & freeing SMI buffer page
  dell-laptop: Show info about WiGig and UWB in debugfs
  dell-laptop: Update information about wireless control
2015-07-05 10:54:09 -07:00
Linus Torvalds 1b3618b60a Except for the preempt notifiers fix, these are all small bugfixes
that could have been waited for -rc2.  Sending them now since I
 was taking care of Peter's patch anyway.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJVmB5pAAoJEL/70l94x66DpLAH/A0p2HICsG5Qw3gnI3NxAmK4
 YUvtMx0d67mFXPg0kuYRMO7C2Is6XHKtnmsX8oqkg3JTRFfn7XYqlwvrrK3Be08U
 tGvhigneJTGDXwU74jyik+D6VLmyJP3CxEvXM3d9AFyy7Ro9Grxx0Ja8c9cmKGQE
 esCwNAEJOcqaQMtNIix3WtXifOVFr40NZlbAawsMyxVw8LZK/K5maXyUTRDI57Qn
 B1wbTN1KD847/0rLrit+8VlsGEZBorUgCFhueeYGy/7EdiY0bNkzhLWb4erlWnRq
 ZlKzsLdfXmEg2CEepaHCm5jlLfIurgbLfoV1tzQ5jAuj/SHmUxq+k3lZZYTYA3w=
 =vDKM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Except for the preempt notifiers fix, these are all small bugfixes
  that could have been waited for -rc2.  Sending them now since I was
  taking care of Peter's patch anyway"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: add hyper-v crash msrs values
  KVM: x86: remove data variable from kvm_get_msr_common
  KVM: s390: virtio-ccw: don't overwrite config space values
  KVM: x86: keep track of LVT0 changes under APICv
  KVM: x86: properly restore LVT0
  KVM: x86: make vapics_in_nmi_mode atomic
  sched, preempt_notifier: separate notifier registration from static_key inc/dec
2015-07-04 11:29:59 -07:00
Linus Torvalds b1be9ead13 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two FPU rewrite related fixes.  This addresses all known x86
  regressions at this stage.  Also some other misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Fix boot crash in the early FPU code
  x86/asm/entry/64: Update path names
  x86/fpu: Fix FPU related boot regression when CPUID masking BIOS feature is enabled
  x86/boot/setup: Clean up the e820_reserve_setup_data() code
  x86/kaslr: Fix typo in the KASLR_FLAG documentation
2015-07-04 08:58:50 -07:00
Linus Torvalds c1776a18e3 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "This tree includes an x86 PMU scheduling fix, but most changes are
  late breaking tooling fixes and updates:

  User visible fixes:

   - Create config.detected into OUTPUT directory, fixing parallel
     builds sharing the same source directory (Aaro Kiskinen)

   - Allow to specify custom linker command, fixing some MIPS64 builds.
     (Aaro Kiskinen)

   - Fix to show proper convergence stats in 'perf bench numa' (Srikar
     Dronamraju)

  User visible changes:

   - Validate syscall list passed via -e argument to 'perf trace'.
     (Arnaldo Carvalho de Melo)

   - Introduce 'perf stat --per-thread' (Jiri Olsa)

   - Check access permission for --kallsyms and --vmlinux (Li Zhang)

   - Move toggling event logic from 'perf top' and into hists browser,
     allowing freeze/unfreeze with event lists with more than one entry
     (Namhyung Kim)

   - Add missing newlines when dumping PERF_RECORD_FINISHED_ROUND and
     showing the Aggregated stats in 'perf report -D' (Adrian Hunter)

  Infrastructure fixes:

   - Add missing break for PERF_RECORD_ITRACE_START, which caused those
     events samples to be parsed as well as PERF_RECORD_LOST_SAMPLES.
     ITRACE_START only appears when Intel PT or BTS are present, so..
     (Jiri Olsa)

   - Call the perf_session destructor when bailing out in the inject,
     kmem, report, kvm and mem tools (Taeung Song)

  Infrastructure changes:

   - Move stuff out of 'perf stat' and into the lib for further use
     (Jiri Olsa)

   - Reference count the cpu_map and thread_map classes (Jiri Olsa)

   - Set evsel->{cpus,threads} from the evlist, if not set, allowing the
     generalization of some 'perf stat' functions that previously were
     accessing private static evlist variable (Jiri Olsa)

   - Delete an unnecessary check before the calling free_event_desc()
     (Markus Elfring)

   - Allow auxtrace data alignment (Adrian Hunter)

   - Allow events with dot (Andi Kleen)

   - Fix failure to 'perf probe' events on arm (He Kuang)

   - Add testing for Makefile.perf (Jiri Olsa)

   - Add test for make install with prefix (Jiri Olsa)

   - Fix single target build dependency check (Jiri Olsa)

   - Access thread_map entries via accessors, prep patch to hold more
     info per entry, for ongoing 'perf stat --per-thread' work (Jiri
     Olsa)

   - Use __weak definition from compiler.h (Sukadev Bhattiprolu)

   - Split perf_pmu__new_alias() (Sukadev Bhattiprolu)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  perf tools: Allow to specify custom linker command
  perf tools: Create config.detected into OUTPUT directory
  perf mem: Fill in the missing session freeing after an error occurs
  perf kvm: Fill in the missing session freeing after an error occurs
  perf report: Fill in the missing session freeing after an error occurs
  perf kmem: Fill in the missing session freeing after an error occurs
  perf inject: Fill in the missing session freeing after an error occurs
  perf tools: Add missing break for PERF_RECORD_ITRACE_START
  perf/x86: Fix 'active_events' imbalance
  perf symbols: Check access permission when reading symbol files
  perf stat: Introduce --per-thread option
  perf stat: Introduce print_counters function
  perf stat: Using init_stats instead of memset
  perf stat: Rename print_interval to process_interval
  perf stat: Remove perf_evsel__read_cb function
  perf stat: Move perf_stat initialization counter process code
  perf stat: Move zero_per_pkg into counter process code
  perf stat: Separate counters reading and processing
  perf stat: Introduce read_counters function
  perf stat: Introduce perf_evsel__read function
  ...
2015-07-04 08:17:29 -07:00
Ingo Molnar b96fecbfa8 x86/fpu: Fix boot crash in the early FPU code
Jan Kara and Thomas Gleixner reported boot crashes in the FPU
code:

  general protection fault: 0000 [#1] SMP
  RIP: 0010:[<ffffffff81048a6c>]  [<ffffffff81048a6c>] mxcsr_feature_mask_init+0x1c/0x40

  2b:*  0f ae 85 00 fe ff ff    fxsave -0x200(%rbp)

and bisected it down to the following FPU commit:

   91a8c2a5b4 ("x86/fpu: Clean up and fix MXCSR handling")

The reason is that the on-stack FPU registers state variable,
used by the FXSAVE instruction, did not have the required
minimum alignment of 16 bytes, causing the general protection
fault.

This is most likely a GCC bug in older GCC versions, but the
offending commit also added a bogus extra 32-byte alignment
(which GCC ignored too).

So fix this bug by making the variable static again, but also
mark it __initdata this time, because fpu__init_system_mxcsr()
is now an __init function.

Reported-and-bisected-by: Jan Kara <jack@suse.cz>
Reported-bisected-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150704075819.GA9201@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04 10:05:56 +02:00
Andrey Smetanin a88464a8b0 kvm: add hyper-v crash msrs values
Added Hyper-V crash msrs values - HV_X64_MSR_CRASH*.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-03 18:55:20 +02:00