Commit Graph

17601 Commits

Author SHA1 Message Date
Linus Torvalds b8ff768b5a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Several fixes for bugs caught while looking through f_pos (ab)users"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  aout32 coredump compat fix
  splice: don't pass the address of ->f_pos to methods
  mconsole: we'd better initialize pos before passing it to vfs_read()...
2013-06-22 08:42:20 -10:00
Steven Rostedt (Red Hat) 2b4bc78956 trace,x86: Do not call local_irq_save() in load_current_idt()
As load_current_idt() is now what is used to update the IDT for the
switches needed for NMI, lockdep debug, and for tracing, it must not
call local_irq_save(). This is because one of the users of this is
lockdep, which does tracing of local_irq_save() and when the debug
trap is hit, we need to update the IDT before tracing interrupts
being disabled. As load_current_idt() is used to do this, calling
local_irq_save() which lockdep traces, defeats the point of calling
load_current_idt().

As interrupts are already disabled when used by lockdep and NMI, the
only other user is tracing that can disable interrupts itself. Simply
have the tracing update disable interrupts before calling load_current_idt()
instead of breaking the other users.

Here's the dump that happened:

------------[ cut here ]------------
WARNING: at /work/autotest/nobackup/linux-test.git/kernel/fork.c:1196 copy_process+0x2c3/0x1398()
DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled)
Modules linked in:
CPU: 1 PID: 4570 Comm: gdm-simple-gree Not tainted 3.10.0-rc3-test+ #5
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 ffffffff81d2a7a5 ffff88006ed13d50 ffffffff8192822b ffff88006ed13d90
 ffffffff81035f25 ffff8800721c6000 ffff88006ed13da0 0000000001200011
 0000000000000000 ffff88006ed5e000 ffff8800721c6000 ffff88006ed13df0
Call Trace:
 [<ffffffff8192822b>] dump_stack+0x19/0x1b
 [<ffffffff81035f25>] warn_slowpath_common+0x67/0x80
 [<ffffffff81035fe1>] warn_slowpath_fmt+0x46/0x48
 [<ffffffff812bfc5d>] ? __raw_spin_lock_init+0x31/0x52
 [<ffffffff810341f7>] copy_process+0x2c3/0x1398
 [<ffffffff8103539d>] do_fork+0xa8/0x260
 [<ffffffff810ca7b1>] ? trace_preempt_on+0x2a/0x2f
 [<ffffffff812afb3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff810355cf>] SyS_clone+0x16/0x18
 [<ffffffff81938369>] stub_clone+0x69/0x90
 [<ffffffff81937fc2>] ? system_call_fastpath+0x16/0x1b
---[ end trace 8b157a9d20ca1aa2 ]---

in fork.c:

 #ifdef CONFIG_PROVE_LOCKING
	DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled); <-- bug here
	DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
 #endif

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-22 13:16:19 -04:00
Al Viro 945fb136df aout32 coredump compat fix
dump_seek() does SEEK_CUR, not SEEK_SET; native binfmt_aout
handles it correctly (seeks by PAGE_SIZE - sizeof(struct user),
getting the current position to PAGE_SIZE), compat one seeks
by PAGE_SIZE and ends up at PAGE_SIZE + already written...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-22 11:01:38 +04:00
Linus Torvalds f71194a7d4 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This series fixes a couple of build failures, and fixes MTRR cleanup
  and memory setup on very specific memory maps.

  Finally, it fixes triggering backtraces on all CPUs, which was
  inadvertently disabled on x86."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Fix dummy variable buffer allocation
  x86: Fix trigger_all_cpu_backtrace() implementation
  x86: Fix section mismatch on load_ucode_ap
  x86: fix build error and kconfig for ia32_emulation and binfmt
  range: Do not add new blank slot with add_range_with_merge
  x86, mtrr: Fix original mtrr range get for mtrr_cleanup
2013-06-21 06:33:48 -10:00
Linus Torvalds 9d0be540d7 KVM fixes for 3.10-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRwc4AAAoJEBvWZb6bTYbyT/wQAIAvvjM+jlqOsZCUjnutdWXf
 VGvJB1jXPEa6cuUkaE3RYO8rLLJz/PC4Geg8I/awtjeSkPvldeu0FihSkdbrNkpJ
 3dSAJ0cR8+ScUZB74Lt1pfwwEAdZk61MU1opD1qy3wWTy5Rk2xp4qzE2aCVJoLkh
 K8gscLrCHR47DELcm+AWCtHt/0VOZSAVe9z/7Qf9eAMCNmH/efHgrZZTxF/1aBkI
 7XaZdqT+d84D/Bc2isdRNvFYt9rkuII2GCeRciw2t3QsgVoBXn7FVK2OXbufIaYW
 /XX0YiNpSnUpcpOtGw2MwzpKgDSRE9QbnOUVsWThTDeG7Q5d72ioCFuRDba8p3Lc
 u146nQM495bAdJQz6IW3JCeiZlQ86/QjFquk9hJuCwQpuHmlRpVEqFbLVnZEnYvk
 lc9dnK7BcaftbMgo/j6DMv6Bpq/EDp1JrPXzNT4BwG7ptArEXzmdoktwpONjrZzI
 1oZV5xCcNhTjyqlndiZKxneKXoqz/3NOdR+EV0yh3+0l4PK6C52jd6PaKRv7qN3l
 Lt35uFrmKqzOR12KVStYREyh1Sy43ADE4tIIPnOBbcXz23Df4/vVSb1Bqv0nPq4i
 JhsRsGoR8ZdDBd7c40XlvpxOiMmubFLSklX0WaKn2yIZH9FMj63Ak0wet7qhaqb8
 lbJRUH8gnK0RCVEoUFpB
 =yGPS
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Three one-line fixes for my first pull request; one for x86 host, one
  for x86 guest, one for PPC"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86: kvmclock: zero initialize pvclock shared memory area
  kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
  KVM: x86: remove vcpu's CPL check in host-invoked XCR set
2013-06-21 06:29:22 -10:00
Linus Torvalds 92616ee654 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "This fixes an unaligned crash in XTS mode when using aseni_intel"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: aesni_intel - fix accessing of unaligned memory
2013-06-21 06:28:39 -10:00
Steven Rostedt (Red Hat) 83ab85140b trace,x86: Move creation of irq tracepoints from apic.c to irq.c
Compiling without CONFIG_X86_LOCAL_APIC set, apic.c will not be
compiled, and the irq tracepoints will not be created via the
CREATE_TRACE_POINTS macro. When CONFIG_X86_LOCAL_APIC is not set,
we get the following build error:

  LD      init/built-in.o
arch/x86/built-in.o: In function `trace_x86_platform_ipi_entry':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_entry'
arch/x86/built-in.o: In function `trace_x86_platform_ipi_exit':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_exit'
arch/x86/built-in.o: In function `trace_irq_work_entry':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_entry'
arch/x86/built-in.o: In function `trace_irq_work_exit':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_exit'
arch/x86/built-in.o:(__jump_table+0x8): undefined reference to `__tracepoint_x86_platform_ipi_entry'
arch/x86/built-in.o:(__jump_table+0x14): undefined reference to `__tracepoint_x86_platform_ipi_exit'
arch/x86/built-in.o:(__jump_table+0x20): undefined reference to `__tracepoint_irq_work_entry'
arch/x86/built-in.o:(__jump_table+0x2c): undefined reference to `__tracepoint_irq_work_exit'
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2

As irq.c is always compiled for x86, it is a more appropriate location
to create the irq tracepoints.

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-21 10:33:28 -04:00
H. Peter Anvin df91c3513f * Don't leak random kernel memory to EFI variable NVRAM when attempting
to initiate garbage collection. Also, free the kernel memory when
    we're done with it instead of leaking - Ben Hutchings
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRxCKWAAoJEC84WcCNIz1VIDcP/icYJSdXw/fKUgCG9+zAk3FE
 xivUa6aJKG+GBL4Cg2VYTJ/sBDA1v0xdFcf73amCdTcWuor6ha5LxrYFwFhhKtYk
 3O05spiCZ+F6nveZTrrxqKce3cN5XOYwbF8bZOAuJYmjaTvZQ0Cb89zf/asC4+Fx
 ui6HCRmCy6nUy5Fn4A7fBHlTyuOE4xZXvZjOUXWGiUwmZK/SbHYtN1+4nXBTnAWF
 yDcB8fkl4v7cYLIl9U8dc7Dg8ita4bj6F8RRNJf2WXipsOpMoMRtiua8VLCZ0ECc
 9cpONx7wMQS5nDEfNefWm4vdqS8PEG1CkZ8tN37nIU/AlTi6mTyTqZv0ICQfgVna
 6nChmkz9uxvzTzlMlOR5eGBu68xQLiyHhauTJBpMkwOp4q4gg/Ui+bhV7Vo9ySF3
 DlrSekT0Uo1PLgdAUYg7+hZhN98HGHoGSEx7FsymOjqlY3nF+4k7ZeTerEI4vI2D
 zqyj5QVWqS/M9zj1ojUu3nFDNhfQHO4PvVt9O8ca4/IJS1pbb/J8qtwIOFTNdJHf
 2Iyk6GnT9UaKvOs4trWXmPjiZkBcS8JdQH80fDSIgIxLhwOQksT+q90TY5h3mAED
 Hwm9JywXlxpFncPixAkpoorqDWSZmBTmUw9JR0FQNTFGuENPA0SGn51P2gGbTA+e
 82ECeWRXMMoYA/qNxMI2
 =kfTf
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * Don't leak random kernel memory to EFI variable NVRAM when attempting
   to initiate garbage collection. Also, free the kernel memory when
   we're done with it instead of leaking - Ben Hutchings

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-21 03:01:21 -07:00
Ben Hutchings b8cb62f821 x86/efi: Fix dummy variable buffer allocation
1. Check for allocation failure
2. Clear the buffer contents, as they may actually be written to flash
3. Don't leak the buffer

Compile-tested only.

[ Tested successfully on my buggy ASUS machine - Matt ]

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-21 10:52:49 +01:00
Seiji Aguchi cf910e83ae x86, trace: Add irq vector tracepoints
[Purpose of this patch]

As Vaibhav explained in the thread below, tracepoints for irq vectors
are useful.

http://www.spinics.net/lists/mm-commits/msg85707.html

<snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit
provide when an interrupt is handled.  They provide good data about when
the system has switched to kernel space and how it affects the currently
running processes.

There are some IRQ vectors which trigger the system into kernel space,
which are not handled in generic IRQ handlers.  Tracing such events gives
us the information about IRQ interaction with other system events.

The trace also tells where the system is spending its time.  We want to
know which cores are handling interrupts and how they are affecting other
processes in the system.  Also, the trace provides information about when
the cores are idle and which interrupts are changing that state.
<snip>

On the other hand, my usecase is tracing just local timer event and
getting a value of instruction pointer.

I suggested to add an argument local timer event to get instruction pointer before.
But there is another way to get it with external module like systemtap.
So, I don't need to add any argument to irq vector tracepoints now.

[Patch Description]

Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
But there is an above use case to trace specific irq_vector rather than tracing all events.
In this case, we are concerned about overhead due to unwanted events.

So, add following tracepoints instead of introducing irq_vector_entry/exit.
so that we can enable them independently.
   - local_timer_vector
   - reschedule_vector
   - call_function_vector
   - call_function_single_vector
   - irq_work_entry_vector
   - error_apic_vector
   - thermal_apic_vector
   - threshold_apic_vector
   - spurious_apic_vector
   - x86_platform_ipi_vector

Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty
makes a zero when tracepoints are disabled. Detailed explanations are as follows.
 - Create trace irq handlers with entering_irq()/exiting_irq().
 - Create a new IDT, trace_idt_table, at boot time by adding a logic to
   _set_gate(). It is just a copy of original idt table.
 - Register the new handlers for tracpoints to the new IDT by introducing
   macros to alloc_intr_gate() called at registering time of irq_vector handlers.
 - Add checking, whether irq vector tracing is on/off, into load_current_idt().
   This has to be done below debug checking for these reasons.
   - Switching to debug IDT may be kicked while tracing is enabled.
   - On the other hands, switching to trace IDT is kicked only when debugging
     is disabled.

In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
used for other purposes.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:34 -07:00
Seiji Aguchi 629f4f9d59 x86: Rename variables for debugging
Rename variables for debugging to describe meaning of them precisely.

Also, introduce a generic way to switch IDT by checking a current state,
debug on/off.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323A8.7050905@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:13 -07:00
Seiji Aguchi eddc0e922a x86, trace: Introduce entering/exiting_irq()
When implementing tracepoints in interrupt handers, if the tracepoints are
simply added in the performance sensitive path of interrupt handers,
it may cause potential performance problem due to the time penalty.

To solve the problem, an idea is to prepare non-trace/trace irq handers and
switch their IDTs at the enabling/disabling time.

So, let's introduce entering_irq()/exiting_irq() for pre/post-
processing of each irq handler.

A way to use them is as follows.

Non-trace irq handler:
smp_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	__smp_irq_handler();	/*
				 * common logic between non-trace and trace handlers
				 * in a vector.
				 */
	exiting_irq();		/* post-processing of this handler */

}

Trace irq_handler:
smp_trace_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	trace_irq_entry();	/* tracepoint for irq entry */
	__smp_irq_handler();	/*
				 * common logic between non-trace and trace handlers
				 * in a vector.
				 */
	trace_irq_exit();	/* tracepoint for irq exit */
	exiting_irq();		/* post-processing of this handler */

}

If tracepoints can place outside entering_irq()/exiting_irq() as follows,
it looks cleaner.

smp_trace_irq_handler()
{
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
}

But it doesn't work.
The problem is with irq_enter/exit() being called. They must be called before
trace_irq_enter/exit(),  because of the rcu_irq_enter() must be called before
any tracepoints are used, as tracepoints use  rcu to synchronize.

As a possible alternative, we may be able to call irq_enter() first as follows
if irq_enter() can nest.

smp_trace_irq_hander()
{
	irq_entry();
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
	irq_exit();
}

But it doesn't work, either.
If irq_enter() is nested, it may have a time penalty because it has to check if it
was already called or not. The time penalty is not desired in performance sensitive
paths even if it is tiny.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:01 -07:00
H. Peter Anvin f037e416af x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S
There is no point in using "xorq" to clear a register... use "xorl" to
clear the bottom 32 bits, and the upper 32 bits get cleared by virtue
of zero extension.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/n/tip-b76zi1gep39c0zs8fbvkhie9@git.kernel.org
2013-06-20 21:30:04 -07:00
H. Peter Anvin e6bca5a6a8 Linux 3.10-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRvOHmAAoJEHm+PkMAQRiGpOYIAI/OMDjCMro8S1FjDPGr+wsz
 r/u4h4DiHQTdAUSBYhksgXVBSm02hGhDF3tz7u9oAXuv+4zlGUMZNjmEmLOFfdyd
 Ve9vqMrUkdwA0jt0d7AYVjtCy/j/yVe5szXZCksHXOZiyb78SEZPQgHc13CJ4lKI
 K2jPk0sMko7d5ptALPIX+ome48M+3w3k1HuQ62Znm4pFSADcFGpGdf40HP/5LL+I
 Llp3XHJ5OZrcHRal0t39oHJOCWW61bnYXTWel9J7XoUztebF2cRgFydAhtrto42z
 x7ELOKVY197WH8l56cnVHrISneQd4kgWrooxLYy6QsJl73/qvQBX+7nmkoes7so=
 =qeV0
 -----END PGP SIGNATURE-----

Merge tag 'v3.10-rc6' into x86/cleanups

Linux 3.10-rc6

We need a change that is the mainline tree for further work.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 21:13:55 -07:00
Borislav Petkov 5f8c421814 x86, fpu: Use static_cpu_has_safe before alternatives
The call stack below shows how this happens: basically eager_fpu_init()
calls __thread_fpu_begin(current) which then does if (!use_eager_fpu()),
which, in turn, uses static_cpu_has.

And we're executing before alternatives so static_cpu_has doesn't work
there yet.

Use the safe variant in this path which becomes optimal after
alternatives have run.

WARNING: at arch/x86/kernel/cpu/common.c:1368 warn_pre_alternatives+0x1e/0x20()
You're using static_cpu_has before alternatives have run!
Modules linked in:
Pid: 0, comm: swapper Not tainted 3.9.0-rc8+ #1
Call Trace:
 warn_slowpath_common
 warn_slowpath_fmt
 ? fpu_finit
 warn_pre_alternatives
 eager_fpu_init
 fpu_init
 cpu_init
 trap_init
 start_kernel
 ? repair_env_string
 x86_64_start_reservations
 x86_64_start_kernel

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-6-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:38:22 -07:00
Borislav Petkov 4a90a99c4f x86: Add a static_cpu_has_safe variant
We want to use this in early code where alternatives might not have run
yet and for that case we fall back to the dynamic boot_cpu_has.

For that, force a 5-byte jump since the compiler could be generating
differently sized jumps for each label.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:38:14 -07:00
Borislav Petkov 5700f743b5 x86: Sanity-check static_cpu_has usage
static_cpu_has may be used only after alternatives have run. Before that
it always returns false if constant folding with __builtin_constant_p()
doesn't happen. And you don't want that.

This patch is the result of me debugging an issue where I overzealously
put static_cpu_has in code which executed before alternatives have run
and had to spend some time with scratching head and cursing at the
monitor.

So add a jump to a warning which screams loudly when we use this
function too early. The alternatives patch that check away in
conjunction with patching the rest of the kernel image.

[ hpa: factored this into its own configuration option.  If we want to
  have an overarching option, it should be an option which selects
  other options, not as a group option in the source code. ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:37:19 -07:00
Borislav Petkov c3b83598c1 x86, cpu: Add a synthetic, always true, cpu feature
This will be used in alternatives later as an always-replace flag.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:06:07 -07:00
Linus Torvalds a3d5c3460a Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Two smaller fixes - plus a context tracking tracing fix that is a bit
  bigger"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/context-tracking: Add preempt_schedule_context() for tracing
  sched: Fix clear NOHZ_BALANCE_KICK
  sched/x86: Construct all sibling maps if smt
2013-06-20 08:18:35 -10:00
Linus Torvalds 86c76676cf Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Four fixes.  The mmap ones are unfortunately larger than desired -
  fuzzing uncovered bugs that needed perf context life time management
  changes to fix properly"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
  perf: Fix mmap() accounting hole
  perf: Fix perf mmap bugs
  kprobes: Fix to free gone and unused optprobes
2013-06-20 08:17:36 -10:00
Linus Torvalds 805e318548 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu idle fixes from Thomas Gleixner:
 - Add a missing irq enable. Fallout of the idle conversion
 - Fix stackprotector wreckage caused by the idle conversion

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  idle: Enable interrupts in the weak arch_cpu_idle() implementation
  idle: Add the stack canary init to cpu_startup_entry()
2013-06-20 08:16:07 -10:00
Ingo Molnar f070a4dba9 Merge branch 'perf/urgent' into perf/core
Merge in two hw_breakpoint fixes, before applying another 5.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:57:40 +02:00
Masami Hiramatsu 003002e04e kprobes: Fix arch_prepare_kprobe to handle copy insn failures
Fix arch_prepare_kprobe() to handle failures in copy instruction
correctly. This fix is related to the previous fix: 8101376
which made __copy_instruction return an error result if failed,
but caller site was not updated to handle it. Thus, this is the
other half of the bugfix.

This fix is also related to the following bug-report:

   https://bugzilla.redhat.com/show_bug.cgi?id=910649

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Jonathan Lebon <jlebon@redhat.com>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: systemtap@sourceware.org
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20130605031216.15285.2001.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 14:25:48 +02:00
Michel Lespinasse b52e0a7c4e x86: Fix trigger_all_cpu_backtrace() implementation
The following change fixes the x86 implementation of
trigger_all_cpu_backtrace(), which was previously (accidentally,
as far as I can tell) disabled to always return false as on
architectures that do not implement this function.

trigger_all_cpu_backtrace(), as defined in include/linux/nmi.h,
should call arch_trigger_all_cpu_backtrace() if available, or
return false if the underlying arch doesn't implement this
function.

x86 did provide a suitable arch_trigger_all_cpu_backtrace()
implementation, but it wasn't actually being used because it was
declared in asm/nmi.h, which linux/nmi.h doesn't include. Also,
linux/nmi.h couldn't easily be fixed by including asm/nmi.h,
because that file is not available on all architectures.

I am proposing to fix this by moving the x86 definition of
arch_trigger_all_cpu_backtrace() to asm/irq.h.

Tested via: echo l > /proc/sysrq-trigger

Before the change, this uses a fallback implementation which
shows backtraces on active CPUs (using
smp_call_function_interrupt() )

After the change, this shows NMI backtraces on all CPUs

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1370518875-1346-1-git-send-email-walken@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 14:00:21 +02:00
Borislav Petkov 719038de98 x86/intel/cacheinfo: Shut up last long-standing warning
arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘init_intel_cacheinfo’:
arch/x86/kernel/cpu/intel_cacheinfo.c:642:28: warning: ‘this_leaf.size’ may be used uninitialized in this function [-Wmaybe-uninitialized] arch/x86/kernel/cpu/intel_cacheinfo.c:643:29: warning: ‘this_leaf.eax.split.num_threads_sharing’ may be used uninitialized in this function [-Wmaybe-uninitialized]

This keeps on happening during randbuilds and the compiler is
wrong here:

In the case where cpuid4_cache_lookup_regs() returns 0, both
this_leaf.size and this_leaf.eax get initialized. In the case
where the CPUID leaf doesn't contain valid cache info, we error
out which init_intel_cacheinfo() handles correctly without
touching the abovementioned fields.

So shut up the warning by clearing out the struct which we hand
down.

While at it, reverse error handling and gain one indentation
level.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370710095-20547-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 12:27:41 +02:00
Paul Gortmaker 949785996e x86: Fix section mismatch on load_ucode_ap
We are in the process of removing all the __cpuinit annotations.
While working on making that change, an existing problem was
made evident:

  WARNING: arch/x86/kernel/built-in.o(.text+0x198f2): Section mismatch
  in reference from the function cpu_init() to the function
  .init.text:load_ucode_ap()   The function cpu_init() references
  the function __init load_ucode_ap().  This is often because cpu_init
  lacks a __init annotation or the annotation of load_ucode_ap is wrong.

This now appears because in my working tree, cpu_init() is no longer
tagged as __cpuinit, and so the audit picks up the mismatch.  The 2nd
hypothesis from the audit is the correct one, as there was an incorrect
__init tag on the prototype in the header (but __cpuinit was used on
the function itself.)

The audit is telling us that the prototype's __init annotation took
effect and the function did land in the .init.text section.  Checking
with objdump on a mainline tree that still has __cpuinit shows that
the __cpuinit on the function takes precedence over the __init on the
prototype, but that won't be true once we make __cpuinit a no-op.

Even though we are removing __cpuinit, we temporarily align both
the function and the prototype on __cpuinit so that the changeset
can be applied to stable trees  if desired.

[ hpa: build fix only, no object code change ]

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@vger.kernel.org> # 3.9+
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1371654926-11729-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-19 14:43:59 -07:00
Joe Perches f07d91ede6 x86/vdso: Convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jiri Kosina <trivial@kernel.org>
Link: http://lkml.kernel.org/r/a756fa0060e8eea25e8c1863c2764e86c2823617.1371177118.git.joe@perches.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 15:06:09 +02:00
Rusty Russell 5a802e1530 x86: Remove weird PTR_ERR() in do_debug
62edab905 changed the argument to notify_die() from dr6 to &dr6,
but weirdly, used PTR_ERR() to cast it to a long.  Since dr6 is
on the stack, this is an abuse of PTR_ERR().  Cast to long, as
per kernel standard.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1371357768-4968-8-git-send-email-rusty@rustcorp.com.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 15:01:36 +02:00
Andi Kleen f9134f36ae perf/x86/intel: Add mem-loads/stores support for Haswell
mem-loads is basically the same as Sandy Bridge,
but we use a separate string for changes later.

Haswell doesn't support the full precise store mode,
so we emulate it using the "DataLA" facility.
This allows to do everything, but for data sources we
can only detect L1 hit or not.

There is no explicit enable bit anymore, so we have
to tie it to a perf internal only flag.

The address is supported for all memory related PEBS
events with DataLA. Instead of only logging for the
load and store events we allow logging it for all
(it will be simply 0 if the current event does not
support it)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-7-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:35 +02:00
Andi Kleen 135c5612c4 perf/x86/intel: Support Haswell/v4 LBR format
Haswell has two additional LBR from flags for TSX: in_tx and
abort_tx, implemented as a new "v4" version of the LBR format.

Handle those in and adjust the sign extension code to still
correctly extend. The flags are exported similarly in the LBR
record to the existing misprediction flag

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-6-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:35 +02:00
Andi Kleen 72db559646 perf/x86/intel: Move NMI clearing to end of PMI handler
This avoids some problems with spurious PMIs on Haswell.
Haswell seems to behave more like P4 in this regard. Do
the same thing as the P4 perf handler by unmasking
the NMI only at the end. Shouldn't make any difference
for earlier family 6 cores.

(Tested on Haswell, IvyBridge, Westmere, Saltwell (Atom).)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:34 +02:00
Andi Kleen 3044318f1f perf/x86/intel: Add Haswell PEBS support
Add simple PEBS support for Haswell.

The constraints are similar to SandyBridge with a few new
events.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:33 +02:00
Andi Kleen 3a632cb229 perf/x86/intel: Add simple Haswell PMU support
Similar to SandyBridge, but has a few new events and two
new counter bits.

There are some new counter flags that need to be prevented
from being set on fixed counters, and allowed to be set
for generic counters.

Also we add support for the counter 2 constraint to handle
all raw events.

(Contains fixes from Stephane Eranian.)

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:33 +02:00
Andi Kleen 130768b8c9 perf/x86/intel: Add Haswell PEBS record support
Add support for the Haswell extended (fmt2) PEBS format.

It has a superset of the nhm (fmt1) PEBS fields, but has a
longer record so we need to adjust the code paths.

The main advantage is the new "EventingRip" support which
directly gives the instruction, not off-by-one instruction. So
with precise == 2 we use that directly and don't try to use LBRs
and walking basic blocks. This lowers the overhead of using
precise significantly.

Some other features are added in later patches.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:32 +02:00
Dave Jones 4338774cd4 x86/debug: Only print out DR registers if they are not power-on defaults
The DR registers are rarely useful when decoding oopses.
With screen real estate during oopses at a premium, we can save
two lines by only printing out these registers when they are set
to something other than they power-on state.

Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130618160911.GA24487@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:33:59 +02:00
Ingo Molnar 2e7e98b85d Fix typo in define
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRsuoUAAoJEBLB8Bhh3lVKpFUP/j6763r5WPtrzUx0sAZBt7SN
 T9VAdqWd3SvxP+lDnRoU1luGQc0j+05njUnc74e2k2rGu2+WDonkxJIv4xMHsuG/
 cItdK+ThmxOQa4ijm3uJ0N8I16k15cZkW0H5hyHSVqbawPaAjCt25DEXexTSgBQM
 wnQSsDNXRjtU/LSBPAZX4t9ePdawy/EwLGO+YYJ1ZKck4oesCwp0j/PVqQG+IGTM
 QUbB3mh0GvdHW5pPgVH6n+yqQ7puN92WHzJicYsYCtJDHa1Yf5D6/C+ZBq8GJ1V0
 rCdxQ8uI5tcn9WOJ1gZrUGPmjEMotnLrZX26b/cTEPzBP0mjf+Kd/Ew+gNmdmvX8
 5jv1Agzb1GQVXesPpv+N3hw5HNqLEQCBdlLQ5kiX63wYwF4vck+JaAGDWUILO6qp
 TtdEiotBHDKpk9YuWeepmCUDB2u/uyMrHSA0HGLBh6ACQXsl5/RnQj3KnGFlbvo6
 FbMvoOEXbNIOB92OEnC9kZteXGlIBtsK0sHmRXfOsckbH/gyYyBcFwqmeHFpzuKV
 N5f3NV8GTrUyrNdn4S28S/wlnx+KOqpqFhllVkgp/OcW2/Ry2eEZYM7ypU8bXMpo
 +HOFwQZimmQonVqRzGrQQI9w5+D5xGJ3gI/0lcjkX2r2Njowx8q13wfKKmqE8EDB
 7a+BsNdTBUt01dskFlSe
 =8FzZ
 -----END PGP SIGNATURE-----

Merge tag 'ras_fixlet_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull "Fix typo in define" change from Borislav Petkov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:51:54 +02:00
Jiri Slaby 062f487190 x86/boot: Close opened file descriptor
During build we open a file, read that but do not close it. Fix
that by sticking fclose() at the right place.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: http://lkml.kernel.org/r/1371628383-11216-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
2013-06-19 13:32:19 +02:00
Yan, Zheng b2fa344d0c perf/x86/intel: Fix sparse warning
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370421025-10986-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:04:55 +02:00
Suravee Suthikulpanit 7be6296fdd perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
Implement a perf PMU to handle IOMMU performance counters and events.
The PMU only supports counting mode (e.g. perf stat). Since the counters
are shared across all cores, the PMU is implemented as "system-wide" mode.

To invoke the AMD IOMMU PMU, issue a perf tool command such as:

  ./perf stat -a -e amd_iommu/<events>/ <command>

or:

  ./perf stat -a -e amd_iommu/config=<config-data>,config1=<config1-data>/ <command>

For example:

  ./perf stat -a -e amd_iommu/mem_trans_total/ <command>

The resulting count will be how many IOMMU total peripheral memory
operations were performed during the command execution window.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370466709-3212-3-git-send-email-suravee.suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:04:53 +02:00
Dave Hansen ae0def05ed perf/x86: Only print PMU state when also WARN()'ing
intel_pmu_handle_irq() has a warning in it if it does too many
loops.  It is a WARN_ONCE(), but the perf_event_print_debug()
call beneath it is unconditional. For the first warning, you get
a nice backtrace and message, but subsequent ones just dump the
PMU state with no leading messages.  I doubt this is what was
intended.

This patch will only print the PMU state when paired with the
WARN_ON() text.  It effectively open-codes WARN_ONCE()'s
one-time-only logic.

My suspicion is that the code really just wants to make sure we
do not sit in the loop and spit out a warning for every loop
iteration after the 100th.  From what I've seen, this is very
unlikely to happen since we also clear the PMU state.

After this patch, instead of seeing the PMU state dumped each
time, you will just see:

	[57494.894540] perf_event_intel: clearing PMU state on CPU#129
	[57579.539668] perf_event_intel: clearing PMU state on CPU#10
	[57587.137762] perf_event_intel: clearing PMU state on CPU#134
	[57623.039912] perf_event_intel: clearing PMU state on CPU#114
	[57644.559943] perf_event_intel: clearing PMU state on CPU#118
	...

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130530174559.0DB049F4@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:47 +02:00
Andrew Hunter 43b4578071 perf/x86: Reduce stack usage of x86_schedule_events()
x86_schedule_events() caches event constraints on the stack during
scheduling.  Given the number of possible events, this is 512 bytes of
stack; since it can be invoked under schedule() under god-knows-what,
this is causing stack blowouts.

Trade some space usage for stack safety: add a place to cache the
constraint pointer to struct perf_event.  For 8 bytes per event (1% of
its size) we can save the giant stack frame.

This shouldn't change any aspect of scheduling whatsoever and while in
theory the locality's a tiny bit worse, I doubt we'll see any
performance impact either.

Tested: `perf stat whatever` does not blow up and produces
results that aren't hugely obviously wrong.  I'm not sure how to run
particularly good tests of perf code, but this should not produce any
functional change whatsoever.

Signed-off-by: Andrew Hunter <ahh@google.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369332423-4400-1-git-send-email-ahh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:44 +02:00
Ingo Molnar eff2108f02 Merge branch 'perf/urgent' into perf/core
Merge in the latest fixes, to avoid conflicts with ongoing work.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:41 +02:00
Stephane Eranian f1a527899e perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
This patch fixes broken support of PEBS-LL on SNB-EP/IVB-EP.
For some reason, the LDLAT extra reg definition for snb_ep
showed up as duplicate in the snb table.

This patch moves the definition of LDLAT back into the
snb_ep table.

Thanks to Don Zickus for tracking this one down.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130607212210.GA11849@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:16 +02:00
Igor Mammedov 07868fc6aa x86: kvmclock: zero initialize pvclock shared memory area
kernel might hung in pvclock_clocksource_read() due to
uninitialized memory might contain odd version value in
following cycle:

        do {
                version = __pvclock_read_cycles(src, &ret, &flags);
        } while ((src->version & 1) || version != src->version);

if secondary kvmclock is accessed before it's registered with kvm.

Clear garbage in pvclock shared memory area right after it's
allocated to avoid this issue.

Ref: https://bugzilla.kernel.org/show_bug.cgi?id=59521
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[See BZ for analysis.  We may want a different fix for 3.11, but
 this is the safest for now - Paolo]
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-19 12:25:28 +02:00
Randy Dunlap d1603990ea x86: fix build error and kconfig for ia32_emulation and binfmt
Fix kconfig warning and build errors on x86_64 by selecting BINFMT_ELF
when COMPAT_BINFMT_ELF is being selected.

warning: (IA32_EMULATION) selects COMPAT_BINFMT_ELF which has unmet direct dependencies (COMPAT && BINFMT_ELF)

fs/built-in.o: In function `elf_core_dump':
compat_binfmt_elf.c:(.text+0x3e093): undefined reference to `elf_core_extra_phdrs'
compat_binfmt_elf.c:(.text+0x3ebcd): undefined reference to `elf_core_extra_data_size'
compat_binfmt_elf.c:(.text+0x3eddd): undefined reference to `elf_core_write_extra_phdrs'
compat_binfmt_elf.c:(.text+0x3f004): undefined reference to `elf_core_write_extra_data'

[ hpa: This was sent to me for -next but it is a low risk build fix ]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: http://lkml.kernel.org/r/51C0B614.5000708@infradead.org
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-18 16:20:32 -05:00
Yinghai Lu d8d386c106 x86, mtrr: Fix original mtrr range get for mtrr_cleanup
Joshua reported: Commit cd7b304dfaf1 (x86, range: fix missing merge
during add range) broke mtrr cleanup on his setup in 3.9.5.
corresponding commit in upstream is fbe06b7bae.

  *BAD*gran_size: 64K chunk_size: 16M num_reg: 6 lose cover RAM: -0G

https://bugzilla.kernel.org/show_bug.cgi?id=59491

So it rejects new var mtrr layout.

It turns out we have some problem with initial mtrr range retrieval.
The current sequence is:
	x86_get_mtrr_mem_range
		==> bunchs of add_range_with_merge
		==> bunchs of subract_range
		==> clean_sort_range
	add_range_with_merge for [0,1M)
	sort_range()

add_range_with_merge could have blank slots, so we can not just
sort only, that will have final result have extra blank slot in head.

So move that calling add_range_with_merge for [0,1M), with that we
could avoid extra clean_sort_range calling.

Reported-by: Joshua Covington <joshuacov@googlemail.com>
Tested-by: Joshua Covington <joshuacov@googlemail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371154622-8929-2-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-18 11:32:02 -05:00
Zhanghaoyu (A) 764bcbc5a6 KVM: x86: remove vcpu's CPL check in host-invoked XCR set
__kvm_set_xcr function does the CPL check when set xcr. __kvm_set_xcr is
called in two flows, one is invoked by guest, call stack shown as below,

  handle_xsetbv(or xsetbv_interception)
    kvm_set_xcr
      __kvm_set_xcr

the other one is invoked by host, for example during system reset:

  kvm_arch_vcpu_ioctl
    kvm_vcpu_ioctl_x86_set_xcrs
      __kvm_set_xcr

The former does need the CPL check, but the latter does not.

Cc: stable@vger.kernel.org
Signed-off-by: Zhang Haoyu <haoyu.zhang@huawei.com>
[Tweaks to commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-18 09:55:35 +02:00
Greg Kroah-Hartman bb07b00be7 Merge 3.10-rc6 into driver-core-next
We want these fixes here too.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-17 16:57:20 -07:00
Steve Capper 53313b2c91 x86: mm: Remove general hugetlb code from x86.
huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d have
already been copied over to mm.

This patch removes the x86 copies of these functions and activates
the general ones by enabling:
CONFIG_ARCH_WANT_GENERAL_HUGETLB

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-14 09:40:15 +01:00
Steve Capper cfe28c5d63 x86: mm: Remove x86 version of huge_pmd_share.
The huge_pmd_share code has been copied over to mm/hugetlb.c to
make it accessible to other architectures.

Remove the x86 copy of the huge_pmd_share code and enable the
ARCH_WANT_HUGE_PMD_SHARE config flag. That way we reference the
general one.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-14 09:39:46 +01:00
Linus Torvalds cb03dc094a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Another set of fixes, the biggest bit of this is yet another tweak to
  the UEFI anti-bricking code; apparently we finally got some feedback
  from Samsung as to what makes at least their systems fail.  This set
  should actually fix the boot regressions that some other systems (e.g.
  SGI) have exhibited.

  Other than that, there is a patch to avoid a panic with particularly
  unhappy memory layouts and two minor protocol fixes which may or may
  not be manifest bugs"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix typo in kexec register clearing
  x86, relocs: Move __vvar_page from S_ABS to S_REL
  Modify UEFI anti-bricking code
  x86: Fix adjust_range_size_mask calling position
2013-06-13 13:08:51 -07:00
H. Peter Anvin 45df901cc8 * More tweaking to the EFI variable anti-bricking algorithm. Quite a
few users were reporting boot regressions in v3.9. This has now been
    fixed with a more accurate "minimum storage requirement to avoid
    bricking" value from Samsung (5K instead of 50%) and code to trigger
    garbage collection when we near our limit - Matthew Garrett.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRtkY2AAoJEC84WcCNIz1VJOsP/00xwiY4VKh2RfqNkYKSl/w5
 gEshIHFEAXHX5X8C4ReocZVywvdjTgbJoKBbBy3FePYRzLddrmavvjen17hk7BzS
 /cO8/eXForkNWCGR1kLagA6HLpgKP5DPayKizoMb4Mg6muzfT1SCcN6Pzh8cDMWe
 btcq/l9JZejXdJ4Wfoq1My+WdXs19OT/BNeD3y65K4x29vNUjop6oaIdDJWLlH/S
 aeLHh8d4xbSHNWzK1fBP7CnFTYU27xxs1BFNAReU6McxeQCYZAIaRovYnjTZEvfJ
 twd2tLrOn9HBVTbWa8T4XGNSr+QcT4XGMadLvdwuqltmKDfH6Onm8aWQM3IqA7gy
 Qimbcv2B7HrITgXWTzp3DPkXF1LA8/8QHSBXVMUU9Rl6QOLy18vIdKiQy3M1Ng9Z
 0q+Ow93JtnL11zf9wLDMdKaKcA9HOxbG/wRTK6XO4vGaWj9brFv3n5Ib7OreHH6D
 GP58zDEnThFuj97K/NKREBZZFcFOMZpKk5MAipVkzltihUQmNeTF/dAtBJ3Ncu/A
 PqQE6uuKVXjASJR8Gy0bI3WHtSTZK4L/sg9c2MF3bdJa9BswN+m8IEbls+S+iFOx
 +sYPQx7Zw6SFENxDw8cDYNzC14yfr60qyOxTWfkHH7l/FnvhOgwHzqPsLcXx0ouR
 C6k1yPYSTgiqFdWC2sjn
 =TZuM
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * More tweaking to the EFI variable anti-bricking algorithm. Quite a
   few users were reporting boot regressions in v3.9. This has now been
   fixed with a more accurate "minimum storage requirement to avoid
   bricking" value from Samsung (5K instead of 50%) and code to trigger
   garbage collection when we near our limit - Matthew Garrett.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-13 08:59:23 -07:00
Jussi Kivilinna fe6510b5d6 crypto: aesni_intel - fix accessing of unaligned memory
The new XTS code for aesni_intel uses input buffers directly as memory operands
for pxor instructions, which causes crash if those buffers are not aligned to
16 bytes.

Patch changes XTS code to handle unaligned memory correctly, by loading memory
with movdqu instead.

Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-06-13 14:57:42 +08:00
Kees Cook c8a22d19dd x86: Fix typo in kexec register clearing
Fixes a typo in register clearing code. Thanks to PaX Team for fixing
this originally, and James Troup for pointing it out.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130605184718.GA8396@www.outflux.net
Cc: <stable@vger.kernel.org> v2.6.30+
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-12 15:16:18 -07:00
Kees Cook b1983b0a75 x86, relocs: Move __vvar_page from S_ABS to S_REL
The __vvar_page relocation should actually be listed in S_REL instead
of S_ABS. Oddly, this didn't always cause things to break, presumably
because there are no users for relocation information on 64 bits yet.

[ hpa: Not for stable - new code in 3.10 ]

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130611185652.GA23674@www.outflux.net
Reported-by: Michael Davidson <md@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-12 15:14:57 -07:00
Thomas Gleixner d7880812b3 idle: Add the stack canary init to cpu_startup_entry()
Moving x86 to the generic idle implementation (commit 7d1a9417 "x86:
Use generic idle loop") wreckaged the stack protector.

I stupidly missed that boot_init_stack_canary() must be inlined from a
function which never returns, but I put that call into
arch_cpu_idle_prepare() which of course returns.

I pondered to play tricks with arch_cpu_idle_prepare() first, but then
I noticed, that the other archs which have implemented the
stackprotector (ARM and SH) do not initialize the canary for the
non-boot cpus.

So I decided to move the boot_init_stack_canary() call into
cpu_startup_entry() ifdeffed with an CONFIG_X86 for now. This #ifdef
is just a temporary measure as I don't want to inflict the
boot_init_stack_canary() call on ARM and SH that late in the cycle.

I'll queue a patch for 3.11 which removes the #ifdef if the ARM/SH
maintainers have no objection.

Reported-by: Wouter van Kesteren <woutershep@gmail.com>
Cc: x86@kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-11 22:04:47 +02:00
Zach Bobroff d3768d885c x86, efi: retry ExitBootServices() on failure
ExitBootServices is absolutely supposed to return a failure if any
ExitBootServices event handler changes the memory map.  Basically the
get_map loop should run again if ExitBootServices returns an error the
first time.  I would say it would be fair that if ExitBootServices gives
an error the second time then Linux would be fine in returning control
back to BIOS.

The second change is the following line:

again:
        size += sizeof(*mem_map) * 2;

Originally you were incrementing it by the size of one memory map entry.
The issue here is all related to the low_alloc routine you are using.
In this routine you are making allocations to get the memory map itself.
Doing this allocation or allocations can affect the memory map by more
than one record.

[ mfleming - changelog, code style ]
Signed-off-by: Zach Bobroff <zacharyb@ami.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-11 07:51:54 +01:00
Borislav Petkov 43ab0476a6 efi: Convert runtime services function ptrs
... to void * like the boot services and lose all the void * casts. No
functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-11 07:39:26 +01:00
Matthew Garrett f8b8404337 Modify UEFI anti-bricking code
This patch reworks the UEFI anti-bricking code, including an effective
reversion of cc5a080c and 31ff2f20. It turns out that calling
QueryVariableInfo() from boot services results in some firmware
implementations jumping to physical addresses even after entering virtual
mode, so until we have 1:1 mappings for UEFI runtime space this isn't
going to work so well.

Reverting these gets us back to the situation where we'd refuse to create
variables on some systems because they classify deleted variables as "used"
until the firmware triggers a garbage collection run, which they won't do
until they reach a lower threshold. This results in it being impossible to
install a bootloader, which is unhelpful.

Feedback from Samsung indicates that the firmware doesn't need more than
5KB of storage space for its own purposes, so that seems like a reasonable
threshold. However, there's still no guarantee that a platform will attempt
garbage collection merely because it drops below this threshold. It seems
that this is often only triggered if an attempt to write generates a
genuine EFI_OUT_OF_RESOURCES error. We can force that by attempting to
create a variable larger than the remaining space. This should fail, but if
it somehow succeeds we can then immediately delete it.

I've tested this on the UEFI machines I have available, but I don't have
a Samsung and so can't verify that it avoids the bricking problem.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Lee, Chun-Y <jlee@suse.com> [ dummy variable cleanup ]
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-10 21:59:37 +01:00
Linus Torvalds 50e6f8511a Bug-fixes for regressions:
- xen/tmem stopped working after a certain combination of modprobe/swapon was used
  - cpu online/offlining would trigger WARN_ON.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRtgQxAAoJEFjIrFwIi8fJ9QYH/ibEGBlKLfGzN4Apx6evBA68
 l9CuLIpGCkPOiJLgVs10zY77Sg3f95bHFkJrwrEIDeTQ42joNKybsIp+qZCIS5CW
 sAnzSd6Mb8jqQYJpgLl03+z3GdFzILTJ9e0zYTETbW2vfLpAETm87XWH+gjxDK0e
 9I0kZX+Q3+2VWi5xv+UwWkYIOLbggLyKYajTHDwWNuC7vQgkJulCAbmjgN/NBv7A
 xacvXdbEClIfZ5tvJJN0RggdEWo6WyTxyExfLPmpXFbHEvWDUX5LPEf8MhFY1ORK
 U1C0BSV6YuLo350G5lY4I0R75ZEWLeUyVkZFnJIeGnUF1rP6OXEuVWTyeivPDBA=
 =Fy36
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull xen fixes from Konrad Rzeszutek Wilk:
 "Two bug-fixes for regressions:
   - xen/tmem stopped working after a certain combination of
     modprobe/swapon was used
   - cpu online/offlining would trigger WARN_ON."

* tag 'stable/for-linus-3.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/tmem: Don't over-write tmem_frontswap_poolid after tmem_frontswap_init set it.
  xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU.
2013-06-10 13:27:46 -07:00
Linus Torvalds c51aa6db2a PCI update for v3.10:
PCI ROM from EFI
       x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRsMcBAAoJEFmIoMA60/r8EGMP/RiVo0oHoM1DlNfFonw8BjTl
 iClotD3rfQRgjDx8/563/KfO464mLJLbWdjanbpmWB215Xu4LJwv3wfWWEB1f3C+
 hbsz4aFLmoQFnCzg1BIsuNNu88cXymf4A+osfKl+pCxPY+zChkqNZUPBk0m7aW4/
 Y8JZz0NkxDUUb6ewtDhRDci9d5dHAzL/bmfcdUrP5X56znV72eVH555vY33hzNQ+
 opoB9bFOdjxq8EDMlSaGxHmClUH/1VJd8Gr9Q4oj8OUOMq7At/WbHbFYJPbn/09+
 xdh+78oGKZ6vI1rfIwlGSbJaizZJoE8l3EVEeY1rSgxF2zKNRAyN9IZOPaTigsjd
 5zHTwUp+itxstw3fxjeG0R+Gl8guf13XTWzGcR6sC1TTn/NXSxqI3inORs5PWmpP
 ldfPkOW5Y3pcP6UfmJcAn1z9W8toyCouovFPnSoit6ZzS/+aFY2vYfHBQZ8LdJHv
 Gq4sYmcxWTzfgBxybc6OczztXgTQ+grhs3nnP29/QrGDyJ6RA6E6ENKiqARkALCF
 1qFeOdWcoG4HaAlQhMRwuJTsUw7ofJWmYi6AdbHgI5pBQM4lLR1Kp+hGJnmeTBHm
 Svx98dv525HZ2tVUZwsd+ExsEfQ4IlPECVmOpxFkUf4sRILhvCTP5C6KWHBev7sG
 +F4rwomkmlS1hu82xMwl
 =dMNk
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.10-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "This fixes a crash when booting a 32-bit kernel via the EFI boot stub.

  PCI ROM from EFI
      x86/PCI: Map PCI setup data with ioremap() so it can be in highmem"

* tag 'pci-v3.10-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
2013-06-06 16:28:15 -07:00
H. Peter Anvin 60e019eb37 x86: Get rid of ->hard_math and all the FPU asm fu
Reimplement FPU detection code in C and drop old, not-so-recommended
detection method in asm. Move all the relevant stuff into i387.c where
it conceptually belongs. Finally drop cpuinfo_x86.hard_math.

[ hpa: huge thanks to Borislav for taking my original concept patch
  and productizing it ]

[ Boris, note to self: do not use static_cpu_has before alternatives! ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-06 14:32:04 -07:00
Matthew Garrett 1acba98f81 UEFI: Don't pass boot services regions to SetVirtualAddressMap()
We need to map boot services regions during startup in order to avoid
firmware bugs, but we shouldn't be passing those regions to
SetVirtualAddressMap(). Ensure that we're only passing regions that are
marked as being mapped at runtime.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-06 14:28:11 +01:00
Jacob Shin cd1c32ca96 x86, microcode, amd: Allow multiple families' bin files appended together
Add support for parsing through multiple families' microcode patch
container binary files appended together when early loading. This is
already supported on Intel.

Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1370463236-2115-3-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-05 13:56:55 -07:00
Jacob Shin 275bbe2e29 x86, microcode, amd: Make find_ucode_in_initrd() __init
Change find_ucode_in_initrd() to __init and only let BSP call it
during cold boot. This is the right thing to do because only BSP will
see initrd loaded by the boot loader. APs will offset into
initrd_start to find the microcode patch binary.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1370463236-2115-2-git-send-email-jacob.shin@amd.com
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-05 13:56:47 -07:00
Matt Fleming 65694c5aad x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
f9a37be0f0 ("x86: Use PCI setup data") added support for using PCI ROM
images from setup_data.  This used phys_to_virt(), which is not valid for
highmem addresses, and can cause a crash when booting a 32-bit kernel via
the EFI boot stub.

pcibios_add_device() assumes that the physical addresses stored in
setup_data are accessible via the direct kernel mapping, and that calling
phys_to_virt() is valid.  This isn't guaranteed to be true on x86 where the
direct mapping range is much smaller than on x86-64.

Calling phys_to_virt() on a highmem address results in the following:

 BUG: unable to handle kernel paging request at 39a3c198
 IP: [<c262be0f>] pcibios_add_device+0x2f/0x90
 ...
 Call Trace:
  [<c2370c73>] pci_device_add+0xe3/0x130
  [<c274640b>] pci_scan_single_device+0x8b/0xb0
  [<c2370d08>] pci_scan_slot+0x48/0x100
  [<c2371904>] pci_scan_child_bus+0x24/0xc0
  [<c262a7b0>] pci_acpi_scan_root+0x2c0/0x490
  [<c23b7203>] acpi_pci_root_add+0x312/0x42f
  ...

The solution is to use ioremap() instead of phys_to_virt() to map the
setup data into the kernel address space.

[bhelgaas: changelog]
Tested-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: stable@vger.kernel.org	# v3.8+
2013-06-05 10:50:04 -06:00
Mathias Krause a90936845d x86, mce: Fix "braodcast" typo
Fix the typo in MCJ_IRQ_BRAODCAST.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-06-05 11:59:17 +02:00
Linus Torvalds 8b35c35955 Merge branch 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm bugfixes from Gleb Natapov:
 "The bulk of the fixes is in MIPS KVM kernel<->userspace ABI.  MIPS KVM
  is new for 3.10 and some problems were found with current ABI.  It is
  better to fix them now and do not have a kernel with broken one"

* 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Fix race in apic->pending_events processing
  KVM: fix sil/dil/bpl/spl in the mod/rm fields
  KVM: Emulate multibyte NOP
  ARM: KVM: be more thorough when invalidating TLBs
  ARM: KVM: prevent NULL pointer dereferences with KVM VCPU ioctl
  mips/kvm: Use ENOIOCTLCMD to indicate unimplemented ioctls.
  mips/kvm: Fix ABI by moving manipulation of CP0 registers to KVM_{G,S}ET_ONE_REG
  mips/kvm: Use ARRAY_SIZE() instead of hardcoded constants in kvm_arch_vcpu_ioctl_{s,g}et_regs
  mips/kvm: Fix name of gpr field in struct kvm_regs.
  mips/kvm: Fix ABI for use of 64-bit registers.
  mips/kvm: Fix ABI for use of FPU.
2013-06-05 09:09:35 +09:00
Michael Wang e8747f10ba x86, cleanups: Remove extra tab in __flush_tlb_one()
Remove the extra tab in __flush_tlb_one().

CC: Alex Shi <alex.shi@intel.com>
CC: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/51AD8902.60603@linux.vnet.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-04 11:51:26 -07:00
Konrad Rzeszutek Wilk 466318a87f xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU.
The xen_play_dead is an undead function. When the vCPU is told to
offline it ends up calling xen_play_dead wherin it calls the
VCPUOP_down hypercall which offlines the vCPU. However, when the
vCPU is onlined back, it resumes execution right after
VCPUOP_down hypercall.

That was OK (albeit the API for play_dead assumes that the CPU
stays dead and never returns) but with commit 4b0c0f294
(tick: Cleanup NOHZ per cpu data on cpu down) that is no longer safe
as said commit resets the ts->inidle which at the start of the
cpu_idle loop was set.

The net effect is that we get this warn:

Broke affinity for irq 16
installing Xen timer for CPU 1
cpu 1 spinlock event irq 48
------------[ cut here ]------------
WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0()
Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33 #1
Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009
 ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0
 ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010
 0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0
Call Trace:
 [<ffffffff816707c8>] dump_stack+0x19/0x1b
 [<ffffffff8108ce8b>] warn_slowpath_common+0x6b/0xa0
 [<ffffffff8108ced5>] warn_slowpath_null+0x15/0x20
 [<ffffffff810e4745>] tick_nohz_idle_exit+0x195/0x1b0
 [<ffffffff810da755>] cpu_startup_entry+0x205/0x250
 [<ffffffff81661070>] cpu_bringup_and_idle+0x13/0x15
---[ end trace 915c8c486004dda1 ]---

b/c ts_inidle is set to zero. Thomas suggested that we just add a workaround
to call tick_nohz_idle_enter before returning from xen_play_dead() - and
that is what this patch does and fixes the issue.

We also add the stable part b/c git commit 4b0c0f294 is on the stable
tree.

CC: stable@vger.kernel.org
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-04 09:03:18 -04:00
Stephen Rothwell 40b313608a Finally eradicate CONFIG_HOTPLUG
Ever since commit 45f035ab9b ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off.  Remove all the remaining references to it.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-03 14:20:18 -07:00
Gleb Natapov 299018f44a KVM: Fix race in apic->pending_events processing
apic->pending_events processing has a race that may cause INIT and
SIPI
processing to be reordered:

vpu0:                            vcpu1:
set INIT
                               test_and_clear_bit(KVM_APIC_INIT)
                                  process INIT
set INIT
set SIPI
                               test_and_clear_bit(KVM_APIC_SIPI)
                                  process SIPI

At the end INIT is left pending in pending_events. The following patch
fixes this by latching pending event before processing them.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-03 11:32:39 +03:00
Paolo Bonzini 8acb42070e KVM: fix sil/dil/bpl/spl in the mod/rm fields
The x86-64 extended low-byte registers were fetched correctly from reg,
but not from mod/rm.

This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
not enough.

Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-03 11:27:12 +03:00
Paolo Bonzini 103f98ea64 KVM: Emulate multibyte NOP
This is encountered when booting RHEL5.9 64-bit.  There is another bug
after this one that is not a simple emulation failure, but this one lets
the boot proceed a bit.

Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-03 11:20:53 +03:00
Jacob Shin 6b3389ac21 x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m
Fix section mismatch warnings on microcode_amd_early.
Compile error occurs when CONFIG_MICROCODE=m, change so that early
loading depends on microcode_core.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20130531150241.GA12006@jshin-Toonie
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-31 13:56:58 -07:00
Yinghai Lu 7de3d66b13 x86: Fix adjust_range_size_mask calling position
Commit

    8d57470d x86, mm: setup page table in top-down

causes a kernel panic while setting mem=2G.

     [mem 0x00000000-0x000fffff] page 4k
     [mem 0x7fe00000-0x7fffffff] page 1G
     [mem 0x7c000000-0x7fdfffff] page 1G
     [mem 0x00100000-0x001fffff] page 4k
     [mem 0x00200000-0x7bffffff] page 2M

for last entry is not what we want, we should have
     [mem 0x00200000-0x3fffffff] page 2M
     [mem 0x40000000-0x7bffffff] page 1G

Actually we merge the continuous ranges with same page size too early.
in this case, before merging we have
     [mem 0x00200000-0x3fffffff] page 2M
     [mem 0x40000000-0x7bffffff] page 2M
after merging them, will get
     [mem 0x00200000-0x7bffffff] page 2M
even we can use 1G page to map
     [mem 0x40000000-0x7bffffff]

that will cause problem, because we already map
     [mem 0x7fe00000-0x7fffffff] page 1G
     [mem 0x7c000000-0x7fdfffff] page 1G
with 1G page, aka [0x40000000-0x7fffffff] is mapped with 1G page already.
During phys_pud_init() for [0x40000000-0x7bffffff], it will not
reuse existing that pud page, and allocate new one then try to use
2M page to map it instead, as page_size_mask does not include
PG_LEVEL_1G. At end will have [7c000000-0x7fffffff] not mapped, loop
in phys_pmd_init stop mapping at 0x7bffffff.

That is right behavoir, it maps exact range with exact page size that
we ask, and we should explicitly call it to map [7c000000-0x7fffffff]
before or after mapping 0x40000000-0x7bffffff.
Anyway we need to make sure ranges' page_size_mask correct and consistent
after split_mem_range for each range.

Fix that by calling adjust_range_size_mask before merging range
with same page size.

-v2: update change log.
-v3: add more explanation why [7c000000-0x7fffffff] is not mapped, and
    it causes panic.

Bisected-by: "Xie, ChanglongX" <changlongx.xie@intel.com>
Bisected-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reported-and-tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1370015587-20835-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-31 13:21:32 -07:00
Paul Bolle 71c69f7f4b x86/mce: Remove check for CONFIG_X86_MCE_P4THERMAL
The Kconfig symbol X86_MCE_P4THERMAL was removed in v2.6.32.
Remove a useless check for its macro, as it will now always
evaluate to false.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Link: http://lkml.kernel.org/r/1369853850.23034.28.camel@x61.thuisdomein
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:12:35 +02:00
Andrew Jones b0bc225d0e sched/x86: Construct all sibling maps if smt
Commit 316ad24830 ("sched/x86: Rewrite
set_cpu_sibling_map()") broke the construction of sibling maps,
which also broke the booted_cores accounting.

Before the rewrite, if smt was present, then each map was
updated for each smt sibling. After the rewrite only
cpu_sibling_mask gets updated, as the llc and core maps depend
on 'has_mc = x86_max_cores > 1' instead. This leads to problems
with topologies like the following

(qemu -smp sockets=2,cores=1,threads=2)

  processor       : 0
  physical id     : 0
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 1

  processor       : 1
  physical id     : 0
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 0    <= should be 1

  processor       : 2
  physical id     : 1
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 1

  processor       : 3
  physical id     : 1
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 0    <= should be 1

This patch restores the former construction by defining has_mc
as (has_smt || x86_max_cores > 1). This should be fine as there
were no (has_smt && !has_mc) conditions in the context.

Aso rename has_mc to has_mp now that it's not just for cores.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: a.p.zijlstra@chello.nl
Cc: fenghua.yu@intel.com
Link: http://lkml.kernel.org/r/1369831695-11970-1-git-send-email-drjones@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:10:38 +02:00
Jan Beulich 1d10f6ee60 x86: __force_order doesn't need to be an actual variable
It being static causes over a dozen instances to be scattered
across the kernel image, with non of them ever being referenced
in any way. Making the variable extern without ever defining it
works as well - all we need is to have the compiler think the
variable is being accessed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/51A610B802000078000D99A0@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:09:17 +02:00
Jan Beulich cbdce7b251 ix86: Don't waste fixmap entries
The vsyscall related pvclock entries can only ever be used on
x86-64, and hence they shouldn't even get allocated for 32-bit
kernels (the more that it is there where address space is
relatively precious).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/51A60F1F02000078000D997C@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:08:18 +02:00
Jacob Shin 757885e94a x86, microcode, amd: Early microcode patch loading support for AMD
Add early microcode patch loading support for AMD.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-5-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Jacob Shin a76096a657 x86, microcode, amd: Refactor functions to prepare for early loading
In preparation work for early loading, refactor some common functions
that will be shared, and move some struct defines to a common header file.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-4-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Jacob Shin f2b3ee820a x86, microcode: Vendor abstract out save_microcode_in_initrd()
Currently save_microcode_in_initrd() is declared in vendor neutural
microcode.h file, but defined in vendor specific
microcode_intel_early.c file. Vendor abstract it out to
microcode_core_early.c with a wrapper function.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-3-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Borislav Petkov 83b325f1b4 x86, microcode, intel: Correct typo in printk
User-visible so correct it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1369940959-2077-2-git-send-email-jacob.shin@amd.com
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Linus Torvalds 484b002e28 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:

 - Three EFI-related fixes

 - Two early memory initialization fixes

 - build fix for older binutils

 - fix for an eager FPU performance regression -- currently we don't
   allow the use of the FPU at interrupt time *at all* in eager mode,
   which is clearly wrong.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Allow FPU to be used at interrupt time even with eagerfpu
  x86, crc32-pclmul: Fix build with older binutils
  x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
  x86, range: fix missing merge during add range
  x86, efi: initial the local variable of DataSize to zero
  efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse
  efivarfs: Never return ENOENT from firmware again
2013-05-31 09:44:10 +09:00
Pekka Riikonen 5187b28ff0 x86: Allow FPU to be used at interrupt time even with eagerfpu
With the addition of eagerfpu the irq_fpu_usable() now returns false
negatives especially in the case of ksoftirqd and interrupted idle task,
two common cases for FPU use for example in networking/crypto.  With
eagerfpu=off FPU use is possible in those contexts.  This is because of
the eagerfpu check in interrupted_kernel_fpu_idle():

...
  * For now, with eagerfpu we will return interrupted kernel FPU
  * state as not-idle. TBD: Ideally we can change the return value
  * to something like __thread_has_fpu(current). But we need to
  * be careful of doing __thread_clear_has_fpu() before saving
  * the FPU etc for supporting nested uses etc. For now, take
  * the simple route!
...
 	if (use_eager_fpu())
 		return 0;

As eagerfpu is automatically "on" on those CPUs that also have the
features like AES-NI this patch changes the eagerfpu check to return 1 in
case the kernel_fpu_begin() has not been said yet.  Once it has been the
__thread_has_fpu() will start returning 0.

Notice that with eagerfpu the __thread_has_fpu is always true initially.
FPU use is thus always possible no matter what task is under us, unless
the state has already been saved with kernel_fpu_begin().

[ hpa: this is a performance regression, not a correctness regression,
  but since it can be quite serious on CPUs which need encryption at
  interrupt time I am marking this for urgent/stable. ]

Signed-off-by: Pekka Riikonen <priikone@iki.fi>
Link: http://lkml.kernel.org/r/alpine.GSO.2.00.1305131356320.18@git.silcnet.org
Cc: <stable@vger.kernel.org> v3.7+
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-30 16:36:42 -07:00
Jan Beulich 2baad6121e x86, crc32-pclmul: Fix build with older binutils
binutils prior to 2.18 (e.g. the ones found on SLE10) don't support
assembling PEXTRD, so a macro based approach like the one for PCLMULQDQ
in the same file should be used.

This requires making the helper macros capable of recognizing 32-bit
general purpose register operands.

[ hpa: tagging for stable as it is a low risk build fix ]

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/51A6142A02000078000D99D8@nat28.tlf.novell.com
Cc: Alexander Boyko <alexander_boyko@xyratex.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-30 16:36:23 -07:00
Linus Torvalds 3655b22de0 Fixes:
- Use proper error paths
  - Clean up APIC IPI usage (incorrect arguments)
  - Delay XenBus frontend resume is backend (xenstored) is not running
  - Fix build error with various combinations of CONFIG_
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRp59pAAoJEFjIrFwIi8fJRIYIAKvRh2Dp/AB44ZN97MW/QhEN
 NUvrSTYr2HlqcUW7bv0ScrMLb0LlFeo+9s/bo0KI2+2F+zK822WPC+2KEZmzQIVs
 q261dNsA3/HoyBDOLwWjatjsSus+njBOEgDIwARPwhkoon4fRXBnRJVMy+0bZC3I
 fpd1nlUy0J7jW0QLO5ueKqd5ZN0Mkwn2H4+D8TOPVYHCnk3mT2W+qLCEJmkMxOuZ
 iFYy95K1ky5r0leUUwCTUIGLmgftoh0Qo/RweXSmzuLiZrY+5ilike3gxQSiAjsM
 lIjq+gKXNJJGz4M6wbOTfDzb/WQnKD+2PqlsbulrTD7E6RD6wIsqG/zvc1RqHqw=
 =9gi8
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fixes from Konrad Rzeszutek Wilk:
 - Use proper error paths
 - Clean up APIC IPI usage (incorrect arguments)
 - Delay XenBus frontend resume is backend (xenstored) is not running
 - Fix build error with various combinations of CONFIG_

* tag 'stable/for-linus-3.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xenbus_client.c: correct exit path for xenbus_map_ring_valloc_hvm
  xen-pciback: more uses of cached MSI-X capability offset
  xen: Clean up apic ipi interface
  xenbus: save xenstore local status for later use
  xenbus: delay xenbus frontend resume if xenstored is not running
  xmem/tmem: fix 'undefined variable' build error.
2013-05-31 06:01:18 +09:00
Dimitri Sivanich 879d5ad0dc x86/UV: Add GRU distributed mode mappings
GRU hardware will support an optional distributed mode that will
allow  per-node address mapping of local GRU space, as opposed
to mapping all GRU hardware to the same contiguous high space.

If GRU distributed mode is selected, setup per-node page table
mappings.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130529155609.GB22917@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-30 09:46:09 +02:00
Stefan Bader 1db01b4903 xen: Clean up apic ipi interface
Commit f447d56d36 introduced the
implementation of the PV apic ipi interface. But there were some
odd things (it seems none of which cause really any issue but
maybe they should be cleaned up anyway):
 - xen_send_IPI_mask_allbutself (and by that xen_send_IPI_allbutself)
   ignore the passed in vector and only use the CALL_FUNCTION_SINGLE
   vector. While xen_send_IPI_all and xen_send_IPI_mask use the vector.
 - physflat_send_IPI_allbutself is declared unnecessarily. It is never
   used.

This patch tries to clean up those things.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-05-29 09:04:21 -04:00
Zhang Yanfei e9d0626ed4 x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
In head_64.S, a switchover has been used to handle kernel crossing
1G, 512G boundaries.

And commit 8170e6bed4
    x86, 64bit: Use a #PF handler to materialize early mappings on demand
said:
    During the switchover in head_64.S, before #PF handler is available,
    we use three pages to handle kernel crossing 1G, 512G boundaries with
    sharing page by playing games with page aliasing: the same page is
    mapped twice in the higher-level tables with appropriate wraparound.

But from the switchover code, when we set up the PUD table:
114         addq    $4096, %rdx
115         movq    %rdi, %rax
116         shrq    $PUD_SHIFT, %rax
117         andl    $(PTRS_PER_PUD-1), %eax
118         movq    %rdx, (4096+0)(%rbx,%rax,8)
119         movq    %rdx, (4096+8)(%rbx,%rax,8)

It seems line 119 has a potential bug there. For example,
if the kernel is loaded at physical address 511G+1008M, that is
    000000000 111111111 111111000 000000000000000000000
and the kernel _end is 512G+2M, that is
    000000001 000000000 000000001 000000000000000000000
So in this example, when using the 2nd page to setup PUD (line 114~119),
rax is 511.
In line 118, we put rdx which is the address of the PMD page (the 3rd page)
into entry 511 of the PUD table. But in line 119, the entry we calculate from
(4096+8)(%rbx,%rax,8) has exceeded the PUD page. IMO, the entry in line
119 should be wraparound into entry 0 of the PUD table.

The patch fixes the bug.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/5191DE5A.3020302@cn.fujitsu.com
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-28 15:41:59 -07:00
Linus Torvalds 30a9e50143 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This push fixes a crash in the new sha256_ssse3 driver as well as a
  DMA setup/teardown bug in caam"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations
  crypto: caam - fix inconsistent assoc dma mapping direction
2013-05-28 10:09:38 -07:00
Borislav Petkov 46ff53874b x86, platform, kvm, kconfig: Turn existing .config's into KVM-capable configs
Add an config file snippet which enables additional options
useful for running the kernel in a kvm guest. When you execute
'make kvmconfig' it merges those options with an already
existing user config before you build the kernel.

Based on an patch from the external lkvm tree.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: penberg@kernel.org
Cc: levinsasha928@gmail.com
Cc: mtosatti@redhat.com
Cc: fengguang.wu@intel.com
Link: http://lkml.kernel.org/r/20130522144638.GB15085@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 12:11:32 +02:00
Zhang Yanfei 592a9b8cc8 x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h>
arch/x86/boot/compressed/head_64.S includes <asm/pgtable_types.h> and
 <asm/page_types.h> but it doesn't look like it needs them. So remove them.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/5191FAE2.4020403@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 11:47:23 +02:00
Zhang Yanfei 1e3b308181 x86_64: Correct phys_addr in cleanup_highmap comment
For x86_64, we have phys_base, which means the delta between the
the address kernel is actually running at and the address kernel
is compiled to run at. Not phys_addr so correct it.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/5192F9BF.2000802@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 11:41:24 +02:00
Michael S. Tsirkin 016be2e55d x86: uaccess s/might_sleep/might_fault/
The only reason uaccess routines might sleep
is if they fault. Make this explicit.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1369577426-26721-9-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:41:10 +02:00
Peter Zijlstra 1b45adcd9a perf/x86/amd: Rework AMD PMU init code
Josh reported that his QEMU is a bad hardware emulator and trips a
WARN in the AMD PMU init code. He requested the WARN be turned into a
pr_err() or similar.

While there, rework the code a little.

Reported-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: Jacob Shin <jacob.shin@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130521110537.GG26912@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:13:55 +02:00
Stephane Eranian 2b923c8f5d perf/x86: Check branch sampling priv level in generic code
This patch moves commit 7cc23cd to the generic code:

 perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL

The check is now implemented in generic code instead of x86 specific
code. That way we do not have to repeat the test in each arch
supporting branch sampling.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20130521105337.GA2879@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:13:54 +02:00
Dan Carpenter 13acac3075 perf/x86/intel: Prevent some shift wrapping bugs in the Intel uncore driver
We're trying to use 64 bit masks but the shifts wrap so we can't use the
high 32 bits. I've fixed this by changing several types to unsigned
long long.

This is a static checker fix.  The one change which is clearly needed is
"mask = 0xff << (idx * 8);" where the author obviously intended to use
all 64 bits.  The other changes are mostly to silence my static checker.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20130518183452.GA14587@elgon.mountain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:13:52 +02:00
Jiri Olsa ddd40da4cc x86/signals: Merge EFLAGS bit clearing into a single statement
Merging EFLAGS bit clearing into a single statement, to
ensure EFLAGS bits are being cleared in a single instruction.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-4-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 08:46:53 +02:00