linux/arch
Ingo Molnar c1dc0b9c0c debug lockups: Improve lockup detection
When debugging a recent lockup bug I found various deficiencies
in how our current lockup detection helpers work:

 - SysRq-L is not very efficient as it uses a workqueue, hence
   it cannot punch through hard lockups and cannot see through
   most soft lockups either.

 - The SysRq-L code depends on the NMI watchdog - which is off
   by default.

 - We don't print backtraces from the RCU code's built-in
   'RCU state machine is stuck' debug code. This debug
   code tends to be one of the first (and only) mechanisms
   that show that a lockup has occurred.

This patch changes the code so that we:

 - Trigger the NMI backtrace code from SysRq-L instead of using
   a workqueue (which cannot punch through hard lockups)

 - Trigger print-all-CPU-backtraces from the RCU lockup detection
   code

Also decouple the backtrace printing code from the NMI watchdog:

 - Don't use variable-size cpumasks (they might not be initialized
   and they are a bit more fragile anyway)

 - Trigger an NMI immediately via an IPI, instead of waiting
   for the NMI tick to occur. This is a lot faster and can
   produce more relevant backtraces. It will also work if the
   NMI watchdog is disabled.

 - Don't print the 'dazed and confused' message when we print
   a backtrace from the NMI

 - Do a show_regs() plus a dump_stack() to get maximum info
   out of the dump. Worst-case we get two stacktraces - which
   is not a big deal. Sometimes, if register content is
   corrupted, the precise stack walker in show_regs() won't
   give us a full backtrace - in this case dump_stack() will
   do it.
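
The decoupled flow described above can be modelled in plain userspace C.
This is only a sketch under stated assumptions, not the code from the
patch: the names trigger_all_cpu_backtrace(), nmi_handler(),
backtrace_mask and dumps_printed, the 64-CPU limit and the 10000-try
wait bound are all hypothetical, and the NMI IPI is simulated by calling
the handler directly for each CPU. It shows the three ideas from the
list: a fixed-size mask that needs no initialization, an immediate
request instead of waiting for the watchdog tick, and a handler that
reports the event as handled so the 'dazed and confused' path is not
taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_CPUS 64                 /* hypothetical fixed limit, stands in for NR_CPUS */

/* Fixed-size bitmap: usable without any allocation or setup step. */
static unsigned long long backtrace_mask;

static int dumps_printed;           /* counts simulated backtraces */

/* Simulated NMI handler body for one CPU. */
static bool nmi_handler(int cpu)
{
    unsigned long long bit = 1ULL << cpu;

    if (!(backtrace_mask & bit))
        return false;               /* this NMI was not a backtrace request */

    printf("NMI backtrace for cpu %d\n", cpu);
    /* A real handler would call show_regs() and then dump_stack() here. */
    dumps_printed++;
    backtrace_mask &= ~bit;         /* tell the requester this CPU is done */
    return true;                    /* handled: no "unknown NMI" message */
}

/* Simulated requester: mark every CPU, "send" the NMI, then wait. */
static int trigger_all_cpu_backtrace(int ncpus)
{
    assert(ncpus > 0 && ncpus <= MAX_CPUS);

    backtrace_mask = (ncpus == MAX_CPUS) ? ~0ULL : (1ULL << ncpus) - 1;

    /* Stand-in for the NMI IPI: each CPU runs its handler right away. */
    for (int cpu = 0; cpu < ncpus; cpu++)
        nmi_handler(cpu);

    /* Bounded wait, so an unresponsive CPU cannot hang the requester. */
    for (int tries = 0; tries < 10000 && backtrace_mask; tries++)
        ;                           /* a kernel would delay between polls */

    return backtrace_mask == 0 ? 0 : -1;
}
```

Calling trigger_all_cpu_backtrace(4) in this model prints one line per
simulated CPU and returns 0 once every bit in the mask has been cleared.
The bounded wait is the design point worth noting: the requester gives
up after a fixed number of polls rather than blocking forever on a CPU
that never answers.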

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 13:27:17 +02:00
..
alpha       mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
arm         Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx  2009-07-30 16:46:31 -07:00
avr32       mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
blackfin    blackfin: fix wrong CTS inversion  2009-07-20 16:38:44 -07:00
cris        mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
frv         mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
h8300       sched: INIT_PREEMPT_COUNT  2009-07-10 14:24:05 -07:00
ia64        mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
m32r        mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
m68k        mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
m68knommu   Remove multiple KERN_ prefixes from printk formats  2009-07-08 10:30:03 -07:00
microblaze  Merge branch 'fixes-for-linus' of git://git.monstr.eu/linux-2.6-microblaze  2009-07-27 12:18:27 -07:00
mips        mm: Remove duplicate definitions in MIPS and SH  2009-07-27 17:26:44 -07:00
mn10300     mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
parisc      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
powerpc     powerpc: Update defconfigs for embedded 6xx/7xxx, 8xx, 8{3,5,6}xxx  2009-07-29 23:34:01 -05:00
s390        Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6  2009-07-27 12:16:38 -07:00
sh          mm: Remove duplicate definitions in MIPS and SH  2009-07-27 17:26:44 -07:00
sparc       mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
um          mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
x86         debug lockups: Improve lockup detection  2009-08-02 13:27:17 +02:00
xtensa      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()  2009-07-27 12:10:38 -07:00
.gitignore
Kconfig