Commit Graph

276650 Commits

Author SHA1 Message Date
Paul E. McKenney 433cdddcd9 rcu: Add tracing for RCU_FAST_NO_HZ
This commit adds trace_rcu_prep_idle(), which is invoked from
rcu_prepare_for_idle() and rcu_wake_cpu() to trace attempts on
the part of RCU to force CPUs into dyntick-idle mode.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:32:00 -08:00
Paul E. McKenney 045fb9315a rcu: Update trace_rcu_dyntick() header comment
This commit updates the trace_rcu_dyntick() header comment to reflect
events added by commit 4b4f421.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:59 -08:00
Paul E. McKenney 182dd4b277 doc: Add load/store guarantees to Documentation/atomic-ops.txt
An IRC discussion uncovered many conflicting opinions on what types
of data may be atomically loaded and stored.  This commit therefore
calls this out the official set: pointers, longs, ints, and chars (but
not shorts).  This commit also gives some examples of compiler mischief
that can thwart atomicity.

Please note that this discussion is relevant to !SMP kernels if
CONFIG_PREEMPT=y: preemption can cause almost as much trouble as can SMP.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Chris Zankel <chris@zankel.net>
2011-12-11 10:31:58 -08:00
Frederic Weisbecker 1268fbc746 nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu()
Those two APIs were provided to optimize the calls of
tick_nohz_idle_enter() and rcu_idle_enter() into a single
irq disabled section. This way no interrupt happening in-between would
needlessly process any RCU job.

Now we are talking about an optimization for which benefits
have yet to be measured. Let's start simple and completely decouple
idle rcu and dyntick idle logics to simplify.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:57 -08:00
Paul E. McKenney b58bdccaa8 rcu: Add rcutorture CPU-hotplug capability
Running CPU-hotplug operations concurrently with rcutorture has
historically been a good way to find bugs in both RCU and CPU hotplug.
This commit therefore adds an rcutorture module parameter called
"onoff_interval" that causes a randomly selected CPU-hotplug operation to
be executed at the specified interval, in seconds.  The default value of
"onoff_interval" is zero, which disables rcutorture-instigated CPU-hotplug
operations.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:56 -08:00
Paul E. McKenney a95f8817f8 tile: Make tile use the new is_idle_task() API
Change from direct comparison of ->pid with zero to is_idle_task().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
2011-12-11 10:31:55 -08:00
Paul E. McKenney 77aeeebd7b events: Make events use the new is_idle_task() API
Change from direct comparison of ->pid with zero to is_idle_task().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:54 -08:00
Paul E. McKenney 7fc20c5cbd kdb: Make KDB use the new is_idle_task() API
Change from direct comparison of ->pid with zero to is_idle_task().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:52 -08:00
Paul E. McKenney 29f043a2ca sparc: Make SPARC use the new is_idle_task() API
Change from direct comparison of ->pid with zero to is_idle_task().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:49 -08:00
Paul E. McKenney 99745b6a83 rcu: Make RCU use the new is_idle_task() API
Change from direct comparison of ->pid with zero to is_idle_task().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:48 -08:00
Paul E. McKenney c4f3060843 sched: Add is_idle_task() to handle invalidated uses of idle_cpu()
Commit 908a3283 (Fix idle_cpu()) invalidated some uses of idle_cpu(),
which used to say whether or not the CPU was running the idle task,
but now instead says whether or not the CPU is running the idle task
in the absence of pending wakeups.  Although this new implementation
gives a better answer to the question "is this CPU idle?", it also
invalidates other uses that were made of idle_cpu().

This commit therefore introduces a new is_idle_task() API member
that determines whether or not the specified task is one of the
idle tasks, allowing open-coded "->pid == 0" sequences to be replaced
by something more meaningful.

Suggested-by: Josh Triplett <josh@joshtriplett.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:47 -08:00
Paul E. McKenney bb3bf7052d rcu: Control rcutorture startup from kernel boot parameters
Currently, if rcutorture is built into the kernel, it must be manually
started or started from an init script.  This is inconvenient for
automated KVM testing, where it is good to be able to fully control
rcutorture execution from the kernel parameters.  This patch therefore
adds a module parameter named "rcutorture_runnable" that defaults
to zero ("don't start automatically"), but which can be set to one
to cause rcutorture to start up immediately during boot.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:47 -08:00
Paul E. McKenney d5f546d834 rcu: Add rcutorture system-shutdown capability
Although it is easy to run rcutorture tests under KVM, there is currently
no nice way to run such a test for a fixed time period, collect all of
the rcutorture data, and then shut the system down cleanly.  This commit
therefore adds an rcutorture module parameter named "shutdown_secs" that
specified the run duration in seconds, after which rcutorture terminates
the test and powers the system down.  The default value for "shutdown_secs"
is zero, which disables shutdown.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:46 -08:00
Paul E. McKenney b807fbff31 rcu: Permit RCU_FAST_NO_HZ to be used by TREE_PREEMPT_RCU
The new implementation of RCU_FAST_NO_HZ is compatible with preemptible
RCU, so this commit removes the Kconfig restriction that previously
prohibited this.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:45 -08:00
Paul E. McKenney 11dbaa8cb7 rcu: Fix idle-task checks
RCU has traditionally relied on idle_cpu() to determine whether a given
CPU is running in the context of an idle task, but commit 908a3283
(Fix idle_cpu()) has invalidated this approach.  After commit 908a3283,
idle_cpu() will return true if the current CPU is currently running the
idle task, and will be doing so for the foreseeable future.  RCU instead
needs to know whether or not the current CPU is currently running the
idle task, regardless of what the near future might bring.

This commit therefore switches from idle_cpu() to "current->pid != 0".

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Suggested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:44 -08:00
Paul E. McKenney aea1b35e29 rcu: Allow dyntick-idle mode for CPUs with callbacks
Currently, RCU does not permit a CPU to enter dyntick-idle mode if that
CPU has any RCU callbacks queued.  This means that workloads for which
each CPU wakes up and does some RCU updates every few ticks will never
enter dyntick-idle mode.  This can result in significant unnecessary power
consumption, so this patch permits a given to enter dyntick-idle mode if
it has callbacks, but only if that same CPU has completed all current
work for the RCU core.  We determine use rcu_pending() to determine
whether a given CPU has completed all current work for the RCU core.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:43 -08:00
Paul E. McKenney 0989cb4678 rcu: Add more information to the wrong-idle-task complaint
The current code just complains if the current task is not the idle task.
This commit therefore adds printing of the identity of the idle task.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:42 -08:00
Paul E. McKenney 4145fa7fbe rcu: Deconfuse dynticks entry-exit tracing
The trace_rcu_dyntick() trace event did not print both the old and
the new value of the nesting level, and furthermore printed only
the low-order 32 bits of it.  This could result in some confusion
when interpreting trace-event dumps, so this commit prints both
the old and the new value, prints the full 64 bits, and also selects
the process-entry/exit increment to print nicely in hexadecimal.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:42 -08:00
Paul E. McKenney 9ceae0e248 rcu: Add documentation for raw SRCU read-side primitives
Update various files in Documentation/RCU to reflect srcu_read_lock_raw()
and srcu_read_unlock_raw().  Credit to Peter Zijlstra for suggesting
use of the existing _raw suffix instead of the earlier bulkref names.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:41 -08:00
Paul E. McKenney 0c53dd8b31 rcu: Introduce raw SRCU read-side primitives
The RCU implementations, including SRCU, are designed to be used in a
lock-like fashion, so that the read-side lock and unlock primitives must
execute in the same context for any given read-side critical section.
This constraint is enforced by lockdep-RCU.  However, there is a need
to enter an SRCU read-side critical section within the context of an
exception and then exit in the context of the task that encountered the
exception.  The cost of this capability is that the read-side operations
incur the overhead of disabling interrupts.

Note that although the current implementation allows a given read-side
critical section to be entered by one task and then exited by another, all
known possible implementations that allow this have scalability problems.
Therefore, a given read-side critical section must be exited by the same
task that entered it, though perhaps from an interrupt or exception
handler running within that task's context.  But if you are thinking
in terms of interrupt handlers, make sure that you have considered the
possibility of threaded interrupt handlers.

Credit goes to Peter Zijlstra for suggesting use of the existing _raw
suffix to indicate disabling lockdep over the earlier "bulkref" names.

Requested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2011-12-11 10:31:40 -08:00
Paul E. McKenney a7b152d534 powerpc: Tell RCU about idle after hcall tracing
The PowerPC pSeries platform (CONFIG_PPC_PSERIES=y) enables
hypervisor-call tracing for CONFIG_TRACEPOINTS=y kernels.  One of the
hypervisor calls that is traced is the H_CEDE call in the idle loop
that tells the hypervisor that this OS instance no longer needs the
current CPU.  However, tracing uses RCU, so this combination of kernel
configuration variables needs to avoid telling RCU about the current CPU's
idleness until after the H_CEDE-entry tracing completes on the one hand,
and must tell RCU that the the current CPU is no longer idle before the
H_CEDE-exit tracing starts.

In all other cases, it suffices to inform RCU of CPU idleness upon
idle-loop entry and exit.

This commit makes the required adjustments.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:39 -08:00
Frederic Weisbecker 416eb33cd6 rcu: Fix early call to rcu_idle_enter()
On the irq exit path, tick_nohz_irq_exit()
may raise a softirq, which action leads to the wake up
path and select_task_rq_fair() that makes use of rcu
to iterate the domains.

This is an illegal use of RCU because we may be in RCU
extended quiescent state if we interrupted an RCU-idle
window in the idle loop:

[  132.978883] ===============================
[  132.978883] [ INFO: suspicious RCU usage. ]
[  132.978883] -------------------------------
[  132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
[  132.978883]
[  132.978883] other info that might help us debug this:
[  132.978883]
[  132.978883]
[  132.978883] rcu_scheduler_active = 1, debug_locks = 0
[  132.978883] RCU used illegally from extended quiescent state!
[  132.978883] 2 locks held by swapper/0:
[  132.978883]  #0:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
[  132.978883]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
[  132.978883]
[  132.978883] stack backtrace:
[  132.978883] Pid: 0, comm: swapper Tainted: G        W   3.0.0+ #178
[  132.978883] Call Trace:
[  132.978883]  <IRQ>  [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
[  132.978883]  [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
[  132.978883]  [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
[  132.978883]  [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
[  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[  132.978883]  [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
[  132.978883]  [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
[  132.978883]  [<ffffffff8105aa35>] wake_up_process+0x15/0x20
[  132.978883]  [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
[  132.978883]  [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
[  132.978883]  [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
[  132.978883]  [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
[  132.978883]  [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
[  132.978883]  [<ffffffff810694f2>] irq_exit+0xd2/0x100
[  132.978883]  [<ffffffff81829e96>] do_IRQ+0x66/0xe0
[  132.978883]  [<ffffffff81820d53>] common_interrupt+0x13/0x13
[  132.978883]  <EOI>  [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
[  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[  132.978883]  [<ffffffff810144ea>] default_idle+0xba/0x370
[  132.978883]  [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
[  132.978883]  [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
[  132.978883]  [<ffffffff817f217f>] rest_init+0xef/0x150
[  132.978883]  [<ffffffff817f20e2>] ? rest_init+0x52/0x150
[  132.978883]  [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
[  132.978883]  [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
[  132.978883]  [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112

Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:38 -08:00
Frederic Weisbecker 98ad1cc14a x86: Call idle notifier after irq_enter()
Interrupts notify the idle exit state before calling irq_enter().
But the notifier code calls rcu_read_lock() and this is not
allowed while rcu is in an extended quiescent state. We need
to wait for irq_enter() -> rcu_idle_exit() to be called before
doing so otherwise this results in a grumpy RCU:

[    0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    0.099991] Hardware name: AMD690VM-FMH
[    0.099991] Modules linked in:
[    0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
[    0.099991] Call Trace:
[    0.099991]  <IRQ>  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    0.099991]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    0.099991]  [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
[    0.099991]  [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
[    0.099991]  [<ffffffff81001873>] exit_idle+0x43/0x50
[    0.099991]  [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
[    0.099991]  [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
[    0.099991]  <EOI>  [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
[    0.099991]  [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
[    0.099991]  [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
[    0.099991]  [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
[    0.099991]  [<ffffffff810019a0>] cpu_idle+0xb0/0x110
[    0.099991]  [<ffffffff817a7505>] rest_init+0xe5/0x140
[    0.099991]  [<ffffffff817a7468>] ? rest_init+0x48/0x140
[    0.099991]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    0.099991]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    0.099991]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Henroid <andrew.d.henroid@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:38 -08:00
Frederic Weisbecker e37e112de3 x86: Enter rcu extended qs after idle notifier call
The idle notifier, called by enter_idle(), enters into rcu read
side critical section but at that time we already switched into
the RCU-idle window (rcu_idle_enter() has been called). And it's
illegal to use rcu_read_lock() in that state.

This results in rcu reporting its bad mood:

[    1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    1.275635] Hardware name: AMD690VM-FMH
[    1.275635] Modules linked in:
[    1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252
[    1.275635] Call Trace:
[    1.275635]  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    1.275635]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    1.275635]  [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110
[    1.275635]  [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20
[    1.275635]  [<ffffffff810018a0>] enter_idle+0x20/0x30
[    1.275635]  [<ffffffff81001995>] cpu_idle+0xa5/0x110
[    1.275635]  [<ffffffff817a7465>] rest_init+0xe5/0x140
[    1.275635]  [<ffffffff817a73c8>] ? rest_init+0x48/0x140
[    1.275635]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    1.275635]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    1.275635]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
[    1.275635] ---[ end trace a22d306b065d4a66 ]---

Fix this by entering rcu extended quiescent state later, just before
the CPU goes to sleep.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:37 -08:00
Frederic Weisbecker 2bbb6817c0 nohz: Allow rcu extended quiescent state handling seperately from tick stop
It is assumed that rcu won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true, as in x86-64 where we dereference the idle notifiers after
the tick is stopped.

To prepare for fixing this, add two new APIs:
tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().

If no use of RCU is made in the idle loop between
tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
must instead call the new *_norcu() version such that the arch doesn't
need to call rcu_idle_enter() and rcu_idle_exit().

Otherwise the arch must call tick_nohz_enter_idle() and
tick_nohz_exit_idle() and also call explicitly:

- rcu_idle_enter() after its last use of RCU before the CPU is put
to sleep.
- rcu_idle_exit() before the first use of RCU after the CPU is woken
up.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:36 -08:00
Frederic Weisbecker 280f06774a nohz: Separate out irq exit and idle loop dyntick logic
The tick_nohz_stop_sched_tick() function, which tries to delay
the next timer tick as long as possible, can be called from two
places:

- From the idle loop to start the dytick idle mode
- From interrupt exit if we have interrupted the dyntick
idle mode, so that we reprogram the next tick event in
case the irq changed some internal state that requires this
action.

There are only few minor differences between both that
are handled by that function, driven by the ts->inidle
cpu variable and the inidle parameter. The whole guarantees
that we only update the dyntick mode on irq exit if we actually
interrupted the dyntick idle mode, and that we enter in RCU extended
quiescent state from idle loop entry only.

Split this function into:

- tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
dynticks idle mode unconditionally if it can, and enters into RCU
extended quiescent state.

- tick_nohz_irq_exit() which only updates the dynticks idle mode
when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).

To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
into tick_nohz_idle_exit().

This simplifies the code and micro-optimize the irq exit path (no need
for local_irq_save there). This also prepares for the split between
dynticks and rcu extended quiescent state logics. We'll need this split to
further fix illegal uses of RCU in extended quiescent states in the idle
loop.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:35 -08:00
Paul E. McKenney 867f236bd1 rcu: Make srcu_read_lock_held() call common lockdep-enabled function
A common debug_lockdep_rcu_enabled() function is used to check whether
RCU lockdep splats should be reported, but srcu_read_lock() does not
use it.  This commit therefore brings srcu_read_lock_held() up to date.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:34 -08:00
Paul E. McKenney ff195cb69b rcu: Warn when srcu_read_lock() is used in an extended quiescent state
Catch SRCU up to the other variants of RCU by making PROVE_RCU
complain if either srcu_read_lock() or srcu_read_lock_held() are
used from within RCU-idle mode.

Frederic reworked this to allow for the new versions of his patches
that check for extended quiescent states.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:33 -08:00
Paul E. McKenney d8ab29f8be rcu: Remove one layer of abstraction from PROVE_RCU checking
Simplify things a bit by substituting the definitions of the single-line
rcu_read_acquire(), rcu_read_release(), rcu_read_acquire_bh(),
rcu_read_release_bh(), rcu_read_acquire_sched(), and
rcu_read_release_sched() functions at their call points.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:32 -08:00
Frederic Weisbecker 00f49e5729 rcu: Warn when rcu_read_lock() is used in extended quiescent state
We are currently able to detect uses of rcu_dereference_check() inside
extended quiescent states (such as the RCU-free window in idle).
But rcu_read_lock() and friends can be used without rcu_dereference(),
so that the earlier commit checking for use of rcu_dereference() and
friends while in RCU idle mode miss some error conditions.  This commit
therefore adds extended quiescent state checking to rcu_read_lock() and
friends.

Uses of RCU from within RCU-idle mode are totally ignored by
RCU, hence the importance of these checks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:31 -08:00
Frederic Weisbecker 0464e93748 rcu: Inform the user about extended quiescent state on PROVE_RCU warning
Inform the user if an RCU usage error is detected by lockdep while in
an extended quiescent state (in this case, the RCU-free window in idle).
This is accomplished by adding a line to the RCU lockdep splat indicating
whether or not the splat occurred in extended quiescent state.

Uses of RCU from within extended quiescent state mode are totally ignored
by RCU, hence the importance of this diagnostic.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:30 -08:00
Frederic Weisbecker e6b80a3b09 rcu: Detect illegal rcu dereference in extended quiescent state
Report that none of the rcu read lock maps are held while in an RCU
extended quiescent state (the section between rcu_idle_enter()
and rcu_idle_exit()). This helps detect any use of rcu_dereference()
and friends from within the section in idle where RCU is not allowed.

This way we can guarantee an extended quiescent window where the CPU
can be put in dyntick idle mode or can simply aoid to be part of any
global grace period completion while in the idle loop.

Uses of RCU from such mode are totally ignored by RCU, hence the
importance of these checks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:30 -08:00
Thomas Gleixner a0f8eefb12 rcu: Remove redundant return from rcu_report_exp_rnp()
Empty void functions do not need "return", so this commit removes it
from rcu_report_exp_rnp().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:29 -08:00
Thomas Gleixner b40d293eb3 rcu: Omit self-awaken when setting up expedited grace period
When setting up an expedited grace period, if there were no readers, the
task will awaken itself.  This commit removes this useless self-awakening.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:28 -08:00
Paul E. McKenney 34240697d6 rcu: Disable preemption in rcu_is_cpu_idle()
Because rcu_is_cpu_idle() is to be used to check for extended quiescent
states in RCU-preempt read-side critical sections, it cannot assume that
preemption is disabled.  And preemption must be disabled when accessing
the dyntick-idle state, because otherwise the following sequence of events
could occur:

1.	Task A on CPU 1 enters rcu_is_cpu_idle() and picks up the pointer
	to CPU 1's per-CPU variables.

2.	Task B preempts Task A and starts running on CPU 1.

3.	Task A migrates to CPU 2.

4.	Task B blocks, leaving CPU 1 idle.

5.	Task A continues execution on CPU 2, accessing CPU 1's dyntick-idle
	information using the pointer fetched in step 1 above, and finds
	that CPU 1 is idle.

6.	Task A therefore incorrectly concludes that it is executing in
	an extended quiescent state, possibly issuing a spurious splat.

Therefore, this commit disables preemption within the rcu_is_cpu_idle()
function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:27 -08:00
Paul E. McKenney 2c01531f08 rcu: Document failing tick as cause of RCU CPU stall warning
One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
These turned out to be caused by a bug that stopped scheduling-clock
tick interrupts from being sent to a given CPU for several hundred seconds.
This commit therefore updates the documentation to call this out as a
possible cause for RCU CPU stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:26 -08:00
Paul E. McKenney 91afaf3002 rcu: Add failure tracing to rcutorture
Trace the rcutorture RCU accesses and dump the trace buffer when the
first failure is detected.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:26 -08:00
Paul E. McKenney a8eecf2248 trace: Allow ftrace_dump() to be called from modules
Add an EXPORT_SYMBOL_GPL() so that rcutorture can dump the trace buffer
upon detection of an RCU error.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:25 -08:00
Paul E. McKenney 9b2e4f1880 rcu: Track idleness independent of idle tasks
Earlier versions of RCU used the scheduling-clock tick to detect idleness
by checking for the idle task, but handled idleness differently for
CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
critical sections in the idle task, for example, for tracing.  A more
fine-grained detection of idleness is therefore required.

This commit presses the old dyntick-idle code into full-time service,
so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
always invoked at the beginning of an idle loop iteration.  Similarly,
rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
at the end of an idle-loop iteration.  This allows the idle task to
use RCU everywhere except between consecutive rcu_idle_enter() and
rcu_idle_exit() calls, in turn allowing architecture maintainers to
specify exactly where in the idle loop that RCU may be used.

Because some of the userspace upcall uses can result in what looks
to RCU like half of an interrupt, it is not possible to expect that
the irq_enter() and irq_exit() hooks will give exact counts.  This
patch therefore expands the ->dynticks_nesting counter to 64 bits
and uses two separate bitfields to count process/idle transitions
and interrupt entry/exit transitions.  It is presumed that userspace
upcalls do not happen in the idle loop or from usermode execution
(though usermode might do a system call that results in an upcall).
The counter is hard-reset on each process/idle transition, which
avoids the interrupt entry/exit error from accumulating.  Overflow
is avoided by the 64-bitness of the ->dyntick_nesting counter.

This commit also adds warnings if a non-idle task asks RCU to enter
idle state (and these checks will need some adjustment before applying
Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246).
In addition, validation of ->dynticks and ->dynticks_nesting is added.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:24 -08:00
Paul E. McKenney b804cb9e91 lockdep: Update documentation for lock-class leak detection
There are a number of bugs that can leak or overuse lock classes,
which can cause the maximum number of lock classes (currently 8191)
to be exceeded.  However, the documentation does not tell you how to
track down these problems.  This commit addresses this shortcoming.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:23 -08:00
Paul E. McKenney 7077714ec4 rcu: Make synchronize_sched_expedited() better at work sharing
When synchronize_sched_expedited() takes its second and subsequent
snapshots of sync_sched_expedited_started, it subtracts 1.  This
means that the concurrent caller of synchronize_sched_expedited()
that incremented to that value sees our successful completion, it
will not be able to take advantage of it.  This restriction is
pointless, given that our full expedited grace period would have
happened after the other guy started, and thus should be able to
serve as a proxy for the other guy successfully executing
try_stop_cpus().

This commit therefore removes the subtraction of 1.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:22 -08:00
Paul E. McKenney 389abd48ef rcu: Avoid RCU-preempt expedited grace-period botch
Because rcu_read_unlock_special() samples rcu_preempted_readers_exp(rnp)
after dropping rnp->lock, the following sequence of events is possible:

1.	Task A exits its RCU read-side critical section, and removes
	itself from the ->blkd_tasks list, releases rnp->lock, and is
	then preempted.  Task B remains on the ->blkd_tasks list, and
	blocks the current expedited grace period.

2.	Task B exits from its RCU read-side critical section and removes
	itself from the ->blkd_tasks list.  Because it is the last task
	blocking the current expedited grace period, it ends that
	expedited grace period.

3.	Task A resumes, and samples rcu_preempted_readers_exp(rnp) which
	of course indicates that nothing is blocking the nonexistent
	expedited grace period. Task A is again preempted.

4.	Some other CPU starts an expedited grace period.  There are several
	tasks blocking this expedited grace period queued on the
	same rcu_node structure that Task A was using in step 1 above.

5.	Task A examines its state and incorrectly concludes that it was
	the last task blocking the expedited grace period on the current
	rcu_node structure.  It therefore reports completion up the
	rcu_node tree.

6.	The expedited grace period can then incorrectly complete before
	the tasks blocked on this same rcu_node structure exit their
	RCU read-side critical sections.  Arbitrarily bad things happen.

This commit therefore takes a snapshot of rcu_preempted_readers_exp(rnp)
prior to dropping the lock, so that only the last task thinks that it is
the last task, thus avoiding the failure scenario laid out above.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:21 -08:00
Paul E. McKenney af446b702c rcu: ->signaled better named ->fqs_state
The ->signaled field was named before complications in the form of
dyntick-idle mode and offlined CPUs.  These complications have required
that force_quiescent_state() be implemented as a state machine, instead
of simply unconditionally sending reschedule IPIs.  Therefore, this
commit renames ->signaled to ->fqs_state to catch up with the new
force_quiescent_state() reality.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:20 -08:00
Linus Torvalds dc47ce90c3 Linux 3.2-rc5 2011-12-09 15:09:32 -08:00
Linus Torvalds 8def5f51b0 Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
  cifs: check for NULL last_entry before calling cifs_save_resume_key
  cifs: attempt to freeze while looping on a receive attempt
  cifs: Fix sparse warning when calling cifs_strtoUCS
  CIFS: Add descriptions to the brlock cache functions
2011-12-09 14:45:44 -08:00
Linus Torvalds a776878d6c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: Calling __pa() with an ioremap()ed address is invalid
  x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
  x86/intel_mid: Kconfig select fix
  x86/intel_mid: Fix the Kconfig for MID selection
2011-12-09 14:45:12 -08:00
Linus Torvalds e2f4e0bc2a Merge branch 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6
* 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6:
  spi/gpio: fix section mismatch warning
  spi/fsl-espi: disable CONFIG_SPI_FSL_ESPI=m build
  spi/nuc900: Include linux/module.h
  spi/ath79: fix compile error due to missing include
2011-12-09 14:41:50 -08:00
Linus Torvalds af209e0aea Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: raid5 crash during degradation
  md/raid5: never wait for bad-block acks on failed device.
  md: ensure new badblocks are handled promptly.
  md: bad blocks shouldn't cause a Blocked status on a Faulty device.
  md: take a reference to mddev during sysfs access.
  md: refine interpretation of "hold_active == UNTIL_IOCTL".
  md/lock: ensure updates to page_attrs are properly locked.
2011-12-09 08:18:08 -08:00
Linus Torvalds 53523d5263 Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  arch/tile: use new generic {enable,disable}_percpu_irq() routines
  drivers/net/ethernet/tile: use skb_frag_page() API
  asm-generic/unistd.h: support new process_vm_{readv,write} syscalls
  arch/tile: fix double-free bug in homecache_free_pages()
  arch/tile: add a few #includes and an EXPORT to catch up with kernel changes.
2011-12-09 08:08:57 -08:00
Linus Torvalds 592d44a5f8 Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
* 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  MAINTAINERS: Update amd-iommu F: patterns
  iommu/amd: Fix typo in kernel-parameters.txt
  iommu/msm: Fix compile error in mach-msm/devices-iommu.c
  Fix comparison using wrong pointer variable in dma debug code
2011-12-09 08:08:14 -08:00