Commit Graph

229 Commits

Author SHA1 Message Date
Linus Torvalds 3f4d9925e9 Merge branches 'core-urgent-for-linus' and 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rcu fix and x86 irq fix from Ingo Molnar:

 - Fix a bug that caused an RCU warning splat.

 - Two x86 irq related fixes: a hotplug crash fix and an ACPI IRQ
   registry fix.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Clear need_qs flag to prevent splat

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable()
  x86/irq: Fix regression caused by commit b568b8601f
2015-02-21 10:36:06 -08:00
Tejun Heo ad853b48cb rcu: use %*pb[l] to print bitmaps including cpumasks and nodemasks
printk and friends can now format bitmaps using '%*pb[l]'.  cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:37 -08:00
Paul E. McKenney c0135d07b0 rcu: Clear need_qs flag to prevent splat
If the scheduling-clock interrupt sets the current tasks need_qs flag,
but if the current CPU passes through a quiescent state in the meantime,
then rcu_preempt_qs() will fail to clear the need_qs flag, which can fool
RCU into thinking that additional rcu_read_unlock_special() processing
is needed.  This commit therefore clears the need_qs flag before checking
for additional processing.

For this problem to occur, we need rcu_preempt_data.passed_quiesce equal
to true and current->rcu_read_unlock_special.b.need_qs also equal to true.
This condition can occur as follows:

1.	CPU 0 is aware of the current preemptible RCU grace period,
	but has not yet passed through a quiescent state.  Among other
	things, this means that rcu_preempt_data.passed_quiesce is false.

2.	Task A running on CPU 0 enters a preemptible RCU read-side
	critical section.

3.	CPU 0 takes a scheduling-clock interrupt, which notices the
	RCU read-side critical section and the need for a quiescent state,
	and thus sets current->rcu_read_unlock_special.b.need_qs to true.

4.	Task A is preempted, enters the scheduler, eventually invoking
	rcu_preempt_note_context_switch() which in turn invokes
	rcu_preempt_qs().

	Because rcu_preempt_data.passed_quiesce is false,
	control enters the body of the "if" statement, which sets
	rcu_preempt_data.passed_quiesce to true.

5.	At this point, CPU 0 takes an interrupt.  The interrupt
	handler contains an RCU read-side critical section, and
	the rcu_read_unlock() notes that current->rcu_read_unlock_special
	is nonzero, and thus invokes rcu_read_unlock_special().

6.	Once in rcu_read_unlock_special(), the fact that
	current->rcu_read_unlock_special.b.need_qs is true becomes
	apparent, so rcu_read_unlock_special() invokes rcu_preempt_qs().
	Recursively, given that we interrupted out of that same
	function in the preceding step.

7.	Because rcu_preempt_data.passed_quiesce is now true,
	rcu_preempt_qs() does nothing, and simply returns.

8.	Upon return to rcu_read_unlock_special(), it is noted that
	current->rcu_read_unlock_special is still nonzero (because
	the interrupted rcu_preempt_qs() had not yet gotten around
	to clearing current->rcu_read_unlock_special.b.need_qs).

9.	Execution proceeds to the WARN_ON_ONCE(), which notes that
	we are in an interrupt handler and thus duly splats.

The solution, as noted above, is to make rcu_read_unlock_special()
clear out current->rcu_read_unlock_special.b.need_qs after calling
rcu_preempt_qs().  The interrupted rcu_preempt_qs() will clear it again,
but this is harmless.  The worst that happens is that we clobber another
attempt to set this field, but this is not a problem because we just
got done reporting a quiescent state.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix embarrassing build bug noted by Sasha Levin. ]
Tested-by: Sasha Levin <sasha.levin@oracle.com>
2015-02-11 15:46:43 -08:00
Paul E. McKenney 78e691f4ae Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD
doc.2015.01.07a: Documentation updates.
fixes.2015.01.15a: Miscellaneous fixes.
preempt.2015.01.06a: Changes to handling of lists of preempted tasks.
srcu.2015.01.06a: SRCU updates.
stall.2015.01.16a: RCU CPU stall-warning updates and fixes.
torture.2015.01.11a: RCU torture-test updates and fixes.
2015-01-15 23:34:34 -08:00
Paul E. McKenney 630181c4a9 rcu: Initialize tiny RCU stall-warning timeouts at boot
The current tiny RCU stall-warning code assumes that the jiffies counter
starts at zero, however, it is sometimes initialized to other values,
for example, -30,000.  This commit therefore changes rcu_init() to
invoke reset_cpu_stall_ticks() for both flavors of RCU to initialize
the stall-warning times properly at boot.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-15 23:33:17 -08:00
Miroslav Benes ec1fe396ff rcu: Fix RCU CPU stall detection in tiny implementation
The tiny RCU CPU stall detection depends on *rcp->curtail not being
NULL. It is however a tail pointer and thus NULL by definition. Instead we
should check rcp->rcucblist for the presence of pending callbacks which
need to be processed. With this fix INFO about the stall is printed and
jiffies_stall (jiffies at next stall) correctly updated.

Note that the check for pending callback is necessary to avoid spurious
warnings if there are no pendings callbacks.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
[ paulmck: Fused identical "if" statements, ported to -rcu. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-15 23:33:16 -08:00
Paul E. McKenney fb81a44b88 rcu: Add GP-kthread-starvation checks to CPU stall warnings
This commit adds a message that is printed if the relevant grace-period
kthread has not been able to run for the two seconds preceding the
stall warning.  (The two seconds is double the maximum interval between
successive bouts of quiescent-state forcing.)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-15 23:33:15 -08:00
Paul E. McKenney 5cd37193ce rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
in places where it would be useful for it to apply to the normal RCU
flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
case for workloads that aggressively overload the system, particularly
those that generate large numbers of RCU updates on systems running
NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
from cond_resched_rcu_qs() to the normal RCU flavors.

Note that it is unfortunately necessary to leave the old ->passed_quiesce
mechanism in place to allow quiescent states that apply to only one
flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
that case, but that is not so good for debugging of RCU internals.)
In addition, if one of the RCU flavor's grace period has stalled, this
will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
quiescent state visible from other CPUs.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
  was used in preemptible code. ]
2015-01-15 23:33:14 -08:00
Paul E. McKenney a94844b22a rcu: Optionally run grace-period kthreads at real-time priority
Recent testing has shown that under heavy load, running RCU's grace-period
kthreads at real-time priority can improve performance (according to 0day
test robot) and reduce the incidence of RCU CPU stall warnings.  However,
most systems do just fine with the default non-realtime priorities for
these kthreads, and it does not make sense to expose the entire user
base to any risk stemming from this change, given that this change is
of use only to a few users running extremely heavy workloads.

Therefore, this commit allows users to specify realtime priorities
for the grace-period kthreads, but leaves them running SCHED_OTHER
by default.  The realtime priority may be specified at build time
via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the
rcutree.kthread_prio parameter.  Either way, 0 says to continue the
default SCHED_OTHER behavior and values from 1-99 specify that priority
of SCHED_FIFO behavior.  Note that a value of 0 is not permitted when
the RCU_BOOST Kconfig parameter is specified.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-15 23:25:04 -08:00
Paul E. McKenney 7602de4af1 rcutorture: Add more diagnostics in rcu_barrier() test failure case
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10 19:08:06 -08:00
Paul E. McKenney 917963d0b3 rcutorture: Check from beginning to end of grace period
Currently, rcutorture's Reader Batch checks measure from the end of
the previous grace period to the end of the current one.  This commit
tightens up these checks by measuring from the start and end of the same
grace period.  This involves adding rcu_batches_started() and friends
corresponding to the existing rcu_batches_completed() and friends.

We leave SRCU alone for the moment, as it does not yet have a way of
tracking both ends of its grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10 19:08:02 -08:00
Paul E. McKenney f9103c3902 rcu: Remove redundant rcu_batches_completed() declaration
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10 19:08:01 -08:00
Paul E. McKenney 1e32eaee4c rcutorture: Drop rcu_torture_completed() and friends
Now that the return type of rcu_batches_completed() and friends matches
that of the rcu_torture_ops structure's ->completed field, the wrapper
functions can be deleted.  This commit carries out that deletion, while
also wiring "sched"'s ->completed field to rcu_batches_completed_sched().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10 19:08:00 -08:00
Paul E. McKenney 6b80da42c0 rcutorture: Use unsigned for Reader Batch computations
The counter returned by the various ->completed functions is subject to
overflow, which means that subtracting two such counters might result
in overflow, which invokes undefined behavior in the C standard.  This
commit therefore changes these functions and variables to unsigned to
avoid this undefined behavior.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10 19:07:58 -08:00
Paul E. McKenney 9733e4f0a9 rcu: Make _batches_completed() functions return unsigned long
Long ago, the various ->completed fields were of type long, but now are
unsigned long due to signed-integer-overflow concerns.  However, the
various _batches_completed() functions remained of type long, even though
their only purpose in life is to return the corresponding ->completed
field.  This patch cleans this up by changing these functions' return
types to unsigned long.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10 19:07:56 -08:00
Paul E. McKenney e3663b1024 rcu: Handle gpnum/completed wrap while dyntick idle
Subtle race conditions can result if a CPU stays in dyntick-idle mode
long enough for the ->gpnum and ->completed fields to wrap.  For
example, consider the following sequence of events:

o	CPU 1 encounters a quiescent state while waiting for grace period
	5 to complete, but then enters dyntick-idle mode.

o	While CPU 1 is in dyntick-idle mode, the grace-period counters
	wrap around so that the grace period number is now 4.

o	Just as CPU 1 exits dyntick-idle mode, grace period 4 completes
	and grace period 5 begins.

o	The quiescent state that CPU 1 passed through during the old
	grace period 5 looks like it applies to the new grace period
	5.  Therefore, the new grace period 5 completes without CPU 1
	having passed through a quiescent state.

This could clearly be a fatal surprise to any long-running RCU read-side
critical section that happened to be running on CPU 1 at the time.  At one
time, this was not a problem, given that it takes significant time for
the grace-period counters to overflow even on 32-bit systems.  However,
with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long
idle periods are now becoming quite feasible.  It is therefore time to
close this race.

This commit therefore avoids this race condition by having the
quiescent-state forcing code detect when a CPU is falling too far
behind, and setting a new rcu_data field ->gpwrap when this happens.
Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed
fields are known to be untrustworthy, and can be ignored, along with
any associated quiescent states.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:05:28 -08:00
Paul E. McKenney 6ccd2ecd42 rcu: Improve diagnostics for spurious RCU CPU stall warnings
The current RCU CPU stall warning code will print "Stall ended before
state dump start" any time that the stall-warning code is triggered on
a CPU that has already reported a quiescent state for the current grace
period and if all quiescent states have been reported for the current
grace period.  However, a true stall can result in these symptoms, for
example, by preventing RCU's grace-period kthreads from ever running

This commit therefore checks for this condition, reporting the end of
the stall only if one of the grace-period counters has actually advanced.
Otherwise, it reports the last time that the grace-period kthread made
meaningful progress.  (In normal situations, the grace-period kthread
should make meaningful progress at least every jiffies_till_next_fqs
jiffies.)

Reported-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Miroslav Benes <mbenes@suse.cz>
2015-01-06 11:05:27 -08:00
Paul E. McKenney fc908ed33e rcu: Make RCU_CPU_STALL_INFO include number of fqs attempts
One way that an RCU CPU stall warning can happen is if the grace-period
kthread is not allowed to execute.  One proxy for this kthread's
forward progress is the number of force-quiescent-state (fqs) scans.
This commit therefore adds the number of fqs scans to the RCU CPU stall
warning printouts when CONFIG_RCU_CPU_STALL_INFO=y.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:05:25 -08:00
Pranith Kumar 83fe27ea53 rcu: Make SRCU optional by using CONFIG_SRCU
SRCU is not necessary to be compiled by default in all cases. For tinification
efforts not compiling SRCU unless necessary is desirable.

The current patch tries to make compiling SRCU optional by introducing a new
Kconfig option CONFIG_SRCU which is selected when any of the components making
use of SRCU are selected.

If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.

   text    data     bss     dec     hex filename
   2007       0       0    2007     7d7 kernel/rcu/srcu.o

Size of arch/powerpc/boot/zImage changes from

   text    data     bss     dec     hex filename
 831552   64180   23944  919676   e087c arch/powerpc/boot/zImage : before
 829504   64180   23952  917636   e0084 arch/powerpc/boot/zImage : after

so the savings are about ~2000 bytes.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
2015-01-06 11:04:29 -08:00
Paul E. McKenney a5c198f4f7 rcu: Expand SRCU ->completed to 64 bits
When rcutorture used only the low-order 32 bits of the grace-period
number, it was not a problem for SRCU to use a 32-bit completed field.
However, rcutorture now uses the full 64 bits on 64-bit systems, so
this commit converts SRCU's ->completed field to unsigned long so as to
provide 64 bits on 64-bit systems.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:04:26 -08:00
Paul E. McKenney ab954c167e rcu: Remove redundant callback-list initialization
The RCU callback lists are initialized in both rcu_boot_init_percpu_data()
and rcu_init_percpu_data().  The former is intended for initializing
immutable data, so this commit removes the initialization from
rcu_boot_init_percpu_data() and leaves it in rcu_init_percpu_data().
This change prepares for permitting callbacks to be queued very early
in boot.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:54 -08:00
Paul E. McKenney 6cd534ef8b rcu: Don't scan root rcu_node structure for stalled tasks
Now that blocked tasks are no longer migrated to the root rcu_node
structure, there is no need to scan the root rcu_node structure for
blocked tasks stalling the current grace period.  This commit therefore
removes this scan.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:53 -08:00
Lai Jiangshan abaf3f9d27 rcu: Revert "Allow post-unlock reference for rt_mutex" to avoid priority-inversion
The patch dfeb9765ce ("Allow post-unlock reference for rt_mutex")
ensured rcu-boost safe even the rt_mutex has post-unlock reference.

But rt_mutex allowing post-unlock reference is definitely a bug and it was
fixed by the commit 27e35715df ("rtmutex: Plug slow unlock race").
This fix made the previous patch (dfeb9765ce) useless.

And even worse, the priority-inversion introduced by the the previous
patch still exists.

rcu_read_unlock_special() {
	rt_mutex_unlock(&rnp->boost_mtx);
	/* Priority-Inversion:
	 * the current task had been deboosted and preempted as a low
	 * priority task immediately, it could wait long before reschedule in,
	 * and the rcu-booster also waits on this low priority task and sleeps.
	 * This priority-inversion makes rcu-booster can't work
	 * as expected.
	 */
	complete(&rnp->boost_completion);
}

Just revert the patch to avoid it.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:52 -08:00
Paul E. McKenney 3ba4d0e09b rcu: Note quiescent state when CPU goes offline
The rcu_cleanup_dead_cpu() function (called after a CPU has gone
completely offline) has not reported a quiescent state because there
was probably at least one synchronize_rcu() between the time the CPU
went offline and the CPU_DEAD notifier, and this would have detected
the CPU's offline state via quiescent-state forcing.  However, the plan
is for CPUs to take themselves offline, at which point it makes sense
for them to report their own quiescent state.  This commit makes this
change in preparation for the new CPU-hotplug setup.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:51 -08:00
Paul E. McKenney 5d0b024973 rcu: Don't bother affinitying rcub kthreads away from offline CPUs
When rcu_boost_kthread_setaffinity() sees that all CPUs for a given
rcu_node structure are now offline, it affinities the corresponding
RCU-boost ("rcub") kthread away from those CPUs.  This is pointless
because the kthread cannot run on those offline CPUs in any case.
This commit therefore removes this unneeded code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:50 -08:00
Paul E. McKenney 1be0085b51 rcu: Don't initiate RCU priority boosting on root rcu_node
Because there is no longer any preempted tasks on the root rcu_node, and
because there is no longer ever an rcub kthread for the root rcu_node,
this commit drops the code in force_qs_rnp() that attempts to awaken
the non-existent root rcub kthread.  This is strictly a performance
enhancement, removing a root rcu_node ->lock acquisition and release
along with some tests in rcu_initiate_boost(), ending with the test that
notes that there is no rcub kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:48 -08:00
Paul E. McKenney 3e9f5c70d8 rcu: Don't spawn rcub kthreads on root rcu_node structure
Now that offlining CPUs no longer moves leaf rcu_node structures'
->blkd_tasks lists to the root, there is no way for the root rcu_node
structure's ->blkd_task list to be nonempty, unless the root node is also
the sole leaf node.  This commit therefore refrains from creating an rcub
kthread for the root rcu_node structure unless it is also the sole leaf.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:47 -08:00
Paul E. McKenney 96e92021d4 rcu: Make use of rcu_preempt_has_tasks()
Given that there is now arcu_preempt_has_tasks() function that checks
to see if the ->blkd_tasks list is non-empty, this commit makes use of it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:46 -08:00
Paul E. McKenney a8f4cbadfb rcu: Shorten irq-disable region in rcu_cleanup_dead_cpu()
Now that we are not migrating callbacks, there is no need to hold the
->orphan_lock across the the ->qsmaskinit bit-clearing process.
This commit therefore releases ->orphan_lock immediately after adopting
the orphaned RCU callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:45 -08:00
Paul E. McKenney d19fb8d1f3 rcu: Don't migrate blocked tasks even if all corresponding CPUs offline
When the last CPU associated with a given leaf rcu_node structure
goes offline, something must be done about the tasks queued on that
rcu_node structure.  Each of these tasks has been preempted on one of
the leaf rcu_node structure's CPUs while in an RCU read-side critical
section that it have not yet exited.  Handling these tasks is the job of
rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node
structure to the root rcu_node structure.

Unfortunately, this migration has to be done one task at a time because
each tasks allegiance must be shifted from the original leaf rcu_node to
the root, so that future attempts to deal with these tasks will acquire
the root rcu_node structure's ->lock rather than that of the leaf.
Worse yet, this migration must be done with interrupts disabled, which
is not so good for realtime response, especially given that there is
no bound on the number of tasks on a given rcu_node structure's list.
(OK, OK, there is a bound, it is just that it is unreasonably large,
especially on 64-bit systems.)  This was not considered a problem back
when rcu_preempt_offline_tasks() was first written because realtime
systems were assumed not to do CPU-hotplug operations while real-time
applications were running.  This assumption has proved of dubious validity
given that people are starting to run multiple realtime applications
on a single SMP system and that it is common practice to offline then
online a CPU before starting its real-time application in order to clear
extraneous processing off of that CPU.  So we now need CPU hotplug
operations to avoid undue latencies.

This commit therefore avoids migrating these tasks, instead letting
them be dequeued one by one from the original leaf rcu_node structure
by rcu_read_unlock_special().  This means that the clearing of bits
from the upper-level rcu_node structures must be deferred until the
last such task has been dequeued, because otherwise subsequent grace
periods won't wait on them.  This commit has the beneficial side effect
of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in
CONFIG_RCU_BOOST builds.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:44 -08:00
Paul E. McKenney b6a932d1d9 rcu: Make rcu_read_unlock_special() propagate ->qsmaskinit bit clearing
This commit causes rcu_read_unlock_special() to propagate ->qsmaskinit
bit clearing up the rcu_node tree once a given rcu_node structure's
blkd_tasks list becomes empty.  This is the final commit in preparation
for the rework of RCU priority boosting:  It enables preempted tasks to
remain queued on their rcu_node structure even after all of that rcu_node
structure's CPUs have gone offline.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:43 -08:00
Paul E. McKenney 8af3a5e78c rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu()
This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu()
in preparation for the rework of RCU priority boosting.  This new function
will be invoked from rcu_read_unlock_special() in the reworked scheme,
which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node
structure's ->qsmaskinit field has already been updated.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:41 -08:00
Paul E. McKenney 74e871ac6c rcu: Rename "empty" to "empty_norm" in preparation for boost rework
This commit undertakes a simple variable renaming to make way for
some rework of RCU priority boosting.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:40 -08:00
Paul E. McKenney b08ea27d95 rcu: Protect rcu_boost() lockless accesses with ACCESS_ONCE()
This commit prevents random compiler optimizations by applying
ACCESS_ONCE() to lockless accesses.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:02:39 -08:00
Lai Jiangshan 5a43b88e98 rcu: Remove "select IRQ_WORK" from config TREE_RCU
The 48a7639ce8 ("rcu: Make callers awaken grace-period kthread")
removed the irq_work_queue(), so the TREE_RCU doesn't need
irq work any more.  This commit therefore updates RCU's Kconfig and

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:01:17 -08:00
Paul E. McKenney 41050a0096 rcu: Fix rcu_barrier() race that could result in too-short wait
The rcu_barrier() no-callbacks check for no-CBs CPUs has race conditions.
It checks a given CPU's lists of callbacks, and if all three no-CBs lists
are empty, ignores that CPU.  However, these three lists could potentially
be empty even when callbacks are present if the check executed just as
the callbacks were being moved from one list to another.  It turns out
that recent versions of rcutorture can spot this race.

This commit plugs this hole by consolidating the per-list counts of
no-CBs callbacks into a single count, which is incremented before
the corresponding callback is posted and after it is invoked.  Then
rcu_barrier() checks this single count to reliably determine whether
the corresponding CPU has no-CBs callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:01:15 -08:00
Lai Jiangshan 5f6130fa52 tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task
For RCU in UP, context-switch = QS = GP, thus we can force a
context-switch when any call_rcu_[bh|sched]() is happened on idle_task.
After doing so, rcu_idle/irq_enter/exit() are useless, so we can simply
make these functions empty.

More important, this change does not change the functionality logically.
Note: raise_softirq(RCU_SOFTIRQ)/rcu_sched_qs() in rcu_idle_enter() and
outmost rcu_irq_exit() will have to wake up the ksoftirqd
(due to in_interrupt() == 0).

Before this patch		After this patch:
call_rcu_sched() in idle;	call_rcu_sched() in idle
				  set resched
do other stuffs;		do other stuffs
outmost rcu_irq_exit()		outmost rcu_irq_exit() (empty function)
  (or rcu_idle_enter())		  (or rcu_idle_enter(), also empty function)
				start to resched. (see above)
  rcu_sched_qs()		rcu_sched_qs()
    QS,and GP and advance cb	  QS,and GP and advance cb
    wake up the ksoftirqd	    wake up the ksoftirqd
      set resched
resched to ksoftirqd (or other)	resched to ksoftirqd (or other)

These two code patches are almost the same.

Size changed after patched:

size kernel/rcu/tiny-old.o kernel/rcu/tiny-patched.o
   text	   data	    bss	    dec	    hex	filename
   3449	    206	      8	   3663	    e4f	kernel/rcu/tiny-old.o
   2406	    144	      8	   2558	    9fe	kernel/rcu/tiny-patched.o

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:01:12 -08:00
Paul E. McKenney 924df8a011 rcu: Fix invoke_rcu_callbacks() comment
Despite what the comment says, it is only softirqs that are disabled,
not interrupts.  This commit therefore fixes the comment.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-12-30 17:40:19 -08:00
Alexander Gordeev ca9558a33f rcu: Remove redundant rcu_is_cpu_rrupt_from_idle() from tiny RCU
Let's start assuming that something in the idle loop posts a callback,
and scheduling-clock interrupt occurs:

1. The system is idle and stays that way, no runnable tasks.

2. Scheduling-clock interrupt occurs, rcu_check_callbacks() is called
   as result, which in turn calls rcu_is_cpu_rrupt_from_idle().

3. rcu_is_cpu_rrupt_from_idle() reports the CPU was interrupted from
   idle, which results in rcu_sched_qs() call, which does a
   raise_softirq(RCU_SOFTIRQ).

4. Upon return from interrupt, rcu_irq_exit() is invoked, which calls
   rcu_idle_enter_common(), which in turn calls rcu_sched_qs() again,
   which does another raise_softirq(RCU_SOFTIRQ).

5. The softirq happens shortly and invokes rcu_process_callbacks(),
   which invokes __rcu_process_callbacks().

6. So now callbacks can be invoked. At least they can be if
   ->donetail has been updated. Which it will have been because
   rcu_sched_qs() invokes rcu_qsctr_help().

In the described scenario rcu_sched_qs() and raise_softirq(RCU_SOFTIRQ)
get called twice in steps 3 and 4. This redundancy could be eliminated
by removing rcu_is_cpu_rrupt_from_idle() function.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-12-30 17:40:18 -08:00
Paul E. McKenney 734d168013 rcu: Make rcu_nmi_enter() handle nesting
The x86 architecture has multiple types of NMI-like interrupts: real
NMIs, machine checks, and, for some values of NMI-like, debugging
and breakpoint interrupts.  These interrupts can nest inside each
other.  Andy Lutomirski is adding RCU support to these interrupts,
so rcu_nmi_enter() and rcu_nmi_exit() must now correctly handle nesting.

This commit therefore introduces nesting, using a clever NMI-coordination
algorithm suggested by Andy.  The trick is to atomically increment
->dynticks (if needed) before manipulating ->dynticks_nmi_nesting on entry
(and, accordingly, after on exit).  In addition, ->dynticks_nmi_nesting
is incremented by one if ->dynticks was incremented and by two otherwise.
This means that when rcu_nmi_exit() sees ->dynticks_nmi_nesting equal
to one, it knows that ->dynticks must be atomically incremented.

This NMI-coordination algorithms has been validated by the following
Promela model:

------------------------------------------------------------------------

/*
 * Promela model for Andy Lutomirski's suggested change to rcu_nmi_enter()
 * that allows nesting.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, you can access it online at
 * http://www.gnu.org/licenses/gpl-2.0.html.
 *
 * Copyright IBM Corporation, 2014
 *
 * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
 */

byte dynticks_nmi_nesting = 0;
byte dynticks = 0;

/*
 * Promela verision of rcu_nmi_enter().
 */
inline rcu_nmi_enter()
{
	byte incby;
	byte tmp;

	incby = BUSY_INCBY;
	assert(dynticks_nmi_nesting >= 0);
	if
	:: (dynticks & 1) == 0 ->
		atomic {
			dynticks = dynticks + 1;
		}
		assert((dynticks & 1) == 1);
		incby = 1;
	:: else ->
		skip;
	fi;
	tmp = dynticks_nmi_nesting;
	tmp = tmp + incby;
	dynticks_nmi_nesting = tmp;
	assert(dynticks_nmi_nesting >= 1);
}

/*
 * Promela verision of rcu_nmi_exit().
 */
inline rcu_nmi_exit()
{
	byte tmp;

	assert(dynticks_nmi_nesting > 0);
	assert((dynticks & 1) != 0);
	if
	:: dynticks_nmi_nesting != 1 ->
		tmp = dynticks_nmi_nesting;
		tmp = tmp - BUSY_INCBY;
		dynticks_nmi_nesting = tmp;
	:: else ->
		dynticks_nmi_nesting = 0;
		atomic {
			dynticks = dynticks + 1;
		}
		assert((dynticks & 1) == 0);
	fi;
}

/*
 * Base-level NMI runs non-atomically.  Crudely emulates process-level
 * dynticks-idle entry/exit.
 */
proctype base_NMI()
{
	byte busy;

	busy = 0;
	do
	::	/* Emulate base-level dynticks and not. */
		if
		:: 1 ->	atomic {
				dynticks = dynticks + 1;
			}
			busy = 1;
		:: 1 ->	skip;
		fi;

		/* Verify that we only sometimes have base-level dynticks. */
		if
		:: busy == 0 -> skip;
		:: busy == 1 -> skip;
		fi;

		/* Model RCU's NMI entry and exit actions. */
		rcu_nmi_enter();
		assert((dynticks & 1) == 1);
		rcu_nmi_exit();

		/* Emulated re-entering base-level dynticks and not. */
		if
		:: !busy -> skip;
		:: busy ->
			atomic {
				dynticks = dynticks + 1;
			}
			busy = 0;
		fi;

		/* We had better now be in dyntick-idle mode. */
		assert((dynticks & 1) == 0);
	od;
}

/*
 * Nested NMI runs atomically to emulate interrupting base_level().
 */
proctype nested_NMI()
{
	do
	::	/*
		 * Use an atomic section to model a nested NMI.  This is
		 * guaranteed to interleave into base_NMI() between a pair
		 * of base_NMI() statements, just as a nested NMI would.
		 */
		atomic {
			/* Verify that we only sometimes are in dynticks. */
			if
			:: (dynticks & 1) == 0 -> skip;
			:: (dynticks & 1) == 1 -> skip;
			fi;

			/* Model RCU's NMI entry and exit actions. */
			rcu_nmi_enter();
			assert((dynticks & 1) == 1);
			rcu_nmi_exit();
		}
	od;
}

init {
	run base_NMI();
	run nested_NMI();
}

------------------------------------------------------------------------

The following script can be used to run this model if placed in
rcu_nmi.spin:

------------------------------------------------------------------------

if ! spin -a rcu_nmi.spin
then
	echo Spin errors!!!
	exit 1
fi
if ! cc -DSAFETY -o pan pan.c
then
	echo Compilation errors!!!
	exit 1
fi
./pan -m100000

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-12-30 17:40:16 -08:00
Paul E. McKenney 9ea6c58856 Merge branches 'torture.2014.11.03a', 'cpu.2014.11.03a', 'doc.2014.11.13a', 'fixes.2014.11.13a', 'signal.2014.10.29a' and 'rt.2014.10.29a' into HEAD
cpu.2014.11.03a: Changes for per-CPU variables.
doc.2014.11.13a: Documentation updates.
fixes.2014.11.13a: Miscellaneous fixes.
signal.2014.10.29a: Signal changes.
rt.2014.10.29a: Real-time changes.
torture.2014.11.03a: torture-test changes.
2014-11-13 10:39:04 -08:00
Paul E. McKenney 60ced4950c rcu: Fix FIXME in rcu_tasks_kthread()
This commit affines rcu_tasks_kthread() to the housekeeping CPUs
in CONFIG_NO_HZ_FULL builds.  This is just a default, so systems
administrators are free to put this kthread somewhere else if they wish.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-13 10:35:41 -08:00
Paul E. McKenney b8969d1a50 rcutorture: Fix rcu_torture_cbflood() memory leak
Commit 38706bc5a2 (rcutorture: Add callback-flood test) vmalloc()ed
a bunch of RCU callbacks, but failed to free them.  This commit fixes
that oversight.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03 19:26:41 -08:00
Pranith Kumar aa23c6fbc5 rcutorture: Add early boot self tests
Add early boot self tests for RCU under CONFIG_PROVE_RCU.

Currently the only test is adding a dummy callback which increments a counter
which we then later verify after calling rcu_barrier*().

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-03 19:26:37 -08:00
Paul E. McKenney 8fa7845df5 rcu: Remove "cpu" argument to rcu_cleanup_after_idle()
The "cpu" argument to rcu_cleanup_after_idle() is always the current
CPU, so drop it.  This moves the smp_processor_id() from the caller to
rcu_cleanup_after_idle(), saving argument-passing overhead.  Again,
the anticipated cross-CPU uses of these functions has been replaced
by NO_HZ_FULL.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03 19:20:56 -08:00
Paul E. McKenney 198bbf8127 rcu: Remove "cpu" argument to rcu_prepare_for_idle()
The "cpu" argument to rcu_prepare_for_idle() is always the current
CPU, so drop it.  This in turn allows two of the uses of "cpu" in
this function to be replaced with a this_cpu_ptr() and the third by
smp_processor_id(), replacing that of the call to rcu_prepare_for_idle().
Again, the anticipated cross-CPU uses of these functions has been replaced
by NO_HZ_FULL.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03 19:20:49 -08:00
Paul E. McKenney aa6da5140b rcu: Remove "cpu" argument to rcu_needs_cpu()
The "cpu" argument to rcu_needs_cpu() is always the current CPU, so drop
it.  This in turn allows the "cpu" argument to rcu_cpu_has_callbacks()
to be removed, which allows the uses of "cpu" in both functions to be
replaced with a this_cpu_ptr().  Again, the anticipated cross-CPU uses
of these functions has been replaced by NO_HZ_FULL.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03 19:20:43 -08:00
Paul E. McKenney 38200cf247 rcu: Remove "cpu" argument to rcu_note_context_switch()
The "cpu" argument to rcu_note_context_switch() is always the current
CPU, so drop it.  This in turn allows the "cpu" argument to
rcu_preempt_note_context_switch() to be removed, which allows the sole
use of "cpu" in both functions to be replaced with a this_cpu_ptr().
Again, the anticipated cross-CPU uses of these functions has been
replaced by NO_HZ_FULL.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03 19:20:34 -08:00
Paul E. McKenney 86aea0e6e7 rcu: Remove "cpu" argument to rcu_preempt_check_callbacks()
Because rcu_preempt_check_callbacks()'s argument is guaranteed to
always be the current CPU, drop the argument and replace per_cpu()
with __this_cpu_read().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03 19:20:26 -08:00
Paul E. McKenney e3950ecd55 rcu: Remove "cpu" argument to rcu_pending()
Because rcu_pending()'s argument is guaranteed to always be the current
CPU, drop the argument and replace per_cpu_ptr() with this_cpu_ptr().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03 19:20:18 -08:00