Commit Graph

6997 Commits

Author SHA1 Message Date
Peter Zijlstra 6de6a7b957 perf_counter: add ioctl(PERF_COUNTER_IOC_RESET)
Provide a way to reset an existing counter - this eases PAPI
libraries around perfcounters.

Similar to read() it doesn't collapse pending child counters.

[ Impact: new perfcounter fd ioctl method to reset counters ]

Suggested-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090505155437.022272933@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05 20:18:31 +02:00
Peter Zijlstra c66de4a5be perf_counter: uncouple data_head updates from wakeups
Keep data_head up-to-date irrespective of notifications. This fixes
the case where you disable a counter and don't get a notification for
the last few pending events, and it also allows polling usage.

[ Impact: increase precision of perfcounter mmap-ed fields ]

Suggested-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090505155436.925084300@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05 20:18:30 +02:00
Ingo Molnar 1dce8d99b8 perf_counter: convert perf_resource_mutex to a spinlock
Now percpu counters can be initialized very early. But the init
sequence uses mutex_lock(). Fortunately, perf_resource_mutex should
be a spinlock anyway, so convert it.

[ Impact: fix crash due to early init mutex use ]

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-04 19:30:42 +02:00
Ingo Molnar 0d905bca23 perf_counter: initialize the per-cpu context earlier
percpu scheduling for perfcounters wants to take the context lock,
but that lock first needs to be initialized. Currently it is an
early_initcall() - but that is too late, the task tick runs much
sooner than that.

Call it explicitly from the scheduler init sequence instead.

[ Impact: fix access-before-init crash ]

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-04 19:30:32 +02:00
Ingo Molnar b82914ce33 perf_counter: round-robin per-CPU counters too
This used to be unstable when we had the rq->lock dependencies,
but now that they are that of the past we can turn on percpu
counter RR too.

[ Impact: handle counter over-commit for per-CPU counters too ]

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-04 19:29:57 +02:00
Peter Zijlstra c33a0bc4e4 perf_counter: fix race in perf_output_*
When two (or more) contexts output to the same buffer, it is possible
to observe half written output.

Suppose we have CPU0 doing perf_counter_mmap(), CPU1 doing
perf_counter_overflow(). If CPU1 does a wakeup and exposes head to
user-space, then CPU2 can observe the data CPU0 is still writing.

[ Impact: fix occasionally corrupted profiling records ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090501102533.007821627@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-01 13:23:43 +02:00
Thomas Gleixner 3c56999eec Merge branch 'core/signal' into perfcounters/core
This is necessary to avoid the conflict of syscall numbers.

Conflicts:
	arch/x86/ia32/ia32entry.S
	arch/x86/include/asm/unistd_32.h
	arch/x86/include/asm/unistd_64.h

Fixes up the borked syscall numbers of perfcounters versus
preadv/pwritev as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-04-30 21:16:49 +02:00
Thomas Gleixner 62ab4505e3 signals: implement sys_rt_tgsigqueueinfo
sys_kill has the per thread counterpart sys_tgkill. sigqueueinfo is
missing a thread directed counterpart. Such an interface is important
for migrating applications from other OSes which have the per thread
delivery implemented.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Ulrich Drepper <drepper@redhat.com>
2009-04-30 19:24:24 +02:00
Thomas Gleixner 30b4ae8a44 signals: split do_tkill
Split out the code from do_tkill to make it reusable by the follow up
patch which implements sys_rt_tgsigqueueinfo

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
2009-04-30 19:24:23 +02:00
Paul Mackerras c5dd016cdf perf_counter: update copyright notice
This adds my name to the list of copyright holders on the core
perf_counter.c, since I have contributed a significant amount of the
code in there.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <18936.59200.888049.746658@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-30 08:23:11 +02:00
Ingo Molnar 9814451142 perf_counter: add/update copyrights
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29 14:52:50 +02:00
Robert Richter 4aeb0b4239 perfcounters: rename struct hw_perf_counter_ops into struct pmu
This patch renames struct hw_perf_counter_ops into struct pmu. It
introduces a structure to describe a cpu specific pmu (performance
monitoring unit). It may contain ops and data. The new name of the
structure fits better, is shorter, and thus better to handle. Where it
was appropriate, names of function and variable have been changed too.

[ Impact: cleanup ]

Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-7-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29 14:51:03 +02:00
Ingo Molnar e7fd5d4b3d Merge branch 'linus' into perfcounters/core
Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up
              the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29 14:47:05 +02:00
Linus Torvalds 51b3960e78 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  ptrace: ptrace_attach: fix the usage of ->cred_exec_mutex
2009-04-27 08:38:51 -07:00
Oleg Nesterov cad81bc252 ptrace: ptrace_attach: fix the usage of ->cred_exec_mutex
ptrace_attach() needs task->cred_exec_mutex, not current->cred_exec_mutex.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-04-27 20:30:51 +10:00
Linus Torvalds fc2e3180a7 Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/irq: mark NUMA_MIGRATE_IRQ_DESC broken
  x86, irq: Remove IRQ_DISABLED check in process context IRQ move
2009-04-26 10:29:18 -07:00
Linus Torvalds 1e4b978154 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  locking: clarify kernel-taint warning message
  lockdep, x86: account for irqs enabled in paranoid_exit
  lockdep: more robust lockdep_map init sequence
2009-04-26 10:29:01 -07:00
Rafael J. Wysocki 0c8454f566 PM/Hibernate: Fix waiting for image device to appear on resume
Commit c751085943 ("PM/Hibernate: Wait for
SCSI devices scan to complete during resume") added a call to
scsi_complete_async_scans() to software_resume(), so that it waited for
the SCSI scanning to complete, but the call was added at a wrong place.

Namely, it should have been added after wait_for_device_probe(), which
is called only if the image partition hasn't been specified yet.  Also,
it's reasonable to check if the image partition is present and only wait
for the device probing and SCSI scanning to complete if it is not the
case.

Additionally, since noresume is checked right at the beginning of
software_resume() and the function returns immediately if it's set, it
doesn't make sense to check it once again later.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-24 15:31:30 -07:00
Jonathan Corbet 418df63c2d Delete slow-work timers properly
Slow-work appears to delete its timer as soon as the first user
unregisters, even though other users could be active.  At the same time, it
never seems to delete slow_work_oom_timer.  Arrange for both to happen in
the shutdown path.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-24 07:47:59 -07:00
Ingo Molnar b48ccb095a locking: clarify kernel-taint warning message
Andi Kleen reported this message triggering on non-lockdep kernels:

   Disabling lockdep due to kernel taint

Clarify the message to say 'lock debugging' - debug_locks_off()
turns off all things lock debugging, not just lockdep.

[ Impact: change kernel warning message text ]

Reported-by: Andi Kleen <andi@firstfloor.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-23 09:36:52 +02:00
Magnus Damm 4614e6adaf clocksource: add enable() and disable() callbacks
Add enable() and disable() callbacks for clocksources.

This allows us to put unused clocksources in power save mode.  The
functions clocksource_enable() and clocksource_disable() wrap the
callbacks and are inserted in the timekeeping code to enable before use
and disable after switching to a new clocksource.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-21 13:41:47 -07:00
Magnus Damm 8e19608e8b clocksource: pass clocksource to read() callback
Pass clocksource pointer to the read() callback for clocksources.  This
allows us to share the callback between multiple instances.

[hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-21 13:41:47 -07:00
Al Viro 24b6f16ecf No need for crossing to mountpoint in audit_tag_tree()
is_under() will DTRT anyway.  And yes, is_subdir() behaviour
is intentional.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-20 23:01:15 -04:00
Ingo Molnar f1f9b3b179 perfcounters, sched: remove __task_delta_exec()
This function was left orphan by the latest round of sw-counter
cleanups.

[ Impact: remove unused kernel function ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-20 20:38:21 +02:00
Rafael J. Wysocki 6a7c7eaf71 PM/Suspend: Introduce two new platform callbacks to avoid breakage
Commit 900af0d973 (PM: Change suspend
code ordering) changed the ordering of suspend code in such a way
that the platform .prepare() callback is now executed after the
device drivers' late suspend callbacks have run.  Unfortunately, this
turns out to break ARM platforms that need to talk via I2C to power
control devices during the .prepare() callback.

For this reason introduce two new platform suspend callbacks,
.prepare_late() and .wake(), that will be called just prior to
disabling non-boot CPUs and right after bringing them back on line,
respectively, and use them instead of .prepare() and .finish() for
ACPI suspend.  Make the PM core execute the .prepare() and .finish()
platform suspend callbacks where they were executed previously (that
is, right after calling the regular suspend methods provided by
device drivers and right before executing their regular resume
methods, respectively).

It is not necessary to make analogous changes to the hibernation
code and data structures at the moment, because they are only used
by ACPI platforms.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Len Brown <len.brown@intel.com>
2009-04-19 20:08:42 +02:00
Linus Torvalds ff54250a0e Remove 'recurse into child resources' logic from 'reserve_region_with_split()'
This function is not actually used right now, since the original use
case for it was done with insert_resource_expand_to_fit() instead.

However, we now have another usage case that wants to basically do a
"reserve IO resource, splitting around existing resources", however that
one doesn't actually want the "recurse into the conflicting resource"
logic at all.

And since recursing into the conflicting resource was the most complex
part, and isn't wanted, just remove it.  Maybe we'll some day want both
versions, but we can just resurrect the logic then.

Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-18 21:44:24 -07:00
Peter Zijlstra c8a2500586 lockdep: more robust lockdep_map init sequence
Steven Rostedt reported:

> OK, I think I figured this bug out. This is a lockdep issue with respect
> to tracepoints.
>
> The trace points in lockdep are called all the time. Outside the lockdep
> logic. But if lockdep were to trigger an error / warning (which this run
> did) we might be in trouble. For new locks, like the dentry->d_lock, that
> are created, they will not get a name:
>
> void lockdep_init_map(struct lockdep_map *lock, const char *name,
>                       struct lock_class_key *key, int subclass)
> {
>         if (unlikely(!debug_locks))
>                 return;
>
> When a problem is found by lockdep, debug_locks becomes false. Thus we
> stop allocating names for locks. This dentry->d_lock I had, now has no
> name. Worse yet, I have CONFIG_DEBUG_VM set, that scrambles non
> initialized memory. Thus, when the trace point was hit, it had junk for
> the lock->name, and the machine crashed.

Ah, nice catch. I think we should put at least the name in regardless.

Ensure we at least initialize the trivial entries of the depmap so that
they can be relied upon, even when lockdep itself decided to pack up and
go home.

[ Impact: fix lock tracing after lockdep warnings. ]

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1239954049.23397.4156.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-17 18:00:00 +02:00
Linus Torvalds 9f76208c33 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Fix branch tracer header
  tracing: Fix power tracer header
2009-04-16 18:17:22 -07:00
Linus Torvalds d06be22150 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Avoid printing sched_group::__cpu_power for default case
  tracing, sched: mark get_parent_ip() notrace
2009-04-16 18:16:29 -07:00
Linus Torvalds 4d831f53dd Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kernel/softirq.c: fix sparse warning
  rcu: Make hierarchical RCU less IPI-happy
2009-04-16 17:56:39 -07:00
H Hartley Sweeten 79d381c9f2 kernel/softirq.c: fix sparse warning
Fix sparse warning in kernel/softirq.c.

  warning: do-while statement is not a compound statement

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
LKML-Reference: <BD79186B4FD85F4B8E60E381CAEE1909015F9033@mi8nycmail19.Mi8.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-17 01:57:54 +02:00
Gautham R Shenoy 381512cf3d sched: Avoid printing sched_group::__cpu_power for default case
Commit 46e0bb9c12 ("sched: Print sched_group::__cpu_power
in sched_domain_debug") produces a messy dmesg output while
attempting to print the sched_group::__cpu_power for each
group in the sched_domain hierarchy.

Fix this by avoid printing the __cpu_power for default cases.
(i.e, __cpu_power == SCHED_LOAD_SCALE).

[ Impact: reduce syslog clutter ]

Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Fixed-by: Tony Luck <tony.luck@intel.com>
Cc: a.p.zijlstra@chello.nl
LKML-Reference: <20090414033936.GA534@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-17 00:46:05 +02:00
Steven Whitehouse ff7b1b4f00 perfcounters: export perf_tpcounter_event
Needed for modular tracepoint support.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-16 01:10:04 +02:00
David Howells 5b1d07ed0e RCU: Don't try and predeclare inline funcs as it upsets some versions of gcc
Don't try and predeclare inline funcs like this:

	static inline void wait_migrated_callbacks(void)
	...
	static void _rcu_barrier(enum rcu_barrier type)
	{
		...
		wait_migrated_callbacks();
	}
	...
	static inline void wait_migrated_callbacks(void)
	{
		wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count));
	}

as it upsets some versions of gcc under some circumstances:

	kernel/rcupdate.c: In function `_rcu_barrier':
	kernel/rcupdate.c:125: sorry, unimplemented: inlining failed in call to 'wait_migrated_callbacks': function body not available
	kernel/rcupdate.c:152: sorry, unimplemented: called from here

This can be dealt with by simply putting the static variables (rcu_migrate_*)
at the top, and moving the implementation of the function up so that it
replaces its forward declaration.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-15 13:55:14 -07:00
Nikanth Karthikesan 297dbf50d7 swap: Remove code handling bio_alloc failure with __GFP_WAIT
Remove code handling bio_alloc failure with __GFP_WAIT.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-04-15 12:10:13 +02:00
Pallipadi, Venkatesh 6ec3cfeca0 x86, irq: Remove IRQ_DISABLED check in process context IRQ move
As discussed in the thread here:

  http://marc.info/?l=linux-kernel&m=123964468521142&w=2

Eric W. Biederman observed:

> It looks like some additional bugs have slipped in since last I looked.
>
> set_irq_affinity does this:
> ifdef CONFIG_GENERIC_PENDING_IRQ
>        if (desc->status & IRQ_MOVE_PCNTXT || desc->status & IRQ_DISABLED) {
>                cpumask_copy(desc->affinity, cpumask);
>                desc->chip->set_affinity(irq, cpumask);
>        } else {
>                desc->status |= IRQ_MOVE_PENDING;
>                cpumask_copy(desc->pending_mask, cpumask);
>        }
> #else
>
> That IRQ_DISABLED case is a software state and as such it has nothing to
> do with how safe it is to move an irq in process context.

[...]

>
> The only reason we migrate MSIs in interrupt context today is that there
> wasn't infrastructure for support migration both in interrupt context
> and outside of it.

Yes. The idea here was to force the MSI migration to happen in process
context. One of the patches in the series did

        disable_irq(dev->irq);
        irq_set_affinity(dev->irq, cpumask_of(dev->cpu));
        enable_irq(dev->irq);

with the above patch adding irq/manage code check for interrupt disabled
and moving the interrupt in process context.

IIRC, there was no IRQ_MOVE_PCNTXT when we were developing this HPET
code and we ended up having this ugly hack. IRQ_MOVE_PCNTXT was there
when we eventually submitted the patch upstream. But, looks like I did a
blind rebasing instead of using IRQ_MOVE_PCNTXT in hpet MSI code.

Below patch fixes this. i.e., revert commit 932775a4ab
and add PCNTXT to HPET MSI setup. Also removes copying of desc->affinity
in generic code as set_affinity routines are doing it internally.

Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Li Shaohua" <shaohua.li@intel.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: "lcm@us.ibm.com" <lcm@us.ibm.com>
Cc: suresh.b.siddha@intel.com
LKML-Reference: <20090413222058.GB8211@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 15:21:13 +02:00
Paul E. McKenney ef631b0ca0 rcu: Make hierarchical RCU less IPI-happy
This patch fixes a hierarchical-RCU performance bug located by Anton
Blanchard.  The problem stems from a misguided attempt to provide a
work-around for jiffies-counter failure.  This work-around uses a per-CPU
n_rcu_pending counter, which is incremented on each call to rcu_pending(),
which in turn is called from each scheduling-clock interrupt.  Each CPU
then treats this counter as a surrogate for the jiffies counter, so
that if the jiffies counter fails to advance, the per-CPU n_rcu_pending
counter will cause RCU to invoke force_quiescent_state(), which in turn
will (among other things) send resched IPIs to CPUs that have thus far
failed to pass through an RCU quiescent state.

Unfortunately, each CPU resets only its own counter after sending a
batch of IPIs.  This means that the other CPUs will also (needlessly)
send -another- round of IPIs, for a full N-squared set of IPIs in the
worst case every three scheduler-clock ticks until the grace period
finally ends.  It is not reasonable for a given CPU to reset each and
every n_rcu_pending for all the other CPUs, so this patch instead simply
disables the jiffies-counter "training wheels", thus eliminating the
excessive IPIs.

Note that the jiffies-counter IPIs do not have this problem due to
the fact that the jiffies counter is global, so that the CPU sending
the IPIs can easily reset things, thus preventing the other CPUs from
sending redundant IPIs.

Note also that the n_rcu_pending counter remains, as it will continue to
be used for tracing.  It may also see use to update the jiffies counter,
should an appropriate kick-the-jiffies-counter API appear.

Located-by: Anton Blanchard <anton@au1.ibm.com>
Tested-by: Anton Blanchard <anton@au1.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: anton@samba.org
Cc: akpm@linux-foundation.org
Cc: dipankar@in.ibm.com
Cc: manfred@colorfullife.com
Cc: cl@linux-foundation.org
Cc: josht@linux.vnet.ibm.com
Cc: schamp@sgi.com
Cc: niv@us.ibm.com
Cc: dvhltc@us.ibm.com
Cc: ego@in.ibm.com
Cc: laijs@cn.fujitsu.com
Cc: rostedt@goodmis.org
Cc: peterz@infradead.org
Cc: penberg@cs.helsinki.fi
Cc: andi@firstfloor.org
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
LKML-Reference: <12396834793575-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 11:31:50 +02:00
Zhaolei 557055bebe tracing: Fix branch tracer header
Before patch:

  # tracer: branch
  #
  #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
  #              | |       |          |         |
             <...>-2981  [000] 24008.872738: [  ok  ] trace_irq_handler_exit:irq_event_types.h:41
             <...>-2981  [000] 24008.872742: [  ok  ] note_interrupt:spurious.c:229
  ...

After patch:

  # tracer: branch
  #
  #           TASK-PID    CPU#    TIMESTAMP  CORRECT  FUNC:FILE:LINE
  #              | |       |          |         |       |
             <...>-2985  [000] 26329.142970: [  ok  ] slab_free:slub.c:1776
             <...>-2985  [000] 26329.142972: [  ok  ] trace_kmem_cache_free:kmem_event_types.h:191
  ...

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <49E2F19A.3040006@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 02:30:36 +02:00
Lai Jiangshan 132380a06b tracing, sched: mark get_parent_ip() notrace
Impact: remove overly redundant tracing entries

When tracer is "function" or "function_graph", way too much
"get_parent_ip" entries are recorded in ring_buffer.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D458B1.5000703@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 02:11:05 +02:00
Andi Kleen 3d26dcf767 kernel/sys.c: clean up sys_shutdown exit path
Impact: cleanup, fix

Clean up sys_shutdown() exit path.  Factor out common code.  Return
correct error code instead of always 0 on failure.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-13 15:04:32 -07:00
Oleg Nesterov f1671f6d78 ptrace: fix exit_ptrace() vs ptrace_traceme() race
Pointed out by Roland.  The bug was recently introduced by me in
"forget_original_parent: split out the un-ptrace part", commit
39c626ae47.

Since that patch we have a window after exit_ptrace() drops tasklist and
before forget_original_parent() takes it again.  In this window the child
can do ptrace(PTRACE_TRACEME) and nobody can untrace this child after
that.

Change ptrace_traceme() to not attach to the exiting ->real_parent.  We
don't report the error in this case, we pretend we attach right before
->real_parent calls exit_ptrace() which should untrace us anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-13 15:04:31 -07:00
Peter Zijlstra 4be6f6bb66 mm: move the scan_unevictable_pages sysctl to the vm table
vm knobs should go in the vm table.  Probably too late for
randomize_va_space though.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-13 15:04:28 -07:00
Zhaolei a3d03ecaf9 tracing: Fix power tracer header
Before patch:
  # tracer: power
  #
  #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
  #              | |       |          |         |
  [  676.875865889] CSTATE: Going to C1 on cpu 0 for 0.005911463
  [  676.882938805] CSTATE: Going to C1 on cpu 0 for 0.104796532
  ...

After patch:
  # tracer: power
  #
  #   TIMESTAMP      STATE  EVENT
  #       |            |      |
  [  676.875865889] CSTATE: Going to C1 on cpu 0 for 0.005911463
  [  676.882938805] CSTATE: Going to C1 on cpu 0 for 0.104796532
  ...

v2: Use seq_puts instead of seq_printf

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <49E2E889.5000903@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-13 23:39:57 +02:00
Rafael J. Wysocki c751085943 PM/Hibernate: Wait for SCSI devices scan to complete during resume
There is a race between resume from hibernation and the asynchronous
scanning of SCSI devices and to prevent it from happening we need to
call scsi_complete_async_scans() during resume from hibernation.

In addition, if the resume from hibernation is userland-driven, it's
better to wait for all device probes in the kernel to complete before
attempting to open the resume device.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-13 11:37:07 -07:00
Linus Torvalds 8255309b88 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing/filters: return proper error code when writing filter file
  tracing/filters: allow user input integer to be oct or hex
  tracing/filters: fix NULL pointer dereference
  tracing/filters: NIL-terminate user input filter
  ftrace: Output REC->var instead of __entry->var for trace format
  Make __stringify support variable argument macros too
  tracing: fix document references
  tracing: fix splice return too large
  tracing: update file->f_pos when splice(2) it
  tracing: allocate page when needed
  tracing: disable seeking for trace_pipe_raw
2009-04-13 11:31:28 -07:00
Linus Torvalds bf20753c0c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: continue lock debugging despite some taints
  lockdep: warn about lockdep disabling after kernel taint
2009-04-13 11:30:26 -07:00
Linus Torvalds d811f236d9 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  percpu: unbreak alpha percpu
  mutex: have non-spinning mutexes on s390 by default
2009-04-13 08:22:43 -07:00
Frederic Weisbecker 574bbe7820 lockdep: continue lock debugging despite some taints
Impact: broaden lockdep checks

Lockdep is disabled after any kernel taints. This might be convenient
to ignore bad locking issues which sources come from outside the kernel
tree. Nevertheless, it might be a frustrating experience for the
staging developers or those who experience a warning but are focused
on another things that require lockdep.

The v2 of this patch simply don't disable anymore lockdep in case
of TAINT_CRAP and TAINT_WARN events.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LTP <ltp-list@lists.sourceforge.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg KH <gregkh@suse.de>
LKML-Reference: <1239412638-6739-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 16:10:52 +02:00
Frederic Weisbecker 9eeba6138c lockdep: warn about lockdep disabling after kernel taint
Impact: provide useful missing info for developers

Kernel taint can occur in several situations such as warnings,
load of prorietary or staging modules, bad page, etc...

But when such taint happens, a developer might still be working on
the kernel, expecting that lockdep is still enabled. But a taint
disables lockdep without ever warning about it.
Such a kernel behaviour doesn't really help for kernel development.

This patch adds this missing warning.

Since the taint is done most of the time after the main message that
explain the real source issue, it seems safe to warn about it inside
add_taint() so that it appears at last, without hurting the main
information.

v2: Use a generic helper to disable lockdep instead of an
    open coded xchg().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1239412638-6739-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 16:10:51 +02:00
Li Zefan 44e9c8b7ad tracing/filters: return proper error code when writing filter file
- propagate return value of filter_add_pred() to the user

- return -ENOSPC but not -ENOMEM or -EINVAL when the filter array
  is full

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <49E04CF0.3010105@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12 11:59:29 +02:00