Commit Graph

704 Commits

Author SHA1 Message Date
Jiri Olsa f56e7f8efb tracing, function: Fix trace header to follow context-info option
The header display of function tracer does not follow
the context-info option, so field names are displayed even
if this option is off.

Added check for TRACE_ITER_CONTEXT_INFO trace_flags.

With following commands:
	# echo function > ./current_tracer
	# echo 0 > options/context-info
	# cat trace

This is what it looked like before:
# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
add_preempt_count <-schedule
rcu_note_context_switch <-schedule
...

This is what it looks like now:
# tracer: function
#
_raw_spin_unlock_irqrestore <-hrtimer_try_to_cancel
...

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1307113131-10045-4-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-06-14 22:48:48 -04:00
Steven Rostedt cf30cf67d6 tracing: Add disable_on_free option
Add a trace option to disable tracing on free. When this option is
set, a write into the free_buffer file will not only shrink the
ring buffer down to zero, but it will also disable tracing.

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-06-14 22:48:45 -04:00
Vaibhav Nagarnaik 4f271a2a60 tracing: Add a proc file to stop tracing and free buffer
The proc file entry buffer_size_kb is used to set the size of tracing
buffer. The memory to expand the buffer size is kernel memory. Consider
a use case where tracing is handled by a user space utility, which acts
as a gate keeper for tracing requests. In an OOM condition, tracing is
considered a low priority task and if the utility gets killed the ring
buffer memory cannot be released back to the kernel.

This patch adds a proc file called "free_buffer" whose purpose is to
stop tracing and free up the ring buffer when it is closed.

The user space process can then set the desired size in buffer_size_kb
file and open the fd to the "free_buffer" file. Under OOM condition, if
the process gets killed, the kernel closes the file descriptor. The
release handler stops the tracing and releases the kernel memory
automatically.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Link: http://lkml.kernel.org/r/1308012717-11148-1-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-06-14 22:48:37 -04:00
Vaibhav Nagarnaik 7ea5906405 tracing: Use NUMA allocation for per-cpu ring buffer pages
The tracing ring buffer is a group of per-cpu ring buffers where
allocation and logging is done on a per-cpu basis. The events that are
generated on a particular CPU are logged in the corresponding buffer.
This is to provide wait-free writes between CPUs and good NUMA node
locality while accessing the ring buffer.

However, the allocation routines consider NUMA locality only for buffer
page metadata and not for the actual buffer page. This causes the pages
to be allocated on the NUMA node local to the CPU where the allocation
routine is running at the time.

This patch fixes the problem by using a NUMA node specific allocation
routine so that the pages are allocated from a NUMA node local to the
logging CPU.

I tested with the getuid_microbench from autotest. It is a simple binary
that calls getuid() in a loop and measures the average time for the
syscall to complete. The following command was used to test:
$ getuid_microbench 1000000

Compared the numbers found on kernel with and without this patch and
found that logging latency decreases by 30-50 ns/call.
tracing with non-NUMA allocation - 569 ns/call
tracing with NUMA allocation     - 512 ns/call

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-06-14 22:04:39 -04:00
Vaibhav Nagarnaik e7e2ee89a9 tracing: Schedule a delayed work to call wakeup()
In using syscall tracing by concurrent processes, the wakeup() that is
called in the event commit function causes contention on the spin lock
of the waitqueue. I enabled sys_enter_getuid and sys_exit_getuid
tracepoints, and by running getuid_microbench from autotest in parallel
I found that the contention causes exponential latency increase in the
tracing path.

The autotest binary getuid_microbench calls getuid() in a tight loop for
the given number of iterations and measures the average time required to
complete a single invocation of syscall.

The patch schedules a delayed work after 2 ms once an event commit calls
to wake up the trace wait_queue. This removes the delay caused by
contention on spin lock in wakeup() and amortizes the wakeup() calls
scheduled over the 2 ms period.

In the following example, the script enables the sys_enter_getuid and
sys_exit_getuid tracepoints and runs the getuid_microbench in parallel
with the given number of processes. The output clearly shows the latency
increase caused by contentions.

$ ~/getuid.sh 1
1000000 calls in 0.720974253 s (720.974253 ns/call)

$ ~/getuid.sh 2
1000000 calls in 1.166457554 s (1166.457554 ns/call)
1000000 calls in 1.168933765 s (1168.933765 ns/call)

$ ~/getuid.sh 3
1000000 calls in 1.783827516 s (1783.827516 ns/call)
1000000 calls in 1.795553270 s (1795.553270 ns/call)
1000000 calls in 1.796493376 s (1796.493376 ns/call)

$ ~/getuid.sh 4
1000000 calls in 4.483041796 s (4483.041796 ns/call)
1000000 calls in 4.484165388 s (4484.165388 ns/call)
1000000 calls in 4.484850762 s (4484.850762 ns/call)
1000000 calls in 4.485643576 s (4485.643576 ns/call)

$ ~/getuid.sh 5
1000000 calls in 6.497521653 s (6497.521653 ns/call)
1000000 calls in 6.502000236 s (6502.000236 ns/call)
1000000 calls in 6.501709115 s (6501.709115 ns/call)
1000000 calls in 6.502124100 s (6502.124100 ns/call)
1000000 calls in 6.502936358 s (6502.936358 ns/call)

After the patch, the latencies scale better.
1000000 calls in 0.728720455 s (728.720455 ns/call)

1000000 calls in 0.842782857 s (842.782857 ns/call)
1000000 calls in 0.883803135 s (883.803135 ns/call)

1000000 calls in 0.902077764 s (902.077764 ns/call)
1000000 calls in 0.902838202 s (902.838202 ns/call)
1000000 calls in 0.908896885 s (908.896885 ns/call)

1000000 calls in 0.932523515 s (932.523515 ns/call)
1000000 calls in 0.958009672 s (958.009672 ns/call)
1000000 calls in 0.986188020 s (986.188020 ns/call)
1000000 calls in 0.989771102 s (989.771102 ns/call)

1000000 calls in 0.933518391 s (933.518391 ns/call)
1000000 calls in 0.958897947 s (958.897947 ns/call)
1000000 calls in 1.031038897 s (1031.038897 ns/call)
1000000 calls in 1.089516025 s (1089.516025 ns/call)
1000000 calls in 1.141998347 s (1141.998347 ns/call)

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1305059241-7629-1-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-06-14 21:59:41 -04:00
Ingo Molnar 932fed4e2e Merge commit 'v2.6.39-rc7' into perf/core
Merge reason: pull in the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-10 17:05:45 +02:00
Arjan van de Ven a3a4a5acd3 Regression: partial revert "tracing: Remove lock_depth from event entry"
This partially reverts commit e6e1e25935.

That commit changed the structure layout of the trace structure, which
in turn broke PowerTOP (1.9x generation) quite badly.

I appreciate not wanting to expose the variable in question, and
PowerTOP was not using it, so I've replaced the variable with just a
padding field - that way if in the future a new field is needed it can
just use this padding field.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-06 13:20:59 -07:00
Ingo Molnar 32673822e4 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
Conflicts:
	include/linux/perf_event.h

Merge reason: pick up the latest jump-label enhancements, they are cooked ready.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-27 10:40:21 +02:00
Jiri Olsa ee5e51f51b tracing: Avoid soft lockup in trace_pipe
running following commands:

  # enable the binary option
  echo 1 > ./options/bin
  # disable context info option
  echo 0 > ./options/context-info
  # tracing only events
  echo 1 > ./events/enable
  cat trace_pipe

plus forcing system to generate many tracing events,
is causing lockup (in NON preemptive kernels) inside
tracing_read_pipe function.

The issue is also easily reproduced by running ltp stress test.
(ftrace_stress_test.sh)

The reasons are:
 - bin/hex/raw output functions for events are set to
   trace_nop_print function, which prints nothing and
   returns TRACE_TYPE_HANDLED value
 - LOST EVENT trace do not handle trace_seq overflow

These reasons force the while loop in tracing_read_pipe
function never to break.

The attached patch fixies handling of lost event trace, and
changes trace_nop_print to print minimal info, which is needed
for the correct tracing_read_pipe processing.

v2 changes:
 - omit the cond_resched changes by trace_nop_print changes
 - WARN changed to WARN_ONCE and added info to be able
   to find out the culprit

v3 changes:
 - make more accurate patch comment

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <20110325110518.GC1922@jolsa.brq.redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-04 12:18:24 -04:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Steven Rostedt 4a0b1665db tracing: Fix irqoff selftest expanding max buffer
If the kernel command line declares a tracer "ftrace=sometracer" and
that tracer is either not defined or is enabled after irqsoff,
then the irqs off selftest will fail with the following error:

Testing tracer irqsoff:
------------[ cut here ]------------
WARNING: at /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/tra
ce.c:713 update_max_tr_single+0xfa/0x11b()
Hardware name:
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.38-rc8-test #1
Call Trace:
 [<c0441d9d>] ? warn_slowpath_common+0x65/0x7a
 [<c049adb2>] ? update_max_tr_single+0xfa/0x11b
 [<c0441dc1>] ? warn_slowpath_null+0xf/0x13
 [<c049adb2>] ? update_max_tr_single+0xfa/0x11b
 [<c049e454>] ? stop_critical_timing+0x154/0x204
 [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
 [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
 [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
 [<c049e529>] ? time_hardirqs_on+0x25/0x28
 [<c0468bca>] ? trace_hardirqs_on_caller+0x18/0x12f
 [<c0468cec>] ? trace_hardirqs_on+0xb/0xd
 [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
 [<c049b6b8>] ? register_tracer+0xf8/0x1a3
 [<c14e93fe>] ? init_irqsoff_tracer+0xd/0x11
 [<c040115e>] ? do_one_initcall+0x71/0x121
 [<c14e93f1>] ? init_irqsoff_tracer+0x0/0x11
 [<c14ce3a9>] ? kernel_init+0x13a/0x1b6
 [<c14ce26f>] ? kernel_init+0x0/0x1b6
 [<c0403842>] ? kernel_thread_helper+0x6/0x10
---[ end trace e93713a9d40cd06c ]---
.. no entries found ..FAILED!

What happens is the "ftrace=..." will expand the ring buffer to its
default size (from its minimum size) but it will not expand the
max ring buffer (the ring buffer to store maximum latencies).
When the irqsoff test runs, it will call the ring buffer swap routine
that checks if the max ring buffer is the same size as the normal
ring buffer, and will fail if it is not. This causes the test to fail.

The solution is to expand the max ring buffer before running the self
test if the max ring buffer is used by that tracer and the normal ring
buffer is expanded. The max ring buffer should be shrunk again after
the test is done to save space.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:34:58 -05:00
Steven Rostedt e6e1e25935 tracing: Remove lock_depth from event entry
The lock_depth field in the event headers was added as a temporary
data point for help in removing the BKL. Now that the BKL is pretty
much been removed, we can remove this field.

This in turn changes the header from 12 bytes to 8 bytes,
removing the 4 byte buffer that gcc would insert if the first field
in the data load was 8 bytes in size.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:31:48 -05:00
David Sharp 750912fa36 tracing: Add an 'overwrite' trace_option.
Add an "overwrite" trace_option for ftrace to control whether the buffer should
be overwritten on overflow or not. The default remains to overwrite old events
when the buffer is full. This patch adds the option to instead discard newest
events when the buffer is full. This is useful to get a snapshot of traces just
after enabling traces. Dropping the current event is also a simpler code path.

Signed-off-by: David Sharp <dhsharp@google.com>
LKML-Reference: <1291844807-15481-1-git-send-email-dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-09 13:52:27 -05:00
Steven Rostedt 6752ab4a9c tracing: Deprecate tracing_enabled for tracing_on
tracing_enabled should not be used, it is heavy weight and does not
do much in helping lower the overhead.

tracing_on should be used instead. Warn users to use tracing_on
when tracing_enabled is used as it will soon be removed from the
tracing directory.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-08 17:14:58 -05:00
Li Zefan 1dbd1951f3 tracing: Fix preempt count leak
While running my ftrace stress test, this showed up:

BUG: sleeping function called from invalid context at mm/mmap.c:233
...
note: cat[3293] exited with preempt_count 1

The bug was introduced by commit 91e86e560d
("tracing: Fix recursive user stack trace")

Cc: <stable@kernel.org>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4D0089AC.1020802@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-01-07 23:20:02 -05:00
Ingo Molnar d949750fed Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent 2010-12-16 11:21:24 +01:00
Slava Pestov 364829b126 tracing: Fix panic when lseek() called on "trace" opened for writing
The file_ops struct for the "trace" special file defined llseek as seq_lseek().
However, if the file was opened for writing only, seq_open() was not called,
and the seek would dereference a null pointer, file->private_data.

This patch introduces a new wrapper for seq_lseek() which checks if the file
descriptor is opened for reading first. If not, it does nothing.

Cc: <stable@kernel.org>
Signed-off-by: Slava Pestov <slavapestov@google.com>
LKML-Reference: <1290640396-24179-1-git-send-email-slavapestov@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-11-30 12:18:17 -05:00
Linus Torvalds d2f30c73ab Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf symbols: Remove incorrect open-coded container_of()
  perf record: Handle restrictive permissions in /proc/{kallsyms,modules}
  x86/kprobes: Prevent kprobes to probe on save_args()
  irq_work: Drop cmpxchg() result
  perf: Fix owner-list vs exit
  x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG
  tracing: Fix recursive user stack trace
  perf,hw_breakpoint: Initialize hardware api earlier
  x86: Ignore trap bits on single step exceptions
  tracing: Force arch_local_irq_* notrace for paravirt
  tracing: Fix module use of trace_bprintk()
2010-11-27 07:28:17 +09:00
Arnd Bergmann 451a3c24b0 BKL: remove extraneous #include <smp_lock.h>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-17 08:59:32 -08:00
Steven Rostedt 91e86e560d tracing: Fix recursive user stack trace
The user stack trace can fault when examining the trace. Which
would call the do_page_fault handler, which would trace again,
which would do the user stack trace, which would fault and call
do_page_fault again ...

Thus this is causing a recursive bug. We need to have a recursion
detector here.

[ Resubmitted by Jiri Olsa ]

[ Eric Dumazet recommended using __this_cpu_* instead of __get_cpu_* ]

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1289390172-9730-3-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-11-12 21:20:08 -05:00
Steven Rostedt dd49a38cf3 tracing: Do not limit the size of the number of CPU buffers
The tracing per_cpu buffers were limited to 999 CPUs for a mear
savings in stack space of a char array. Up the array to 30 characters
which is more than enough to hold a 64 bit number.

Reported-by: Robin Holt <holt@sgi.com>
Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-21 08:55:06 -04:00
matt mooney 907f278409 tracing/trivial: Remove cast from void*
Unnecessary cast from void* in assignment.

Signed-off-by: matt mooney <mfm@muteddisk.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-18 10:53:22 -04:00
Steven Rostedt d244b6bd41 Merge branch 'tip/perf/urgent-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into trace/tip/perf/urgent-4
Conflicts:
	kernel/trace/trace_events.c

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-16 11:17:30 -04:00
Marcin Slusarz 1aa54bca6e tracing: Sanitize value returned from write(trace_marker, "...", len)
When userspace code writes non-new-line-terminated string to trace_marker
file, write handler appends new-line and returns number of bytes written
to trace buffer, so
write(fd, "abc", 3) will return 4

That's unexpected and unfortunately it confuses glibc's fprintf function.

Example:
int main() {
  fprintf(stderr, "abc");
  return 0;
}

$ gcc test.c -o test
$ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
$ ./test 2>/sys/kernel/debug/tracing/trace_marker

results in infinite loop:
write(fd, "abc", 3) = 4
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
(...)

...and kernel trace buffer full of empty markers.

Fix it by sanitizing write return value.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
LKML-Reference: <20100727231801.GB2826@joi.lan>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-13 15:23:16 -04:00
Linus Torvalds 78417334b5 Merge branch 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  do_coredump: Do not take BKL
  init: Remove the BKL from startup code
2010-08-07 17:06:54 -07:00
Linus Torvalds 4aed2fd8e3 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
  tracing/kprobes: unregister_trace_probe needs to be called under mutex
  perf: expose event__process function
  perf events: Fix mmap offset determination
  perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
  perf, powerpc: Convert the FSL driver to use local64_t
  perf tools: Don't keep unreferenced maps when unmaps are detected
  perf session: Invalidate last_match when removing threads from rb_tree
  perf session: Free the ref_reloc_sym memory at the right place
  x86,mmiotrace: Add support for tracing STOS instruction
  perf, sched migration: Librarize task states and event headers helpers
  perf, sched migration: Librarize the GUI class
  perf, sched migration: Make the GUI class client agnostic
  perf, sched migration: Make it vertically scrollable
  perf, sched migration: Parameterize cpu height and spacing
  perf, sched migration: Fix key bindings
  perf, sched migration: Ignore unhandled task states
  perf, sched migration: Handle ignored migrate out events
  perf: New migration tool overview
  tracing: Drop cpparg() macro
  perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
  ...

Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
2010-08-06 09:30:52 -07:00
Jason Wessel 955b61e597 ftrace,kdb: Extend kdb to be able to dump the ftrace buffer
Add in a helper function to allow the kdb shell to dump the ftrace
buffer.

Modify trace.c to expose the capability to iterate over the ftrace
buffer in a read only capacity.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
2010-08-05 09:22:23 -05:00
Ingo Molnar 3a01736e70 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-07-23 09:10:29 +02:00
Dan Carpenter 24a461d537 trace: strlen() return doesn't account for the NULL
We need to add one to the strlen() return because of the NULL
character.  The type->name here generally comes from the kernel and I
don't think any of them come close to being MAX_TRACER_SIZE (100)
characters long so this is basically a cleanup.

Signed-off-by: Dan Carpenter <error27@gmail.com>
LKML-Reference: <20100710100644.GV19184@bicker>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-22 14:56:41 -04:00
KOSAKI Motohiro ef710e100c tracing: Shrink max latency ringbuffer if unnecessary
Documentation/trace/ftrace.txt says

  buffer_size_kb:

        This sets or displays the number of kilobytes each CPU
        buffer can hold. The tracer buffers are the same size
        for each CPU. The displayed number is the size of the
        CPU buffer and not total size of all buffers. The
        trace buffers are allocated in pages (blocks of memory
        that the kernel uses for allocation, usually 4 KB in size).
        If the last page allocated has room for more bytes
        than requested, the rest of the page will be used,
        making the actual allocation bigger than requested.
        ( Note, the size may not be a multiple of the page size
          due to buffer management overhead. )

        This can only be updated when the current_tracer
        is set to "nop".

But it's incorrect. currently total memory consumption is
'buffer_size_kb x CPUs x 2'.

Why two times difference is there? because ftrace implicitly allocate
the buffer for max latency too.

That makes sad result when admin want to use large buffer. (If admin
want full logging and makes detail analysis). example, If admin
have 24 CPUs machine and write 200MB to buffer_size_kb, the system
consume ~10GB memory (200MB x 24 x 2). umm.. 5GB memory waste is
usually unacceptable.

Fortunatelly, almost all users don't use max latency feature.
The max latency buffer can be disabled easily.

This patch shrink buffer size of the max latency buffer if
unnecessary.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
LKML-Reference: <20100701104554.DA2D.A69D9226@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-21 10:20:17 -04:00
Li Zefan e870e9a124 tracing: Allow to disable cmdline recording
We found that even enabling a single trace event that will rarely be
triggered can add big overhead to context switch.

(lmbench context switch test)
 -------------------------------------------------
 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
 ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
------ ------ ------ ------ ------ ------- -------
  2.19   2.3   2.21   2.56   2.13     2.54    2.07
  2.39   2.51  2.35   2.75   2.27     2.81    2.24

The overhead is 6% ~ 11%.

It's because when a trace event is enabled 3 tracepoints (sched_switch,
sched_wakeup, sched_wakeup_new) will be activated to map pid to cmdname.

We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
to allow to disable cmdline recording.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4C2D57F4.2050204@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-20 21:52:33 -04:00
Arnd Bergmann b444786f1a tracing: Use generic_file_llseek for debugfs
The default for llseek will change to no_llseek,
so the tracing debugfs files need to add explicit
.llseek assignments. Since we're dealing with regular
files from a VFS perspective, use generic_file_llseek.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <1278538820-1392-10-git-send-email-arnd@arndb.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-07-20 14:31:24 +02:00
Frederic Weisbecker eb7beb5c09 tracing: Remove special traces
Special traces type was only used by sysprof. Lets remove it now
that sysprof ftrace plugin has been dropped.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Soeren Sandmann <sandmann@daimi.au.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2010-07-20 14:31:07 +02:00
Frederic Weisbecker f376bf5ffb tracing: Remove sysprof ftrace plugin
The sysprof ftrace plugin doesn't seem to be seriously used
somewhere. There is a branch in the sysprof tree that makes
an interface to it, but the real sysprof tool uses either its
own module or perf events.

Drop the sysprof ftrace plugin then, as it's mostly useless.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Soeren Sandmann <sandmann@daimi.au.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2010-07-20 14:29:46 +02:00
Arnd Bergmann 5e3d20a68f init: Remove the BKL from startup code
I have shown by code review that no driver takes
the BKL at init time any more, so whatever the
init code was locking against is no longer there
and it is now safe to remove the BKL there.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steven Rostedt <rostedt@goodmis>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-07-09 15:40:32 +02:00
Ingo Molnar c726b61c6a Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core 2010-06-09 18:55:57 +02:00
Américo Wang 30dbb20e68 tracing: Remove boot tracer
The boot tracer is useless. It simply logs the initcalls
but in fact these initcalls are also logged through printk
while using the initcall_debug kernel parameter.

Nobody seem to be using it so far. Then just remove it.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Chase Douglas <chase.douglas@canonical.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <20100526105753.GA5677@cr0.nay.redhat.com>
[ remove the hooks in main.c, and the headers ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-06-08 23:31:28 +02:00
Ingo Molnar 6113e45f83 Merge branch 'tip/perf/core-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-06-08 19:34:40 +02:00
Steven Rostedt 5168ae50a6 tracing: Remove ftrace_preempt_disable/enable
The ftrace_preempt_disable/enable functions were to address a
recursive race caused by the function tracer. The function tracer
traces all functions which makes it easily susceptible to recursion.
One area was preempt_enable(). This would call the scheduler and
the schedulre would call the function tracer and loop.
(So was it thought).

The ftrace_preempt_disable/enable was made to protect against recursion
inside the scheduler by storing the NEED_RESCHED flag. If it was
set before the ftrace_preempt_disable() it would not call schedule
on ftrace_preempt_enable(), thinking that if it was set before then
it would have already scheduled unless it was already in the scheduler.

This worked fine except in the case of SMP, where another task would set
the NEED_RESCHED flag for a task on another CPU, and then kick off an
IPI to trigger it. This could cause the NEED_RESCHED to be saved at
ftrace_preempt_disable() but the IPI to arrive in the the preempt
disabled section. The ftrace_preempt_enable() would not call the scheduler
because the flag was already set before entring the section.

This bug would cause a missed preemption check and cause lower latencies.

Investigating further, I found that the recusion caused by the function
tracer was not due to schedule(), but due to preempt_schedule(). Now
that preempt_schedule is completely annotated with notrace, the recusion
no longer is an issue.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-06-03 19:32:38 -04:00
Linus Torvalds bc7d352c5e Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tui: Fix last use_browser problem related to .perfconfig
  perf symbols: Add the build id cache to the vmlinux path
  perf tui: Reset use_browser if stdout is not a tty
  ring-buffer: Move zeroing out excess in page to ring buffer code
  ring-buffer: Reset "real_end" when page is filled
2010-05-30 12:35:01 -07:00
Linus Torvalds c5617b200a Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
  tracing: Add __used annotation to event variable
  perf, trace: Fix !x86 build bug
  perf report: Support multiple events on the TUI
  perf annotate: Fix up usage of the build id cache
  x86/mmiotrace: Remove redundant instruction prefix checks
  perf annotate: Add TUI interface
  perf tui: Remove annotate from popup menu after failure
  perf report: Don't start the TUI if -D is used
  perf: Fix getline undeclared
  perf: Optimize perf_tp_event_match()
  perf: Remove more code from the fastpath
  perf: Optimize the !vmalloc backed buffer
  perf: Optimize perf_output_copy()
  perf: Fix wakeup storm for RO mmap()s
  perf-record: Share per-cpu buffers
  perf-record: Remove -M
  perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
  perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
  perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
  perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
  ...
2010-05-27 15:23:47 -07:00
Steven Rostedt 2711ca237a ring-buffer: Move zeroing out excess in page to ring buffer code
Currently the trace splice code zeros out the excess bytes in the page before
sending it off to userspace.

This is to make sure userspace is not getting anything it should not be
when reading the pages, because the excess data was never initialized
to zero before writing (for perfomance reasons).

But the splice code has no business in doing this work, it should be
done by the ring buffer. With the latest changes for recording lost
events, the splice code gets it wrong anyway.

Move the zeroing out of excess bytes into the ring buffer code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-25 11:57:26 -04:00
Jens Axboe ee9a3607fb Merge branch 'master' into for-2.6.35
Conflicts:
	fs/ext3/fsync.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:27:26 +02:00
Jens Axboe 35f3d14dbb pipe: add support for shrinking and growing pipes
This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for
growing and shrinking the size of a pipe and adjusts pipe.c and splice.c
(and relay and network splice) usage to work with these larger (or smaller)
pipes.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:12:40 +02:00
Steven Rostedt a9a5776380 tracing: Allow events to share their print functions
Multiple events may use the same method to print their data.
Instead of having all events have a pointer to their print funtions,
the trace_event structure now points to a trace_event_functions structure
that will hold the way to print ouf the event.

The event itself is now passed to the print function to let the print
function know what kind of event it should print.

This opens the door to consolidating the way several events print
their output.

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4900382	1048964	 861512	6810858	 67ecea	vmlinux.init
4900446	1049028	 861512	6810986	 67ed6a	vmlinux.preprint

This change slightly increases the size but is needed for the next change.

v3: Fix the branch tracer events to handle this change.

v2: Fix the new function graph tracer event calls to handle this change.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-14 14:20:32 -04:00
David Miller 72c9ddfd4c ring-buffer: Make non-consuming read less expensive with lots of cpus.
When performing a non-consuming read, a synchronize_sched() is
performed once for every cpu which is actively tracing.

This is very expensive, and can make it take several seconds to open
up the 'trace' file with lots of cpus.

Only one synchronize_sched() call is actually necessary.  What is
desired is for all cpus to see the disabling state change.  So we
transform the existing sequence:

	for_each_cpu() {
		ring_buffer_read_start();
	}

where each ring_buffer_start() call performs a synchronize_sched(),
into the following:

	for_each_cpu() {
		ring_buffer_read_prepare();
	}
	ring_buffer_read_prepare_sync();
	for_each_cpu() {
		ring_buffer_read_start();
	}

wherein only the single ring_buffer_read_prepare_sync() call needs to
do the synchronize_sched().

The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
memory and increments ->record_disabled.

In the second phase, ring_buffer_read_prepare_sync() makes sure this
->record_disabled state is visible fully to all cpus.

And in the final third phase, the ring_buffer_read_start() calls reset
the 'iter' objects allocated in the first phase since we now know that
none of the cpus are adding trace entries any more.

This makes openning the 'trace' file nearly instantaneous on a
sparc64 Niagara2 box with 128 cpus tracing.

Signed-off-by: David S. Miller <davem@davemloft.net>
LKML-Reference: <20100420.154711.11246950.davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-04-27 13:06:35 -04:00
Jiri Olsa 62b915f106 tracing: Add graph output support for irqsoff tracer
Add function graph output to irqsoff tracer.

The graph output is enabled by setting new 'display-graph' trace option.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1270227683-14631-4-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-04-27 12:36:53 -04:00
Frederic Weisbecker cecbca96da tracing: Dump either the oops's cpu source or all cpus buffers
The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one
dump every cpu buffers when an oops or panic happens.

It's nice when you have few cpus but it may take ages if have many,
plus you miss the real origin of the problem in all the cpu traces.

Sometimes, all you need is to dump the cpu buffer that triggered the
opps, most of the time it is our main interest.

This patch modifies ftrace_dump_on_oops to handle this choice.

The ftrace_dump_on_oops kernel parameter, when it comes alone, has
the same behaviour than before. But ftrace_dump_on_oops=orig_cpu
will only dump the buffer of the cpu that oops'ed.

Similarly, sysctl kernel.ftrace_dump_on_oops=1 and
echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous
behaviour. But setting 2 jumps into cpu origin dump mode.

v2: Fix double setup
v3: Fix spelling issues reported by Randy Dunlap
v4: Also update __ftrace_dump in the selftests

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
2010-04-21 23:11:42 +02:00
Ingo Molnar b15c7b1cee Merge branch 'tip/tracing/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/core 2010-04-14 12:15:23 +02:00
Ingo Molnar c1ab9cab75 Merge branch 'linus' into tracing/core
Conflicts:
	include/linux/module.h
	kernel/module.c

Semantic conflict:
	include/trace/events/module.h

Merge reason: Resolve the conflict with upstream commit 5fbfb18 ("Fix up
              possibly racy module refcounting")

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-08 10:18:47 +02:00
Lai Jiangshan aa27497c2f tracing: Fix uninitialized variable of tracing/trace output
Because a local variable is not initialized, I got these
when I did 'cat tracing/trace'. (not trace_pipe):

CPU:0 [LOST 18446744071579453134 EVENTS]
              ps-3099  [000]   560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446744071579453134 EVENTS]
              ps-3099  [000]   560.770221: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446612133255294080 EVENTS]
              ps-3099  [000]   560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446744071579453134 EVENTS]
              ps-3099  [000]   560.770222: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446744071579453134 EVENTS]
              ps-3099  [000]   560.770222: lock_release: ffffffff816cfb98 dcache_lock

See peek_next_entry(), it does not set *lost_events when we 'cat tracing/trace'

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4BB9A929.2000303@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-04-05 11:01:22 -04:00
Steven Rostedt bc21b47842 tracing: Show the lost events in the trace_pipe output
Now that the ring buffer can keep track of where events are lost.
Use this information to the output of trace_pipe:

       hackbench-3588  [001]  1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock
       hackbench-3588  [001]  1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock
       hackbench-3588  [001]  1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock
CPU:1 [LOST 673 EVENTS]
       hackbench-3588  [001]  1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738
       hackbench-3588  [001]  1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem
       hackbench-3588  [001]  1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem

Even works with the function graph tracer:

 2) ! 170.098 us  |                                            }
 2)   4.036 us    |                                            rcu_irq_exit();
 2)   3.657 us    |                                            idle_cpu();
 2) ! 190.301 us  |                                          }
CPU:2 [LOST 2196 EVENTS]
 2)   0.853 us    |                            } /* cancel_dirty_page */
 2)               |                            remove_from_page_cache() {
 2)   1.578 us    |                              _raw_spin_lock_irq();
 2)               |                              __remove_from_page_cache() {

Note, it does not work with the iterator "trace" file, since it requires
the use of consuming the page from the ring buffer to determine how many
events were lost, which the iterator does not do.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-03-31 22:57:06 -04:00
Steven Rostedt 66a8cb95ed ring-buffer: Add place holder recording of dropped events
Currently, when the ring buffer drops events, it does not record
the fact that it did so. It does inform the writer that the event
was dropped by returning a NULL event, but it does not put in any
place holder where the event was dropped.

This is not a trivial thing to add because the ring buffer mostly
runs in overwrite (flight recorder) mode. That is, when the ring
buffer is full, new data will overwrite old data.

In a produce/consumer mode, where new data is simply dropped when
the ring buffer is full, it is trivial to add the placeholder
for dropped events. When there's more room to write new data, then
a special event can be added to notify the reader about the dropped
events.

But in overwrite mode, any new write can overwrite events. A place
holder can not be inserted into the ring buffer since there never
may be room. A reader could also come in at anytime and miss the
placeholder.

Luckily, the way the ring buffer works, the read side can find out
if events were lost or not, and how many events. Everytime a write
takes place, if it overwrites the header page (the next read) it
updates a "overrun" variable that keeps track of the number of
lost events. When a reader swaps out a page from the ring buffer,
it can record this number, perfom the swap, and then check to
see if the number changed, and take the diff if it has, which would be
the number of events dropped. This can be stored by the reader
and returned to callers of the reader.

Since the reader page swap will fail if the writer moved the head
page since the time the reader page set up the swap, this gives room
to record the overruns without worrying about races. If the reader
sets up the pages, records the overrun, than performs the swap,
if the swap succeeds, then the overrun variable has not been
updated since the setup before the swap.

For binary readers of the ring buffer, a flag is set in the header
of each sub page (sub buffer) of the ring buffer. This flag is embedded
in the size field of the data on the sub buffer, in the 31st bit (the size
can be 32 or 64 bits depending on the architecture), but only 27
bits needs to be used for the actual size (less actually).

We could add a new field in the sub buffer header to also record the
number of events dropped since the last read, but this will change the
format of the binary ring buffer a bit too much. Perhaps this change can
be made if the information on the number of events dropped is considered
important enough.

Note, the notification of dropped events is only used by consuming reads
or peeking at the ring buffer. Iterating over the ring buffer does not
keep this information because the necessary data is only available when
a page swap is made, and the iterator does not swap out pages.

Cc: Robert Richter <robert.richter@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-03-31 22:57:04 -04:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Linus Torvalds 8655e7e3dd Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Do not record user stack trace from NMI context
  tracing: Disable buffer switching when starting or stopping trace
  tracing: Use same local variable when resetting the ring buffer
  function-graph: Init curr_ret_stack with ret_stack
  ring-buffer: Move disabled check into preempt disable section
  function-graph: Add tracing_thresh support to function_graph tracer
  tracing: Update the comm field in the right variable in update_max_tr
  function-graph: Use comment notation for func names of dangling '}'
  function-graph: Fix unused reference to ftrace_set_func()
  tracing: Fix warning in s_next of trace file ops
  tracing: Include irqflags headers from trace clock
2010-03-13 14:40:50 -08:00
Steven Rostedt b6345879cc tracing: Do not record user stack trace from NMI context
A bug was found with Li Zefan's ftrace_stress_test that caused applications
to segfault during the test.

Placing a tracing_off() in the segfault code, and examining several
traces, I found that the following was always the case. The lock tracer
was enabled (lockdep being required) and userstack was enabled. Testing
this out, I just enabled the two, but that was not good enough. I needed
to run something else that could trigger it. Running a load like hackbench
did not work, but executing a new program would. The following would
trigger the segfault within seconds:

  # echo 1 > /debug/tracing/options/userstacktrace
  # echo 1 > /debug/tracing/events/lock/enable
  # while :; do ls > /dev/null ; done

Enabling the function graph tracer and looking at what was happening
I finally noticed that all cashes happened just after an NMI.

 1)               |    copy_user_handle_tail() {
 1)               |      bad_area_nosemaphore() {
 1)               |        __bad_area_nosemaphore() {
 1)               |          no_context() {
 1)               |            fixup_exception() {
 1)   0.319 us    |              search_exception_tables();
 1)   0.873 us    |            }
[...]
 1)   0.314 us    |  __rcu_read_unlock();
 1)   0.325 us    |    native_apic_mem_write();
 1)   0.943 us    |  }
 1)   0.304 us    |  rcu_nmi_exit();
[...]
 1)   0.479 us    |  find_vma();
 1)               |  bad_area() {
 1)               |    __bad_area() {

After capturing several traces of failures, all of them happened
after an NMI. Curious about this, I added a trace_printk() to the NMI
handler to read the regs->ip to see where the NMI happened. In which I
found out it was here:

ffffffff8135b660 <page_fault>:
ffffffff8135b660:       48 83 ec 78             sub    $0x78,%rsp
ffffffff8135b664:       e8 97 01 00 00          callq  ffffffff8135b800 <error_entry>

What was happening is that the NMI would happen at the place that a page
fault occurred. It would call rcu_read_lock() which was traced by
the lock events, and the user_stack_trace would run. This would trigger
a page fault inside the NMI. I do not see where the CR2 register is
saved or restored in NMI handling. This means that it would corrupt
the page fault handling that the NMI interrupted.

The reason the while loop of ls helped trigger the bug, was that
each execution of ls would cause lots of pages to be faulted in, and
increase the chances of the race happening.

The simple solution is to not allow user stack traces in NMI context.
After this patch, I ran the above "ls" test for a couple of hours
without any issues. Without this patch, the bug would trigger in less
than a minute.

Cc: stable@kernel.org
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-03-12 20:31:49 -05:00
Steven Rostedt a2f8071428 tracing: Disable buffer switching when starting or stopping trace
When the trace iterator is read, tracing_start() and tracing_stop()
is called to stop tracing while the iterator is processing the trace
output.

These functions disable both the standard buffer and the max latency
buffer. But if the wakeup tracer is running, it can switch these
buffers between the two disables:

  buffer = global_trace.buffer;
  if (buffer)
      ring_buffer_record_disable(buffer);

      <<<--------- swap happens here

  buffer = max_tr.buffer;
  if (buffer)
      ring_buffer_record_disable(buffer);

What happens is that we disabled the same buffer twice. On tracing_start()
we can enable the same buffer twice. All ring_buffer_record_disable()
must be matched with a ring_buffer_record_enable() or the buffer
can be disable permanently, or enable prematurely, and cause a bug
where a reset happens while a trace is commiting.

This patch protects these two by taking the ftrace_max_lock to prevent
a switch from occurring.

Found with Li Zefan's ftrace_stress_test.

Cc: stable@kernel.org
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-03-12 20:30:21 -05:00
Steven Rostedt 283740c619 tracing: Use same local variable when resetting the ring buffer
In the ftrace code that resets the ring buffer it references the
buffer with a local variable, but then uses the tr->buffer as the
parameter to reset. If the wakeup tracer is running, which can
switch the tr->buffer with the max saved buffer, this can break
the requirement of disabling the buffer before the reset.

   buffer = tr->buffer;
   ring_buffer_record_disable(buffer);
   synchronize_sched();
   __tracing_reset(tr->buffer, cpu);

If the tr->buffer is swapped, then the reset is not happening to the
buffer that was disabled. This will cause the ring buffer to fail.

Found with Li Zefan's ftrace_stress_test.

Cc: stable@kernel.org
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-03-12 20:29:20 -05:00
Ingo Molnar 915a0b575f Merge branch 'tip/tracing/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent 2010-03-11 13:39:33 +01:00
Tim Bird 0e95017355 function-graph: Add tracing_thresh support to function_graph tracer
Add support for tracing_thresh to the function_graph tracer.  This
version of this feature isolates the checks into new entry and
return functions, to avoid adding more conditional code into the
main function_graph paths.

When the tracing_thresh is set and the function graph tracer is
enabled, only the functions that took longer than the time in
microseconds that was set in tracing_thresh are recorded. To do this
efficiently, only the function exits are recorded:

 [tracing]# echo 100 > tracing_thresh
 [tracing]# echo function_graph > current_tracer
 [tracing]# cat trace
 # tracer: function_graph
 #
 # CPU  DURATION                  FUNCTION CALLS
 # |     |   |                     |   |   |   |
  1) ! 119.214 us  |  } /* smp_apic_timer_interrupt */
  1)   <========== |
  0) ! 101.527 us  |              } /* __rcu_process_callbacks */
  0) ! 126.461 us  |            } /* rcu_process_callbacks */
  0) ! 145.111 us  |          } /* __do_softirq */
  0) ! 149.667 us  |        } /* do_softirq */
  0) ! 168.817 us  |      } /* irq_exit */
  0) ! 248.254 us  |    } /* smp_apic_timer_interrupt */

Also, add support for specifying tracing_thresh on the kernel
command line.  When used like so: "tracing_thresh=200 ftrace=function_graph"
this can be used to analyse system startup.  It is important to disable
tracing soon after boot, in order to avoid losing the trace data.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Tim Bird <tim.bird@am.sony.com>
LKML-Reference: <4B87098B.4040308@am.sony.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-03-05 21:20:57 -05:00
Arnaldo Carvalho de Melo 1acaa1b2d9 tracing: Update the comm field in the right variable in update_max_tr
The latency output showed:

 #    | task: -3 (uid:0 nice:0 policy:1 rt_prio:99)

The comm is missing in the "task:" and it looks like a minus 3 is
the output. The correct display should be:

 #    | task: migration/0-3 (uid:0 nice:0 policy:1 rt_prio:99)

The problem is that the comm is being stored in the wrong data
structure. The max_tr.data[cpu] is what stores the comm, not the
tr->data[cpu].

Before this patch the max_tr.data[cpu]->comm was zeroed and the /debug/trace
ended up showing just the '-' sign followed by the pid.

Also remove a needless initialization of max_data.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1267824230-23861-1-git-send-email-acme@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-03-05 21:12:08 -05:00
Linus Torvalds 0a135ba14d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: add __percpu sparse annotations to what's left
  percpu: add __percpu sparse annotations to fs
  percpu: add __percpu sparse annotations to core kernel subsystems
  local_t: Remove leftover local.h
  this_cpu: Remove pageset_notifier
  this_cpu: Page allocator conversion
  percpu, x86: Generic inc / dec percpu instructions
  local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c
  module: Use this_cpu_xx to dynamically allocate counters
  local_t: Remove cpu_local_xx macros
  percpu: refactor the code in pcpu_[de]populate_chunk()
  percpu: remove compile warnings caused by __verify_pcpu_ptr()
  percpu: make accessors check for percpu pointer in sparse
  percpu: add __percpu for sparse.
  percpu: make access macros universal
  percpu: remove per_cpu__ prefix.
2010-03-03 07:34:18 -08:00
Lai Jiangshan ac91d85456 tracing: Fix warning in s_next of trace file ops
This warning in s_next() can be triggered by lseek():
 [<c018b3f7>] ? s_next+0x77/0x80
 [<c013e3c1>] warn_slowpath_common+0x81/0xa0
 [<c018b3f7>] ? s_next+0x77/0x80
 [<c013e3fa>] warn_slowpath_null+0x1a/0x20
 [<c018b3f7>] s_next+0x77/0x80
 [<c01efa77>] traverse+0x117/0x200
 [<c01eff13>] seq_lseek+0xa3/0x120
 [<c01efe70>] ? seq_lseek+0x0/0x120
 [<c01d7081>] vfs_llseek+0x41/0x50
 [<c01d8116>] sys_llseek+0x66/0xa0
 [<c0102bd0>] sysenter_do_call+0x12/0x26

The iterator "leftover" variable is zeroed in the opening of the trace
file. But lseek can call s_start() which will call s_next() without
reseting the "leftover" variable back to zero, which might trigger
the WARN_ON_ONCE(iter->leftover) that is in s_next().

Cc: stable@kernel.org
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B8CE06A.9090207@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-03-02 21:11:47 -05:00
Ingo Molnar 64b9fb5704 Merge commit 'v2.6.33' into tracing/core
Conflicts:
	scripts/recordmcount.pl

Merge reason: Merge up to v2.6.33.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-26 09:18:32 +01:00
Tejun Heo ab386128f2 Merge branch 'master' into percpu 2010-02-02 14:38:15 +09:00
Steven Rostedt 74bf4076f2 tracing: Prevent kernel oops with corrupted buffer
If the contents of the ftrace ring buffer gets corrupted and the trace
file is read, it could create a kernel oops (usualy just killing the user
task thread). This is caused by the checking of the pid in the buffer.
If the pid is negative, it still references the cmdline cache array,
which could point to an invalid address.

The simple fix is to test for negative PIDs.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-25 15:11:53 -05:00
Steven Rostedt d931369b74 tracing: Add stack dump to trace_printk if stacktrace option is set
If the ftrace stacktrace option is set, then add the stack dumps to
trace_printk.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-06 18:09:57 -05:00
Lai Jiangshan 7e53bd42d1 tracing: Consolidate protection of reader access to the ring buffer
At the beginning, access to the ring buffer was fully serialized
by trace_types_lock. Patch d7350c3f45 gives more freedom to readers,
and patch b04cc6b1f6 adds code to protect trace_pipe and cpu#/trace_pipe.

But actually it is not enough, ring buffer readers are not always
read-only, they may consume data.

This patch makes accesses to trace, trace_pipe, trace_pipe_raw
cpu#/trace, cpu#/trace_pipe and cpu#/trace_pipe_raw serialized.
And removes tracing_reader_cpumask which is used to protect trace_pipe.

Details:

Ring buffer serializes readers, but it is low level protection.
The validity of the events (which returns by ring_buffer_peek() ..etc)
are not protected by ring buffer.

The content of events may become garbage if we allow another process to consume
these events concurrently:
  A) the page of the consumed events may become a normal page
     (not reader page) in ring buffer, and this page will be rewritten
     by the events producer.
  B) The page of the consumed events may become a page for splice_read,
     and this page will be returned to system.

This patch adds trace_access_lock() and trace_access_unlock() primitives.

These primitives allow multi process access to different cpu ring buffers
concurrently.

These primitives don't distinguish read-only and read-consume access.
Multi read-only access is also serialized.

And we don't use these primitives when we open files,
we only use them when we read files.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B447D52.1050602@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-06 12:51:34 -05:00
Tejun Heo 32032df6c2 Merge branch 'master' into percpu
Conflicts:
	arch/powerpc/platforms/pseries/hvCall.S
	include/linux/percpu.h
2010-01-05 09:17:33 +09:00
Linus Torvalds b21c070403 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Fix sign fields in ftrace_define_fields_##call()
  tracing/syscalls: Fix typo in SYSCALL_DEFINE0
  tracing/kprobe: Show sign of fields in trace_kprobe format files
  ksym_tracer: Remove trace_stat
  ksym_tracer: Fix race when incrementing count
  ksym_tracer: Fix to allow writing newline to ksym_trace_filter
  ksym_tracer: Fix to make the tracer work
  tracing: Kconfig spelling fixes and cleanups
  tracing: Fix setting tracer specific options
  Documentation: Update ftrace-design.txt
  Documentation: Update tracepoint-analysis.txt
  Documentation: Update mmiotrace.txt
2009-12-31 11:52:01 -08:00
Steven Rostedt c757bea93b tracing: Fix setting tracer specific options
The function __set_tracer_option() takes as its last parameter a
"neg" value. If set it should negate the value of the option.

The trace_options_write() passed the value written to the file
which is what the new value needs to be set as. But since this
is not the negative, it never sets the value.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-21 22:35:16 -05:00
Linus Torvalds 5a865c0606 Merge branch 'for-33' of git://repo.or.cz/linux-kbuild
* 'for-33' of git://repo.or.cz/linux-kbuild: (29 commits)
  net: fix for utsrelease.h moving to generated
  gen_init_cpio: fixed fwrite warning
  kbuild: fix make clean after mismerge
  kbuild: generate modules.builtin
  genksyms: properly consider  EXPORT_UNUSED_SYMBOL{,_GPL}()
  score: add asm/asm-offsets.h wrapper
  unifdef: update to upstream revision 1.190
  kbuild: specify absolute paths for cscope
  kbuild: create include/generated in silentoldconfig
  scripts/package: deb-pkg: use fakeroot if available
  scripts/package: add KBUILD_PKG_ROOTCMD variable
  scripts/package: tar-pkg: use tar --owner=root
  Kbuild: clean up marker
  net: add net_tstamp.h to headers_install
  kbuild: move utsrelease.h to include/generated
  kbuild: move autoconf.h to include/generated
  drop explicit include of autoconf.h
  kbuild: move compile.h to include/generated
  kbuild: drop include/asm
  kbuild: do not check for include/asm-$ARCH
  ...

Fixed non-conflicting clean merge of modpost.c as per comments from
Stephen Rothwell (modpost.c had grown an include of linux/autoconf.h
that needed to be changed to generated/autoconf.h)
2009-12-17 07:23:42 -08:00
Linus Torvalds da184a8064 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Fix return of trace_dump_stack()
  ksym_tracer: Fix bad cast
  tracing/power: Remove two exports
  tracing: Change event->profile_count to be int type
  tracing: Simplify trace_option_write()
  tracing: Remove useless trace option
  tracing: Use seq file for trace_clock
  tracing: Use seq file for trace_options
  function-graph: Allow writing the same val to set_graph_function
  ftrace: Call trace_parser_clear() properly
  ftrace: Return EINVAL when writing invalid val to set_ftrace_filter
  tracing: Move a printk out of ftrace_raw_reg_event_foo()
  tracing: Pull up calls to trace_define_common_fields()
  tracing: Extract duplicate ftrace_raw_init_event_foo()
  ftrace.h: Use common pr_info fmt string
  tracing: Add stack trace to irqsoff tracer
  tracing: Add trace_dump_stack()
  ring-buffer: Move resize integrity check under reader lock
  ring-buffer: Use sync sched protection on ring buffer resizing
  tracing: Fix wrong usage of strstrip in trace_ksyms
2009-12-16 12:02:25 -08:00
Alexey Dobriyan 28dfef8feb const: constify remaining pipe_buf_operations
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:05 -08:00
Steven Rostedt e36c54582c tracing: Fix return of trace_dump_stack()
The trace_dump_stack() returned a value for a void function.

Also, added the missing stub for trace_dump_stack() when tracing is
not configured.

Reported-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20091214162713.GA31060@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15 08:36:11 +01:00
Thomas Gleixner 0199c4e68d locking: Convert __raw_spin* functions to arch_spin*
Name space cleanup. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner edc35bd72e locking: Rename __RAW_SPIN_LOCK_UNLOCKED to __ARCH_SPIN_LOCK_UNLOCKED
Further name space cleanup. No functional change

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner 445c89514b locking: Convert raw_spinlock to arch_spinlock
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.

Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Linus Torvalds d0316554d3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
  m68k: rename global variable vmalloc_end to m68k_vmalloc_end
  percpu: add missing per_cpu_ptr_to_phys() definition for UP
  percpu: Fix kdump failure if booted with percpu_alloc=page
  percpu: make misc percpu symbols unique
  percpu: make percpu symbols in ia64 unique
  percpu: make percpu symbols in powerpc unique
  percpu: make percpu symbols in x86 unique
  percpu: make percpu symbols in xen unique
  percpu: make percpu symbols in cpufreq unique
  percpu: make percpu symbols in oprofile unique
  percpu: make percpu symbols in tracer unique
  percpu: make percpu symbols under kernel/ and mm/ unique
  percpu: remove some sparse warnings
  percpu: make alloc_percpu() handle array types
  vmalloc: fix use of non-existent percpu variable in put_cpu_var()
  this_cpu: Use this_cpu_xx in trace_functions_graph.c
  this_cpu: Use this_cpu_xx for ftrace
  this_cpu: Use this_cpu_xx in nmi handling
  this_cpu: Use this_cpu operations in RCU
  this_cpu: Use this_cpu ops for VM statistics
  ...

Fix up trivial (famous last words) global per-cpu naming conflicts in
	arch/x86/kvm/svm.c
	mm/slab.c
2009-12-14 09:58:24 -08:00
Ingo Molnar 0087aabd6a Merge branch 'tip/tracing/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent 2009-12-14 17:12:37 +01:00
Li Zefan 8d18eaaff5 tracing: Simplify trace_option_write()
- remove duplicate code inside trace_options_write()
- extract duplicate code in trace_options_write() and set_tracer_option()

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC532.9010802@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:28 +01:00
Li Zefan 2cbafd68b8 tracing: Remove useless trace option
Since commit 4d9493c90f
("ftrace: remove add-hoc code"), option "sched-tree"
has become useless.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC50A.7040402@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:27 +01:00
Li Zefan 13f16d2091 tracing: Use seq file for trace_clock
The buffer for the output is as small as 64 bytes, so it'll
overflow if we add more clock type. Use seq file instead.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC4FB.5030407@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:27 +01:00
Li Zefan fdb372ed4c tracing: Use seq file for trace_options
Code simplification for reading trace_options.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-reference: <4B1DC4EF.3090106@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:27 +01:00
Sam Ravnborg 273b281fa2 kbuild: move utsrelease.h to include/generated
Fix up all users of utsrelease.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2009-12-12 13:08:15 +01:00
Steven Rostedt 03889384ce tracing: Add trace_dump_stack()
I've been asked a few times about how to find out what is calling
some location in the kernel. One way is to use dynamic function tracing
and implement the func_stack_trace. But this only finds out who is
calling a particular function. It does not tell you who is calling
that function and entering a specific if conditional.

I have myself implemented a quick version of trace_dump_stack() for
this purpose a few times, and just needed it now. This is when I realized
that this would be a good tool to have in the kernel like trace_printk().

Using trace_dump_stack() is similar to dump_stack() except that it
writes to the trace buffer instead and can be used in critical locations.

For example:

@@ -5485,8 +5485,12 @@ need_resched_nonpreemptible:
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
 		if (unlikely(signal_pending_state(prev->state, prev)))
 			prev->state = TASK_RUNNING;
-		else
+		else {
 			deactivate_task(rq, prev, 1);
+			trace_printk("Deactivating task %s:%d\n",
+				     prev->comm, prev->pid);
+			trace_dump_stack();
+		}
 		switch_count = &prev->nvcsw;
 	}

Produces:

           <...>-3249  [001]   296.105269: schedule: Deactivating task ntpd:3249
           <...>-3249  [001]   296.105270: <stack trace>
 => schedule
 => schedule_hrtimeout_range
 => poll_schedule_timeout
 => do_select
 => core_sys_select
 => sys_select
 => system_call_fastpath

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-11 10:38:47 -05:00
Carsten Emde f2942487ff tracing: Remove comparing of NULL to va_list in trace_array_vprintk()
Olof Johansson stated the following:

  Comparing a va_list with NULL is bogus. It's supposed to be treated like
  an opaque type and only be manipulated with va_* accessors.

Olof noticed that this code broke the ARM builds:

    kernel/trace/trace.c: In function 'trace_array_vprintk':
    kernel/trace/trace.c:1364: error: invalid operands to binary == (have 'va_list' and 'void *')
    kernel/trace/trace.c: In function 'tracing_mark_write':
    kernel/trace/trace.c:3349: error: incompatible type for argument 3 of 'trace_vprintk'

This patch partly reverts c13d2f7c32 and
re-installs the original mark_printk() mechanism.

Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
LKML-Reference: <4B1BAB74.104@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-09 14:20:08 -05:00
Steven Rostedt a63ce5b306 tracing: Buffer the output of seq_file in case of filled buffer
If the seq_read fills the buffer it will call s_start again on the next
itertation with the same position. This causes a problem with the
function_graph tracer because it consumes the iteration in order to
determine leaf functions.

What happens is that the iterator stores the entry, and the function
graph plugin will look at the next entry. If that next entry is a return
of the same function and task, then the function is a leaf and the
function_graph plugin calls ring_buffer_read which moves the ring buffer
iterator forward (the trace iterator still points to the function start
entry).

The copying of the trace_seq to the seq_file buffer will fail if the
seq_file buffer is full. The seq_read will not show this entry.
The next read by userspace will cause seq_read to again call s_start
which will reuse the trace iterator entry (the function start entry).
But the function return entry was already consumed. The function graph
plugin will think that this entry is a nested function and not a leaf.

To solve this, the trace code now checks the return status of the
seq_printf (trace_print_seq). If the writing to the seq_file buffer
fails, we set a flag in the iterator (leftover) and we do not reset
the trace_seq buffer. On the next call to s_start, we check the leftover
flag, and if it is set, we just reuse the trace_seq buffer and do not
call into the plugin print functions.

Before this patch:

 2)               |      fput() {
 2)               |        __fput() {
 2)   0.550 us    |          inotify_inode_queue_event();
 2)               |          __fsnotify_parent() {
 2)   0.540 us    |          inotify_dentry_parent_queue_event();

After the patch:

 2)               |      fput() {
 2)               |        __fput() {
 2)   0.550 us    |          inotify_inode_queue_event();
 2)   0.548 us    |          __fsnotify_parent();
 2)   0.540 us    |          inotify_dentry_parent_queue_event();

[
  Updated the patch to fix a missing return 0 from the trace_print_seq()
  stub when CONFIG_TRACING is disabled.

  Reported-by: Ingo Molnar <mingo@elte.hu>
]

Reported-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-09 13:55:26 -05:00
Steven Rostedt 29bf4a5e3f tracing: Only call pipe_close if pipe_close is defined
This fixes a cut and paste error that had pipe_close get called
if pipe_open was defined (not pipe_close).

Reported-by: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
LKML-Reference: <20091209153204.F4CD.A69D9226@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-09 12:47:35 -05:00
Steven Rostedt c521efd170 tracing: Add pipe_close interface
An ftrace plugin can add a pipe_open interface when the user opens
trace_pipe. But if the plugin allocates something within the pipe_open
it can not free it because there exists no pipe_close. The hook to
the trace file open has a corresponding close. The closing of the
trace_pipe file should also have a corresponding close.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-07 12:01:35 -05:00
Linus Torvalds 96fa2b508d Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (40 commits)
  tracing: Separate raw syscall from syscall tracer
  ring-buffer-benchmark: Add parameters to set produce/consumer priorities
  tracing, function tracer: Clean up strstrip() usage
  ring-buffer benchmark: Run producer/consumer threads at nice +19
  tracing: Remove the stale include/trace/power.h
  tracing: Only print objcopy version warning once from recordmcount
  tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used
  ring-buffer: Move access to commit_page up into function used
  tracing: do not disable interrupts for trace_clock_local
  ring-buffer: Add multiple iterations between benchmark timestamps
  kprobes: Sanitize struct kretprobe_instance allocations
  tracing: Fix to use __always_unused attribute
  compiler: Introduce __always_unused
  tracing: Exit with error if a weak function is used in recordmcount.pl
  tracing: Move conditional into update_funcs() in recordmcount.pl
  tracing: Add regex for weak functions in recordmcount.pl
  tracing: Move mcount section search to front of loop in recordmcount.pl
  tracing: Fix objcopy revision check in recordmcount.pl
  tracing: Check absolute path of input file in recordmcount.pl
  tracing: Correct the check for number of arguments in recordmcount.pl
  ...
2009-12-05 09:53:36 -08:00
Carsten Emde c13d2f7c32 tracing: Fix trace_marker output
When a string was written to <debugfs>/tracing/trace_marker, some
strange characters appeared in the trace output instead of the
string, since a vprint function erroneously called a vararg print
function with a va_list argument. This patch fixes the problem and
simplifies the related code.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
LKML-Reference: <4B01AE5D.1010801@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-17 09:19:06 -05:00
Roel Kluin a646365cc3 tracing: Fix return value of tracing_stats_read()
The function tracing_stats_read() mistakenly returns ENOMEM instead
of the negative value -ENOMEM.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
LKML-Reference: <4AFB2C0B.50605@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-11 21:26:55 -05:00
Rusty Russell dd17c8f729 percpu: remove per_cpu__ prefix.
Now that the return from alloc_percpu is compatible with the address
of per-cpu vars, it makes sense to hand around the address of per-cpu
variables.  To make this sane, we remove the per_cpu__ prefix we used
created to stop people accidentally using these vars directly.

Now we have sparse, we can use that (next patch).

tj: * Updated to convert stuff which were missed by or added after the
      original patch.

    * Kill per_cpu_var() macro.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
2009-10-29 22:34:15 +09:00
Tejun Heo 9705f69ed0 percpu: make percpu symbols in tracer unique
This patch updates percpu related symbols in kernel tracer such that
percpu symbols are unique and don't clash with local symbols.  This
serves two purposes of decreasing the possibility of global percpu
symbol collision and allowing dropping per_cpu__ prefix from percpu
symbols.

* kernel/trace/trace.c: s/max_data/max_tr_data/
* kernel/trace/trace_hw_branches: s/tracer/hwb_tracer/, s/buffer/hwb_buffer/

Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
2009-10-29 22:34:13 +09:00
Ingo Molnar 9de09ace8d Merge branch 'tracing/urgent' into tracing/core
Merge reason: Pick up fixes and move base from -rc1 to -rc5.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-29 09:02:20 +01:00
Jiri Olsa cf8517cf90 tracing: Update *ppos instead of filp->f_pos
Instead of directly updating filp->f_pos we should update the *ppos
argument. The filp->f_pos gets updated within the file_pos_write()
function called from sys_write().

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.399670810@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-24 11:07:49 +02:00
Frederic Weisbecker 1beee96bae ftrace: Rename set_bootup_ftrace into set_cmdline_ftrace
set_cmdline_ftrace is a better match against what does this function:
apply a tracer name from the kernel command line.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-10-14 20:55:55 +02:00
Ingo Molnar 1bac0497ef Merge branch 'tracing/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into tracing/core 2009-10-13 12:03:08 +02:00
Frederic Weisbecker aef6f81b55 tracing: Rename set_ftrace to set_bootup_ftrace
Do this rename because set_ftrace is too much generic and not enough
self-explainable as a name.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-10-13 09:32:57 +02:00
Christoph Lameter 9288f99aa5 this_cpu: Use this_cpu_xx for ftrace
this_cpu_xx can reduce the instruction count here and also
avoid address arithmetic.

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-10-12 19:51:49 +09:00
Steven Rostedt a813a15976 tracing: fix trace_vprintk call
The addition of trace_array_{v}printk used the wrong function for
trace_vprintk to call. This broke trace_marker and trace_vprintk
itself. Although trace_printk may not have been affected by those
that end up calling trace_vbprintk.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-10-09 01:41:35 -04:00
Linus Torvalds 4187e7e9f1 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  modules, tracing: Remove stale struct marker signature from module_layout()
  tracing/workqueue: Use %pf in workqueue trace events
  tracing: Fix a comment and a trivial format issue in tracepoint.h
  tracing: Fix failure path in ftrace_regex_open()
  tracing: Fix failure path in ftrace_graph_write()
  tracing: Check the return value of trace_get_user()
  tracing: Fix off-by-one in trace_get_user()
2009-09-26 10:13:54 -07:00
Li Zefan 79f5599772 cpumask: use zalloc_cpumask_var() where possible
Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-24 09:34:24 +09:30
James Morris 88e9d34c72 seq_file: constify seq_operations
Make all seq_operations structs const, to help mitigate against
revectoring user-triggerable function pointers.

This is derived from the grsecurity patch, although generated from scratch
because it's simpler than extracting the changes from there.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:29 -07:00
Li Zefan 3c235a337e tracing: Fix off-by-one in trace_get_user()
Leave the last slot for the tailing '\0'.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4AB865FA.5080801@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-22 10:28:53 +02:00
Linus Torvalds 8e4bc3dd2c Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Simplify sys_sched_rr_get_interval() system call
  sched: Fix potential NULL derference of doms_cur
  sched: Fix raciness in runqueue_is_locked()
  sched: Re-add lost cpu_allowed check to sched_fair.c::select_task_rq_fair()
  sched: Remove unneeded indentation in sched_fair.c::place_entity()
2009-09-21 09:06:17 -07:00
Andrew Morton 89f19f04dc sched: Fix raciness in runqueue_is_locked()
runqueue_is_locked() is unavoidably racy due to a poor interface design.
It does

	cpu = get_cpu()
	ret = some_perpcu_thing(cpu);
	put_cpu(cpu);
	return ret;

Its return value is unreliable.

Fix.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <200909191855.n8JItiko022148@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-20 20:00:32 +02:00
Li Zefan ee6c2c1bd1 tracing: remove max_tracer_type_len
Limit the length of a tracer's name within 100 chars, and then we
don't have to play with max_tracer_type_len.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AB32377.9020601@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-19 11:28:19 -04:00
Steven Rostedt 60ba770227 tracing: add filter event logic to special, mmiotrace and boot tracers
Now that the pluging tracers use macros to create the structures and
automate the exporting of their formats to the format files, they also
automatically get a filter file.

This patch adds the code to implement the filter logic in the trace
recordings.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-12 23:34:04 -04:00
Carsten Emde b5130b1e7d tracing: do not update tracing_max_latency when tracer is stopped
The state of the function pair tracing_stop()/tracing_start() is
correctly considered when tracer data are updated. However, the global
and externally accessible variable tracing_max_latency is always updated
- even when tracing is stopped.

The update should only occur, if tracing was not stopped.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-12 21:45:17 -04:00
jolsa@redhat.com b63f39ea50 tracing: create generic trace parser
Create a "trace_parser" that can parse the user space input for
separate words.

struct trace_parser is the descriptor.

Generic "trace_get_user" function that can be a helper to read multiple
words passed in by user space.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1252682969-3366-2-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 14:46:55 -04:00
Steven Rostedt 637e7e8641 tracing: add lock depth to entries
This patch adds the lock depth of the big kernel lock to the generic
entry header. This way we can see the depth of the lock and help
in removing the BKL.

Example:

 #                  _------=> CPU#
 #                 / _-----=> irqs-off
 #                | / _----=> need-resched
 #                || / _---=> hardirq/softirq
 #                ||| / _--=> preempt-depth
 #                |||| /_--=> lock-depth
 #                |||||/     delay
 #  cmd     pid   |||||| time  |   caller
 #     \   /      ||||||   \   |   /
   <idle>-0       2.N..3 5902255250us+: lock_acquire: read rcu_read_lock
   <idle>-0       2.N..3 5902255253us+: lock_release: rcu_read_lock
   <idle>-0       2dN..3 5902255257us+: lock_acquire: xtime_lock
   <idle>-0       2dN..4 5902255259us : lock_acquire: clocksource_lock
   <idle>-0       2dN..4 5902255261us+: lock_release: clocksource_lock

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 13:55:35 -04:00
Steven Rostedt 48659d3119 tracing: move tgid out of generic entry and into userstack
The userstack trace required the recording of the tgid entry.
Unfortunately, it was added to the generic entry where it wasted
4 bytes of every entry and was only used by one entry.

This patch moves it out of the generic field and moves it into the
only user (userstack_entry).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 11:36:23 -04:00
Ingo Molnar ed011b22ce Merge commit 'v2.6.31-rc9' into tracing/core
Merge reason: move from -rc5 to -rc9.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-06 06:11:42 +02:00
Steven Rostedt e8165dbb03 tracing: report error in trace if we fail to swap latency buffer
The irqsoff tracer will fail to swap the cpu buffer with the max
buffer if it preempts a commit. Instead of ignoring this, this patch
makes the tracer report it if the last max latency failed due to preempting
a current commit.

The output of the latency tracer will look like this:

 # tracer: irqsoff
 #
 # irqsoff latency trace v1.1.5 on 2.6.31-rc5
 # --------------------------------------------------------------------
 # latency: 112 us, #1/1, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
 #    -----------------
 #    | task: -4281 (uid:0 nice:0 policy:0 rt_prio:0)
 #    -----------------
 #  => started at: save_args
 #  => ended at:   __do_softirq
 #
 #
 #                  _------=> CPU#
 #                 / _-----=> irqs-off
 #                | / _----=> need-resched
 #                || / _---=> hardirq/softirq
 #                ||| / _--=> preempt-depth
 #                |||| /
 #                |||||     delay
 #  cmd     pid   ||||| time  |   caller
 #     \   /      |||||   \   |   /
    bash-4281    1d.s6  265us : update_max_tr_single: Failed to swap buffers due to commit in progress

Note the latency time and the functions that disabled the irqs or preemption
will still be listed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 19:22:41 -04:00
Steven Rostedt 659372d3e4 tracing: add trace_array_printk for internal tracers to use
This patch adds a trace_array_printk to allow a tracer to use the
trace_printk on its own trace array.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 19:13:53 -04:00
Steven Rostedt e77405ad80 tracing: pass around ring buffer instead of tracer
The latency tracers (irqsoff and wakeup) can swap trace buffers
on the fly. If an event is happening and has reserved data on one of
the buffers, and the latency tracer swaps the global buffer with the
max buffer, the result is that the event may commit the data to the
wrong buffer.

This patch changes the API to the trace recording to be recieve the
buffer that was used to reserve a commit. Then this buffer can be passed
in to the commit.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 18:59:39 -04:00
Steven Rostedt f633903af2 tracing: make tracing_reset safe for external use
Reseting the trace buffer without first disabling the buffer and
waiting for any writers to complete, can corrupt the ring buffer.

This patch makes the external version of tracing_reset safe from
corruption by disabling the ring buffer and calling synchronize_sched.

This version can no longer be called from interrupt context. But all those
callers have been removed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 18:46:51 -04:00
Steven Rostedt 2f26ebd549 tracing: use timestamp to determine start of latency traces
Currently the latency tracers reset the ring buffer. Unfortunately
if a commit is in process (due to a trace event), this can corrupt
the ring buffer. When this happens, the ring buffer will detect
the corruption and then permanently disable the ring buffer.

The bug does not crash the system, but it does prevent further tracing
after the bug is hit.

Instead of reseting the trace buffers, the timestamp of the start of
the trace is used instead. The buffers will still contain the previous
data, but the output will not count any data that is before the
timestamp of the trace.

Note, this only affects the static trace output (trace) and not the
runtime trace output (trace_pipe). The runtime trace output does not
make sense for the latency tracers anyway.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 18:44:22 -04:00
Steven Rostedt 76f0d07376 tracing: remove users of tracing_reset
The function tracing_reset is deprecated for outside use of trace.c.

The new function to reset the the buffers is tracing_reset_online_cpus.

The reason for this is that resetting the buffers while the event
trace points are active can corrupt the buffers, because they may
be writing at the time of reset. The tracing_reset_online_cpus disables
writes and waits for current writers to finish.

This patch replaces all users of tracing_reset except for the latency
tracers. Those changes require more work and will be removed in the
following patches.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 12:12:39 -04:00
Steven Rostedt 621968cdb2 tracing: disable buffers and synchronize_sched before resetting
Resetting the ring buffers while traces are happening can corrupt
the ring buffer and disable it (no kernel crash to worry about).

The safest thing to do is disable the ring buffers, call synchronize_sched()
to wait for all current writers to finish and then reset the buffer.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 12:02:35 -04:00
Steven Rostedt b8de7bd168 tracing: disable update max tracer while reading trace
When reading the tracer from the trace file, updating the max latency
may corrupt the output. This patch disables the tracing of the max
latency while reading the trace file.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 11:52:24 -04:00
Steven Rostedt 8248ac052d tracing: print out start and stop in latency traces
During development of the tracer, we would copy information from
the live tracer to the max tracer with one memcpy. Since then we
added a generic ring buffer and we handle the copies differently now.
Unfortunately, we never copied the critical section information, and
we lost the output:

 #  => started at: kmem_cache_alloc
 #  => ended at:   kmem_cache_alloc

This patch adds back the critical start and end copying as well as
removes the unused "trace_idx" and "overrun" fields of the
trace_array_cpu structure.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 11:48:12 -04:00
Steven Rostedt 5d4a9dba2d tracing: only show tracing_max_latency when latency tracer configured
The tracing_max_latency file should only be present when one of the
latency tracers ({preempt|irqs}off, wakeup*) are enabled.

This patch also removes tracing_thresh when latency tracers are not
enabled, as well as compiles out code that is only used for latency
tracers.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-08-27 16:58:05 -04:00
Zhaolei 5079f3261f ftrace: Move setting of clock-source out of options
There are many clock sources for the tracing system but we can only
enable/disable one at a time with the trace/options file.
We can move the setting of clock-source out of options and add a separate
file for it:
 # cat trace_clock
 [local] global
 # echo global > trace_clock
 # cat trace_clock
 local [global]

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
LKML-Reference: <4A939D08.6050604@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-08-26 00:32:08 -04:00
Zhaolei f2d84b65b9 ftrace: Unify effect of writing to trace_options and option/*
"echo noglobal-clock > trace_options" can be used to change trace
clock but "echo 0 > options/global-clock" can't. The flag toggling
will be silently accepted without actually changing the clock callback.

We can fix it by using set_tracer_flags() in
trace_options_core_write().

Changelog:
v1->v2: Simplified switch() after Li Zefan <lizf@cn.fujitsu.com>'s
        suggestion

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-18 02:07:04 +02:00
Zhaolei 7770841e63 tracing: Rename set_tracer_flags()'s local variable trace_flags
set_tracer_flags() have a local variable named trace_flags which has
the same name than a global one in the same scope.
This leads to confusion, using tracer_flags should be better by its
meaning.

Changelog:
v1->v2: Simplified another patch in this patchset, no change in this
        patch.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-11 20:35:25 +02:00
Ingo Molnar 89034bc2c7 Merge branch 'linus' into tracing/core
Conflicts:
	kernel/trace/trace_events_filter.c

We use the tracing/core version.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-11 14:19:09 +02:00
Frederic Weisbecker f413cdb80c perf_counter: Fix/complete ftrace event records sampling
This patch implements the kernel side support for ftrace event
record sampling.

A new counter sampling attribute is added:

   PERF_SAMPLE_TP_RECORD

which requests ftrace events record sampling. In this case
if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
fires, we emit the tracepoint binary record to the
perfcounter event buffer, as a sample.

Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
record:

 perf record -f -F 1 -a -e workqueue:workqueue_execution
 perf report -D

 0x21e18 [0x48]: event: 9
 .
 . ... raw event: size 72 bytes
 .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
 .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
 .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
 .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
 .  0040:  e0 b1 31 81 ff ff ff ff                          .......
.
0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33

The raw ftrace binary record starts at offset 0020.

Translation:

 struct trace_entry {
	type		= 0x2b = 43;
	flags		= 1;
	preempt_count	= 2;
	pid		= 0xa = 10;
	tgid		= 0xa = 10;
 }

 thread_comm = "events/1"
 thread_pid  = 0xa = 10;
 func	    = 0xffffffff8131b1e0 = flush_to_ldisc()

What will come next?

 - Userspace support ('perf trace'), 'flight data recorder' mode
   for perf trace, etc.

 - The unconditional copy from the profiling callback brings
   some costs however if someone wants no such sampling to
   occur, and needs to be fixed in the future. For that we need
   to have an instant access to the perf counter attribute.
   This is a matter of a flag to add in the struct ftrace_event.

 - Take care of the events recursivity! Don't ever try to record
   a lock event for example, it seems some locking is used in
   the profiling fast path and lead to a tracing recursivity.
   That will be fixed using raw spinlock or recursivity
   protection.

 - [...]

 - Profit! :-)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09 12:53:48 +02:00
Frederic Weisbecker 1a0799a8fe tracing/function-graph-tracer: Move graph event insertion helpers in the graph tracer file
The function graph events helpers which insert the function entry and
return events into the ring buffer currently reside in trace.c
But this file is quite overloaded and the right place for these helpers
is in the function graph tracer file.

Then move them to trace_functions_graph.c

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2009-08-06 07:28:06 +02:00
Frederic Weisbecker 82e04af498 tracing: Move sched event insertion helpers in the sched switch tracer file
The sched events helpers which insert the sched switch and wakeup
events into the ring buffer currently reside in trace.c
But this file is quite overloaded and the right place for these helpers
is in the sched switch tracer file.

Then move them to trace_functions.c

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2009-08-06 07:28:06 +02:00
Frederic Weisbecker c0a0d0d3f6 tracing/core: Make the stack entry helpers global
Make the stacktrace event insertion helpers globals.
This has two effects:

- Prepare for moving the sched events insertion helpers to
  the sched switch tracer file.
- Move some ifdef outside function definitions

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2009-08-06 07:28:05 +02:00
Frederic Weisbecker 5e5bf48398 tracing/core: Turn ftrace_cpu_disabled into a global var
In order to prepare the moving of the function graph tracer insertion
helpers from trace.c to trace_functions_graph.c, we need to export the
ftrace_cpu_disabled variable.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2009-08-06 07:28:05 +02:00
Ingo Molnar e16852cfc5 Merge branch 'tracing/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into tracing/urgent 2009-08-04 13:58:28 +02:00
Lai Jiangshan 74e7ff8c50 tracing: Fix missing function_graph events when we splice_read from trace_pipe
About a half events are missing when we splice_read
from trace_pipe. They are unexpectedly consumed because we ignore
the TRACE_TYPE_NO_CONSUME return value used by the function graph
tracer when it needs to consume the events by itself to walk on
the ring buffer.

The same problem appears with ftrace_dump()

Example of an output before this patch:

1)               |      ktime_get_real() {
1)   2.846 us    |          read_hpet();
1)   4.558 us    |        }
1)   6.195 us    |      }

After this patch:

0)               |      ktime_get_real() {
0)               |        getnstimeofday() {
0)   1.960 us    |          read_hpet();
0)   3.597 us    |        }
0)   5.196 us    |      }

The fix also applies on 2.6.30

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@kernel.org
LKML-Reference: <4A6EEC52.90704@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-07-28 23:17:23 +02:00
Steven Rostedt 8650ae32ef tracing: only truncate ftrace files when O_TRUNC is set
The current code will truncate the ftrace files contents if O_APPEND
is not set and the file is opened in write mode. This is incorrect.
It should only truncate the file if O_TRUNC is set. Otherwise
if one of these files is opened by a C program with fopen "r+",
it will incorrectly truncate the file.

Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-07-23 10:07:18 -04:00
Xiao Guangrong ff4e9da233 tracing: cleanup for tracing_trace_options_read()
'\n' is already appended, and what we need is just an extra
space for the '\0'.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
LKML-Reference: <4A3EED63.3090908@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-07-20 12:02:09 -04:00
jolsa@redhat.com 566b0aaf79 tracing: Remove unused fields/variables
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: rostedt@goodmis.org
LKML-Reference: <1247773468-11594-2-git-send-email-jolsa@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 12:21:16 +02:00
Ingo Molnar 45bceffc30 Merge branch 'linus' into tracing/core
Merge reason: tracing/core was on an older, pre-rc1 base.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 12:20:01 +02:00
Alexey Dobriyan 405f55712d headers: smp_lock.h redux
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
  It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

  This will make hardirq.h inclusion cheaper for every PREEMPT=n config
  (which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-12 12:22:34 -07:00
Steven Rostedt 77ae365eca ring-buffer: make lockless
This patch converts the ring buffers into a completely lockless
buffer recording system. The read side still takes locks since
we still serialize readers. But the writers are the ones that
must be lockless (those can happen in NMIs).

The main change is to the "head_page" pointer. We write to the
tail, and read from the head. The "head_page" pointer in the cpu
buffer is now just a reference to where to look. The real head
page is now kept in the head_page->list->prev->next pointer.
That is, in the list head of the previous page we set flags.

The list pages are allocated to be aligned such that the lowest
significant bits are always zero pointing to the list. This gives
us play to put in flags to their pointers.

bit 0: set when the page is a head page
bit 1: set when the writer is moving the page (for overwrite mode)

cmpxchg is used to update the pointer.

When the writer wraps the buffer and the tail meets the head,
in overwrite mode, the writer must move the head page forward.
It first uses cmpxchg to change the pointer flag from 1 to 2.
Once this is done, the reader on another CPU will not take the
page from the buffer.

The writers need to protect against interrupts (we don't bother with
disabling interrupts because NMIs are allowed to write too).

After the writer sets the pointer flag to 2, it takes care to
manage interrupts coming in. This is discribed in detail within the
comments of the code.

 Changes in version 2:
  - Let reader reset entries value of header page.
  - Fix tail page passing commit page on reader page test.
  - Always increment entries and write counter in rb_tail_page_update
  - Add safety check in rb_set_commit_to_write to break out of infinite loop
  - add mask in rb_is_reader_page

[ Impact: lock free writing to the ring buffer ]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-07-07 18:36:12 -04:00
Li Zefan 020e5f85cb tracing/events: Add trace_event boot option
We already have ftrace= boot option, and this adds a similar
boot option for trace events, so allow trace events to be
enabled at boot, for boot debugging purpose.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A4ACE29.3010407@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 15:44:24 +02:00
Li Zefan 9d612beff5 tracing: Fix trace_buf_size boot option
We should be able to specify [KMG] when setting trace_buf_size
boot option, as documented in kernel-parameters.txt

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A41F2DB.4020102@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24 11:41:12 +02:00
Li Zefan f129e965be tracing: Reset iterator in t_start()
The iterator is m->private, but it's not reset to trace_types in
t_start(). If the output is larger than PAGE_SIZE and t_start()
is called the 2nd time, things will go wrong.

Reviewed-by: Liming Wang <liming.wang@windriver.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A418728.5020506@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24 11:02:51 +02:00
Linus Torvalds b0b7065b64 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
  tracing/urgent: warn in case of ftrace_start_up inbalance
  tracing/urgent: fix unbalanced ftrace_start_up
  function-graph: add stack frame test
  function-graph: disable when both x86_32 and optimize for size are configured
  ring-buffer: have benchmark test print to trace buffer
  ring-buffer: do not grab locks in nmi
  ring-buffer: add locks around rb_per_cpu_empty
  ring-buffer: check for less than two in size allocation
  ring-buffer: remove useless compile check for buffer_page size
  ring-buffer: remove useless warn on check
  ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
  tracing: update sample event documentation
  tracing/filters: fix race between filter setting and module unload
  tracing/filters: free filter_string in destroy_preds()
  ring-buffer: use commit counters for commit pointer accounting
  ring-buffer: remove unused variable
  ring-buffer: have benchmark test handle discarded events
  ring-buffer: prevent adding write in discarded area
  tracing/filters: strloc should be unsigned short
  tracing/filters: operand can be negative
  ...

Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
2009-06-20 10:56:46 -07:00
GeunSik Lim 156f5a7801 debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem.
Many developers use "/debug/" or "/debugfs/" or "/sys/kernel/debug/"
directory name to mount debugfs filesystem for ftrace according to
./Documentation/tracers/ftrace.txt file.

And, three directory names(ex:/debug/, /debugfs/, /sys/kernel/debug/) is
existed in kernel source like ftrace, DRM, Wireless, Documentation,
Network[sky2]files to mount debugfs filesystem.

debugfs means debug filesystem for debugging easy to use by greg kroah
hartman. "/sys/kernel/debug/" name is suitable as directory name
of debugfs filesystem.
- debugfs related reference: http://lwn.net/Articles/334546/

Fix inconsistency of directory name to mount debugfs filesystem.

* From Steven Rostedt
  - find_debugfs() and tracing_files() in this patch.

Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com>
Acked-by     : Inaky Perez-Gonzalez <inaky@linux.intel.com>
Reviewed-by  : Steven Rostedt <rostedt@goodmis.org>
Reviewed-by  : James Smart <james.smart@emulex.com>
CC: Jiri Kosina <trivial@kernel.org>
CC: David Airlie <airlied@linux.ie>
CC: Peter Osterlund <petero2@telia.com>
CC: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
CC: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-15 21:30:28 -07:00
Li Zefan e4f2d10f47 tracing: replace a GFP_ATOMIC with GFP_KERNEL allocation
Atomic allocation is not needed here.

[ Impact: clean up of memory alloction type ]

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A35B898.2050607@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15 11:37:14 -04:00
Li Zefan 215368e8e5 tracing: fix a typo in tracing_cpumask_write()
It's tracing_cpumask_new that should be kfree()ed.

This causes tracing_cpumask to be freed due to the typo:

 # echo z > tracing_cpumask
 bash: echo: write error: Invalid argument

And subsequent reads/writes to tracing_cpuamsk will access this
already-freed tracing_cpumask, thus may lead to crash.

[ Impact: fix leak and crash when writing invalid val to tracing_cpumask ]

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A35B86A.7070608@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15 11:37:12 -04:00
Linus Torvalds 8623661180 Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
  Revert "x86, bts: reenable ptrace branch trace support"
  tracing: do not translate event helper macros in print format
  ftrace/documentation: fix typo in function grapher name
  tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
  tracing: add protection around module events unload
  tracing: add trace_seq_vprint interface
  tracing: fix the block trace points print size
  tracing/events: convert block trace points to TRACE_EVENT()
  ring-buffer: fix ret in rb_add_time_stamp
  ring-buffer: pass in lockdep class key for reader_lock
  tracing: add annotation to what type of stack trace is recorded
  tracing: fix multiple use of __print_flags and __print_symbolic
  tracing/events: fix output format of user stack
  tracing/events: fix output format of kernel stack
  tracing/trace_stack: fix the number of entries in the header
  ring-buffer: discard timestamps that are at the start of the buffer
  ring-buffer: try to discard unneeded timestamps
  ring-buffer: fix bug in ring_buffer_discard_commit
  ftrace: do not profile functions when disabled
  tracing: make trace pipe recognize latency format flag
  ...
2009-06-10 19:53:40 -07:00
Steven Rostedt 112f38a7e3 tracing: make trace pipe recognize latency format flag
The trace_pipe did not recognize the latency format flag and would produce
different output than the trace file. The problem was partly due that
the trace flags in the iterator was not set as well as the trace_pipe
zeros out part of the iterator (including the flags) to be able to use
the same routines as the trace file. trace_flags of the iterator should
not cause any problems when not zeroed out by for trace_pipe.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-01 23:26:02 -04:00
Heiko Carstens 5b6045a906 trace: disable preemption before taking raw spinlocks
s390 code uses smp_processor_id() in __raw_spin_lock() code which
reveals that a (raw) spinlock is taken without preemption disabled.
This can potentially deadlock.

To fix this explicitly disable and enable preemption.

BUG: using smp_processor_id() in preemptible [00000000] code: cat/2278
caller is trace_find_cmdline+0x40/0xfc
CPU: 0 Not tainted 2.6.30-rc7-dirty #39
Process cat (pid: 2278, task: 000000003faedb68, ksp: 000000003b33b988)
000000003b33b988 000000003b33bae0 0000000000000002 0000000000000000
       000000003b33bb80 000000003b33baf8 000000003b33baf8 00000000000175d6
       0000000000000001 000000003b33b988 000000003f9b0000 000000000000000b
       000000000000000c 000000003b33bb40 000000003b33bae0 0000000000000000
       0000000000000000 00000000000175d6 000000003b33bae0 000000003b33bb28
Call Trace:
([<00000000000174b2>] show_trace+0x112/0x170)
 [<0000000000017582>] show_stack+0x72/0x100
 [<0000000000441538>] dump_stack+0xc8/0xd8
 [<000000000025c350>] debug_smp_processor_id+0x114/0x130
 [<00000000000bf0e4>] trace_find_cmdline+0x40/0xfc
 [<00000000000c35d4>] trace_print_context+0x58/0xac
 [<00000000000bb676>] print_trace_line+0x416/0x470
 [<00000000000bc8fe>] s_show+0x4e/0x428
 [<000000000013834e>] seq_read+0x36a/0x5d4
 [<0000000000112a78>] vfs_read+0xc8/0x174
 [<0000000000112c58>] SyS_read+0x74/0xc4
 [<000000000002c7ae>] sysc_noemu+0x10/0x16
 [<000002000012436c>] 0x2000012436c
1 lock held by cat/2278:
 #0:  (&p->lock){+.+.+.}, at: [<0000000000138056>] seq_read+0x72/0x5d4

[ Impact: fix preempt-unsafe raw spinlock ]

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-28 01:21:03 +02:00
Lai Jiangshan 4f5359685a tracing: add trace_event_read_lock()
I found that there is nothing to protect event_hash in
ftrace_find_event(). Rcu protects the event hashlist
but not the event itself while we use it after its extraction
through ftrace_find_event().

This lack of a proper locking in this spot opens a race
window between any event dereferencing and module removal.

Eg:

--Task A--

print_trace_line(trace) {
  event = find_ftrace_event(trace)

--Task B--

trace_module_remove_events(mod) {
  list_trace_events_module(ev, mod) {
    unregister_ftrace_event(ev->event) {
      hlist_del(ev->event->node)
        list_del(....)
    }
  }
}
|--> module removed, the event has been dropped

--Task A--

  event->print(trace); // Dereferencing freed memory

If the event retrieved belongs to a module and this module
is concurrently removed, we may end up dereferencing a data
from a freed module.

RCU could solve this, but it would add latency to the kernel and
forbid tracers output callbacks to call any sleepable code.
So this fix converts 'trace_event_mutex' to a read/write semaphore,
and adds trace_event_read_lock() to protect ftrace_find_event().

[ Impact: fix possible freed memory dereference in ftrace ]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4A114806.7090302@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-25 23:53:41 +02:00
GeunSik Lim 88fc86c283 tracing: Append prompt in /debug/tracing/README file
append prompt in /debug/tracing/README file.

This is trivial issue. Fix typo Mini Howto file(README) for ftrace.

[ Impact: cleanup ]

Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: williams <williams@redhat.com>
LKML-Reference: <1242289418.31161.45.camel@centos51>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 19:43:22 +02:00
Steven Rostedt 9456f0fa6d tracing: reset ring buffer when removing modules with events
Li Zefan found that there's a race using the event ids of events and
modules. When a module is loaded, an event id is incremented. We only
have 16 bits for event ids (65536) and there is a possible (but highly
unlikely) race that we could load and unload a module that registers
events so many times that the event id counter overflows.

When it overflows, it then restarts and goes looking for available
ids. An id is available if it was added by a module and released.

The race is if you have one module add an id, and then is removed.
Another module loaded can use that same event id. But if the old module
still had events in the ring buffer, the new module's call back would
get bogus data.  At best (and most likely) the output would just be
garbage. But if the module for some reason used pointers (not recommended)
then this could potentially crash.

The safest thing to do is just reset the ring buffer if a module that
registered events is removed.

[ Impact: prevent unpredictable results of event id overflows ]

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <49FEAFD0.30106@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-06 23:11:41 -04:00
Steven Rostedt 94487d6d53 tracing: use proper export symbol for tracing api
When adding the EXPORT_SYMBOL to some of the tracing API, I accidently
used EXPORT_SYMBOL instead of EXPORT_SYMBOL_GPL. This patch fixes
that mistake.

[ Impact: export the tracing code only for GPL modules ]

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-05 19:22:53 -04:00
Steven Rostedt c8d771835e tracing: export stats of ring buffers to userspace
This patch adds stats to the ftrace ring buffers:

 # cat /debugfs/tracing/per_cpu/cpu0/stats
 entries: 42360
 overrun: 30509326
 commit overrun: 0
 nmi dropped: 0

Where entries are the total number of data entries in the buffer.

overrun is the number of entries not consumed and were overwritten by
the writer.

commit overrun is the number of entries dropped due to nested writers
wrapping the buffer before the initial writer finished the commit.

nmi dropped is the number of entries dropped due to the ring buffer
lock being held when an nmi was going to write to the ring buffer.
Note, this field will be meaningless and will go away when the ring
buffer becomes lockless.

[ Impact: let userspace know what is happening in the ring buffers ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-05 13:52:02 -04:00
Steven Rostedt 7267fa6819 tracing: fix ref count in splice pages
The pages allocated for the splice binary buffer did not initialize
the ref count correctly. This caused pages not to be freed and causes
a drastic memory leak.

Thanks to logdev I was able to trace the tracer to find where the leak
was.

[ Impact: stop memory leak when using splice ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29 08:02:44 +02:00
Steven Rostedt f2957f1f19 tracing: have splice only copy full pages
Splice works with pages, it is much more effecient to use an entire
page than to copy bits over several pages.

Using logdev to trace the internals of the splice mechanism, I was
able to see that splice can be very aggressive. When tracing is
occurring, and the reader caught up to the writer, and the writer
is on the reader page, the reader will copy what is there into the
splice page. Splice may iterate over several pages and if the
writer is still writing to the page, the reader will keep copying
bits to new pages to pass to userspace.

This patch changes it to only pass data to userspace if the page
is full (the writer has left the page). This has a small side effect
that splice can not read a partial page, and must wait for the
page to fill. This should not be an issue. If tracing has stopped,
then a use of "read" will still read all of the page.

[ Impact: better performance for ring buffer splice code ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-29 00:26:30 -04:00
Steven Rostedt 93459c6cb9 tracing: only add splice page if entries exist
The splice code allocates a page even when the ring buffer is empty.
It detects the ring buffer being empty when it it fails to copy
anything from the ring buffer into the page.

This patch adds a check to see if there is anything in the ring buffer
before allocating a page.

Thanks to logdev for letting me trace the tracer to find this.

[ Impact: speed up due to removing unnecessary allocation ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-29 00:23:13 -04:00
Steven Rostedt 5beae6efd1 tracing: fix ref count in splice pages
The pages allocated for the splice binary buffer did not initialize
the ref count correctly. This caused pages not to be freed and causes
a drastic memory leak.

Thanks to logdev I was able to trace the tracer to find where the leak
was.

[ Impact: stop memory leak when using splice ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-29 00:16:21 -04:00
Steven Rostedt cd891ae030 tracing: convert ftrace_dump spinlocks to raw
ftrace_dump is used for printing out the contents of the ftrace ring buffer
to the console on failure. Currently it uses a spinlock to synchronize
the output from multiple failures on different CPUs. This spin lock
currently is a normal spinlock and can cause issues with lockdep and
lock tracing.

This patch converts it to raw since it is for error handling only.
The lock is local to the ftrace_dump and is not used by any other
infrastructure.

[ Impact: prevent ftrace_dump from locking up by internal tracing ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-28 11:39:34 -04:00
Li Zefan 7a4f453b6d tracing/events: make struct trace_entry->type to be int type
struct trace_entry->type is unsigned char, while trace event's id is
int type, thus for a event with id >= 256, it's entry->type is cast
to (id % 256), and then we can't see the trace output of this event.

 # insmod trace-events-sample.ko
 # echo foo_bar > /mnt/tracing/set_event
 # cat /debug/tracing/events/trace-events-sample/foo_bar/id
 256
 # cat /mnt/tracing/trace_pipe
           <...>-3548  [001]   215.091142: Unknown type 0
           <...>-3548  [001]   216.089207: Unknown type 0
           <...>-3548  [001]   217.087271: Unknown type 0
           <...>-3548  [001]   218.085332: Unknown type 0

[ Impact: fix output for trace events with id >= 256 ]

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <49EEDB0E.5070207@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-22 11:36:38 +02:00
Steven Rostedt 3189cdb316 tracing: protect trace_printk from recursion
trace_printk can be called from any context, including NMIs.
If this happens, then we must test for for recursion before
grabbing any spinlocks.

This patch prevents trace_printk from being called recursively.

[ Impact: prevent hard lockup in lockdep event tracer ]

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-17 16:21:35 -04:00
Steven Rostedt 12acd473d4 tracing: add EXPORT_SYMBOL_GPL for trace commits
Not all the necessary symbols were exported to allow for tracing
by modules. This patch adds them in.

[ Impact: allow modules to commit data to the ring buffer ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-17 16:21:28 -04:00
Li Zefan 339ae5d3c3 tracing: fix file mode of trace and README
trace is read-write and README is read-only.

[ Impact: fix /debug/tracing/ file permissions. ]

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <49E7EAB6.4070605@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-17 18:04:36 +02:00
Avadh Patel 69abe6a5d1 tracing: add saved_cmdlines file to show cached task comms
Export the cached task comms to userspace. This allows user apps to translate
the pids from a trace into their respective task command lines.

[ Impact: let userspace apps reading binary buffer know comm's of pids ]

Signed-off-by: Avadh Patel <avadh4all@gmail.com>
[ added error checking and use of buf pointer to index file_buf ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-17 17:04:12 +02:00
Steven Rostedt 17c873ec28 tracing/events: add export symbols for trace events in modules
Impact: let modules add trace events

The trace event code requires some functions to be exported to allow
modules to use TRACE_EVENT. This patch adds EXPORT_SYMBOL_GPL to the
necessary functions.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-14 12:58:01 -04:00
Tom Zanussi eb02ce017d tracing/filters: use ring_buffer_discard_commit() in filter_check_discard()
This patch changes filter_check_discard() to make use of the new
ring_buffer_discard_commit() function and modifies the current users to
call the old commit function in the non-discard case.

It also introduces a version of filter_check_discard() that uses the
global trace buffer (filter_current_check_discard()) for those cases.

v2 changes:

- fix compile error noticed by Ingo Molnar

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1239178554.10295.36.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:00:56 +02:00
Steven Rostedt 77d9f465d4 tracing/filters: use ring_buffer_discard_commit for discarded events
The ring_buffer_discard_commit makes better usage of the ring_buffer
when an event has been discarded. It tries to remove it completely if
possible.

This patch converts the trace event filtering to use
ring_buffer_discard_commit instead of the ring_buffer_event_discard.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:00:54 +02:00
Tom Zanussi e45f2e2bd2 tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro
Frederic Weisbecker suggested that the trace_special event shouldn't be
filterable; this patch adds a TRACE_EVENT_FORMAT_NOFILTER event macro
that allows an event format to be exported without having a filter
attached, and removes filtering from the trace_special event.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:00:51 +02:00
Tom Zanussi e1112b4d96 tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events
This patch adds run-time field descriptions to all the event formats
exported using TRACE_EVENT_FORMAT.  It also hooks up all the tracers
that use them (i.e. the tracers in the 'ftrace subsystem') so they can
also have their output filtered by the event-filtering mechanism.

When I was testing this, there were a couple of things that fooled me
into thinking the filters weren't working, when actually they were -
I'll mention them here so others don't make the same mistakes (and file
bug reports. ;-)

One is that some of the tracers trace multiple events e.g. the
sched_switch tracer uses the context_switch and wakeup events, and if
you don't set filters on all of the traced events, the unfiltered output
from the events without filters on them can make it look like the
filtering as a whole isn't working properly, when actually it is doing
what it was asked to do - it just wasn't asked to do the right thing.

The other is that for the really high-volume tracers e.g. the function
tracer, the volume of filtered events can be so high that it pushes the
unfiltered events out of the ring buffer before they can be read so e.g.
cat'ing the trace file repeatedly shows either no output, or once in
awhile some output but that isn't there the next time you read the
trace, which isn't what you normally expect when reading the trace file.
If you read from the trace_pipe file though, you can catch them before
they disappear.

Changes from v1:

As suggested by Frederic Weisbecker:

- get rid of externs in functions
- added unlikely() to filter_check_discard()

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14 00:00:50 +02:00
Ingo Molnar 1cad1252ed Merge branch 'tracing/urgent' into tracing/core
Merge reason: pick up both v2.6.30-rc1 [which includes tracing/urgent fixes]
              and pick up the current lineup of tracing/urgent fixes as well

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 12:46:51 +02:00
Lai Jiangshan 93cfb3c9fd tracing: fix splice return too large
I got these from strace:

 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 16384
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192

I wanted to splice_read 4096 bytes, but it returns 8192 or larger.

It is because the return value of tracing_buffers_splice_read()
does not include "zero out any left over data" bytes.

But tracing_buffers_read() includes these bytes, we make them
consistent.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46674.9030804@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 12:44:46 +02:00
Lai Jiangshan c7625a555f tracing: update file->f_pos when splice(2) it
Impact: Cleanup

These two lines:

	if (unlikely(*ppos))
		return -ESPIPE;

in tracing_buffers_splice_read() are not needed, VFS layer
has disabled seek(2).

We remove these two lines, and then we can update file->f_pos.

And tracing_buffers_read() updates file->f_pos, this fix
make tracing_buffers_splice_read() updates file->f_pos too.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46670.4010503@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 12:44:44 +02:00
Lai Jiangshan ddd538f3e6 tracing: allocate page when needed
Impact: Cleanup

Sometimes, we open trace_pipe_raw, but we don't read(2) it,
we just splice(2) it, thus, the page is not used.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D4666B.4010608@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 12:44:43 +02:00
Lai Jiangshan d1e7e02f30 tracing: disable seeking for trace_pipe_raw
Impact: disable pread()

We set tracing_buffers_fops.llseek to no_llseek,
but we can still perform pread() to read this file.

That is not expected.

This fix uses nonseekable_open() to disable it.

tracing_buffers_fops.llseek is still set to no_llseek,
it mark this file is a "non-seekable device" and is used by
sys_splice(). See also do_splice() or manual of splice(2):

ERRORS
       EINVAL Target file system doesn't support  splicing;
              neither  of the descriptors refers to a pipe;
              or offset given for non-seekable device.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46668.8030806@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 12:44:42 +02:00
Frederic Weisbecker 5452af664f tracing/ftrace: factorize the tracing files creation
Impact: cleanup

Most of the tracing files creation follow the same pattern:

ret = debugfs_create_file(...)
if (!ret)
	pr_warning("Couldn't create ... entry\n")

Unify it!

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238109938-11840-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-07 14:43:07 +02:00
Ingo Molnar 86665c75da Merge branch 'tracing/urgent' into tracing/ftrace 2009-04-07 14:41:17 +02:00
Nikanth Karthikesan bc2b6871c1 Update /debug/tracing/README
Some of the tracers have been renamed, which was not updated in the in-kernel
run-time README file. Update it.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
LKML-Reference: <200903231158.32151.knikanth@suse.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 14:02:36 +02:00
Frederic Weisbecker b0dfa978c7 tracing/ftrace: alloc the started cpumask for the trace file
Impact: fix a crash while cat trace file

Currently we are using a cpumask to remind each cpu where a
trace occured. It lets us notice the user that a cpu just had
its first trace.

But on latest -tip we have the following crash once we cat the trace
file:

IP: [<c0270c4a>] print_trace_fmt+0x45/0xe7
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/class/net/eth0/carrier
Pid: 3897, comm: cat Not tainted (2.6.29-tip-02825-g0f22972-dirty #81)
EIP: 0060:[<c0270c4a>] EFLAGS: 00010297 CPU: 0
EIP is at print_trace_fmt+0x45/0xe7
EAX: 00000000 EBX: 00000000 ECX: c12d9e98 EDX: ccdb7010
ESI: d31f4000 EDI: 00322401 EBP: d31f3f10 ESP: d31f3efc
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process cat (pid: 3897, ti=d31f2000 task=d3b3cf20 task.ti=d31f2000)
Stack:
d31f4080 ccdb7010 d31f4000 d691fe70 ccdb7010 d31f3f24 c0270e5c d31f4000
d691fe70 d31f4000 d31f3f34 c02718e8 c12d9e98 d691fe70 d31f3f70 c02bfc33
00001000 09130000 d3b46e00 d691fe98 00000000 00000079 00000001 00000000
Call Trace:
[<c0270e5c>] ? print_trace_line+0x170/0x17c
[<c02718e8>] ? s_show+0xa7/0xbd
[<c02bfc33>] ? seq_read+0x24a/0x327
[<c02bf9e9>] ? seq_read+0x0/0x327
[<c02ab18b>] ? vfs_read+0x86/0xe1
[<c02ab289>] ? sys_read+0x40/0x65
[<c0202d8f>] ? sysenter_do_call+0x12/0x3c
Code: 00 00 00 89 45 ec f7 c7 00 20 00 00 89 55 f0 74 4e f6 86 98 10 00 00 02 74 45 8b 86 8c 10 00 00 8b 9e a8 10 00 00 e8 52 f3 ff ff <0f> a3 03 19 c0 85 c0 75 2b 8b 86 8c 10 00 00 8b 9e a8 10 00 00
EIP: [<c0270c4a>] print_trace_fmt+0x45/0xe7 SS:ESP 0068:d31f3efc
CR2: 0000000000000000
---[ end trace aa9cf38e5ebed9dd ]---

This is because we alloc the iter->started cpumask on tracing_pipe_open but
not on tracing_open.

It hadn't been noticed until now because we need to have ring buffer overruns
to activate the starting of cpu buffer detection.

Also, we need a check to not print the messagge for the first trace on the file.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238619188-6109-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 14:02:03 +02:00
Frederic Weisbecker 5f0c6c03c5 tracing/ftrace: fix missing include string.h
Building a kernel with tracing can raise the following warning on
tip/master:

kernel/trace/trace.c:1249: error: implicit declaration of function 'vbin_printf'

We are missing an include to string.h

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238160130-7437-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 14:00:18 +02:00
Lai Jiangshan cf8e347465 tracing: fix incorrect return type of ns2usecs()
Impact: fix time output bug in 32bits system

ns2usecs() returns 'long', it's incorrect.

(In i386)
...
          <idle>-0     [000]   521.442100: _spin_lock <-tick_do_update_jiffies64
          <idle>-0     [000]   521.442101: do_timer <-tick_do_update_jiffies64
          <idle>-0     [000]   521.442102: update_wall_time <-do_timer
          <idle>-0     [000]   521.442102: update_xtime_cache <-update_wall_time
....
(It always print the time less than 2200 seconds besides ...)
Because 'long' is 32bits in i386. ( (1<<31) useconds is about 2200 seconds)

...
          <idle>-0     [001] 4154502640.134759: rcu_bh_qsctr_inc <-__do_softirq
          <idle>-0     [001] 4154502640.134760: _local_bh_enable <-__do_softirq
          <idle>-0     [001] 4154502640.134761: idle_cpu <-irq_exit
...
(very large value)
Because 'long' is a signed type and it is 32bits in i386.

Changes in v2:
return 'unsigned long long' instead of 'cycle_t'

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <49D05D10.4030009@cn.fujitsu.com>
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 13:59:23 +02:00
Ingo Molnar 93776a8ec7 Merge branch 'linus' into tracing/core
Merge reason: update to upstream tracing facilities

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 13:47:45 +02:00
Ingo Molnar 8b54e45b00 Merge branches 'tracing/docs', 'tracing/filters', 'tracing/ftrace', 'tracing/kprobes', 'tracing/blktrace-v2' and 'tracing/textedit' into tracing/core-v2 2009-03-31 17:46:40 +02:00
Steven Rostedt a2a16d6a31 function-graph: add option to calculate graph time or not
graph time is the time that a function is executing another function.
Thus if function A calls B, if graph-time is set, then the time for
A includes B. This is the default behavior. But if graph-time is off,
then the time spent executing B is subtracted from A.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-24 23:41:11 -04:00
Steven Rostedt 0706f1c48c tracing: adding function timings to function profiler
If the function graph trace is enabled, the function profiler will
use it to take the timing of the functions.

 cat /debug/tracing/trace_stat/functions

  Function                               Hit    Time
  --------                               ---    ----
  mwait_idle                             127    183028.4 us
  schedule                                26    151997.7 us
  __schedule                              31    151975.1 us
  sys_wait4                                2    74080.53 us
  do_wait                                  2    74077.80 us
  sys_newlstat                           138    39929.16 us
  do_path_lookup                         179    39845.79 us
  vfs_lstat_fd                           138    39761.97 us
  user_path_at                           153    39469.58 us
  path_walk                              179    39435.76 us
  __link_path_walk                       189    39143.73 us
[...]

Note the times are skewed due to the function graph tracer not taking
into account schedules.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-24 23:41:09 -04:00
Steven Rostedt be6f164a02 function-graph: add option for include sleep times
Impact: give user a choice to show times spent while sleeping

The user may want to see the time a function spent sleeping.
This patch adds the trace option "sleep-time" to allow that.
The "sleep-time" option is default on.

 echo sleep-time > /debug/tracing/trace_options

produces:

 ------------------------------------------
 2)  avahi-d-3428  =>    <idle>-0
 ------------------------------------------

 2)               |      finish_task_switch() {
 2)   0.621 us    |        _spin_unlock_irq();
 2)   2.202 us    |      }
 2) ! 1002.197 us |    }
 2) ! 1003.521 us |  }

where as,

 echo nosleep-time > /debug/tracing/trace_options

produces:

 0)    <idle>-0    =>  yum-upd-3416
 ------------------------------------------

 0)               |              finish_task_switch() {
 0)   0.643 us    |                _spin_unlock_irq();
 0)   2.342 us    |              }
 0) + 41.302 us   |            }
 0) + 42.453 us   |          }

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-24 11:06:24 -04:00
Frederic Weisbecker 1618536961 tracing/function-graph-tracer: fix functions call traces imbalance
Impact: fix traces output

Sometimes one can observe an imbalance in the traces between function
calls and function return traces:

func1() {
    }
}

The curly brace inside func1() is the return of another function nested
inside func1. The return trace have been inserted in the buffer but not
the entry.
We are storing a return address on the function traces stack while we
haven't inserted its entry on the buffer, hence the imbalance on the
traces.

This is because the tracers doesn't check all failures that can happen
on buffer insertion.

This patch reports the tracing recursion failures and the ring buffer
failures. In such cases, we now restore the original return address for
the function, giving up its return trace.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237843021-11695-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 23:25:32 +01:00
Frederic Weisbecker 3e1f60b80c tracing/ftrace: check if debugfs is registered before creating files
Impact: fix a crash with ftrace={nop,boot} parameter

If the nop or initcall tracers are launched as boot tracers,
they will attempt to create their option directory and files.
But these tracers are registered very early and then assigned
as "boot tracers" very early if asked to.

Since they do this before debugfs has been registered (core initcall),
a crash is triggered.

Another early tracers could also come later. So we fix it by
checking if debugfs is initialized before creating the root
tracing directory.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237759847-21025-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 16:25:47 +01:00
Frederic Weisbecker 07edf71213 tracing/events: don't use wake up for events
Impact: fix hard-lockup with sched switch events

Some ftrace events, such as sched wakeup, can be traced
while the runqueue lock is hold. Since they are using
trace_current_buffer_unlock_commit(), they call wake_up()
which can try to grab the runqueue lock too, resulting in
a deadlock.

Now for all event, we call a new helper:
trace_nowake_buffer_unlock_commit() which do pretty the same than
trace_current_buffer_unlock_commit() except than it doesn't call
trace_wake_up().

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237759847-21025-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 09:22:14 +01:00
Dmitri Vorobiev b8b9426533 tracing: fix four sparse warnings
Impact: cleanup.

This patch fixes the following sparse warnings:

 kernel/trace/trace.c:385:9: warning: symbol 'trace_seq_to_buffer' was
 not declared. Should it be static?

 kernel/trace/trace_clock.c:29:13: warning: symbol 'trace_clock_local'
 was not declared. Should it be static?

 kernel/trace/trace_clock.c:54:13: warning: symbol 'trace_clock' was not
 declared. Should it be static?

 kernel/trace/trace_clock.c:74:13: warning: symbol 'trace_clock_global'
 was not declared. Should it be static?

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
LKML-Reference: <1237741871-5827-4-git-send-email-dmitri.vorobiev@movial.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-22 18:16:54 +01:00
Frederic Weisbecker cf586b61f8 tracing/function-graph-tracer: prevent hangs during self-tests
Impact: detect tracing related hangs

Sometimes, with some configs, the function graph tracer can make
the timer interrupt too much slow, hanging the kernel in an endless
loop of timer interrupts servicing.

As suggested by Ingo, this patch brings a watchdog which stops the
selftest after a defined number of functions traced, definitely
disabling this tracer.

For those who want to debug the cause of the function graph trace
hang, you can pass the ftrace_dump_on_oops kernel parameter to dump
the traces after this hang detection.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237694675-23509-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-22 14:06:40 +01:00
Ingo Molnar 44fc6ee923 Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace 2009-03-20 10:15:13 +01:00
Ingo Molnar 22de89b371 Merge branches 'tracing/ftrace', 'tracing/kprobes', 'tracing/tasks' and 'linus' into tracing/core 2009-03-20 10:14:53 +01:00
Steven Rostedt 40ce74f19c tracing: remove recording function depth from trace_printk
The function depth in trace_printk was to facilitate the function
graph output. Now that the function graph calculates the depth within
the trace output, we no longer need to record the depth when the
trace_printk is called.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-19 15:58:47 -04:00
Steven Rostedt 5ef841f6f3 tracing: make print_(b)printk_msg_only global
This patch makes print_printk_msg_only and print_bprintk_msg_only
global for other functions to use. It also renames them by adding
a "trace_" to the beginning to avoid namespace collisions.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-19 15:57:55 -04:00
Steven Rostedt 4acd4d00f7 tracing: give easy way to clear trace buffer
There is currently no easy way to clear the trace buffer. Currently
the only way is to change the current tracer.

This patch lets the user clear the trace buffer by simply writing
into the trace files.

 echo > /debug/tracing/trace

or to clear a single cpu (i.e. for CPU 1):

 echo > /debug/tracing/per_cpu/cpu1/trace

Requested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-18 10:52:47 -04:00
Carsten Emde a635cf0497 tracing: fix command line to pid reverse map
Impact: fix command line to pid mapping

map_cmdline_to_pid[] is checked in trace_save_cmdline(), but never
updated. This results in stale pid to command line mappings and the
tracer output will associate the wrong comm string.

Signed-off-by: Carsten Emde <Carsten.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 10:10:18 +01:00
Thomas Gleixner 50d88758a3 tracing: fix trace_find_cmdline()
Impact: prevent stale command line output

In case there is no valid command line mapping for a pid
trace_find_cmdline() returns without updating the comm buffer. The
trace dump keeps the previous entry which results in confusing trace
output:

     <idle>-0     [000]   280.702056 ....
     <idle>-23456 [000]   280.702080 ....

Update the comm buffer with "<...>" when no mapping is found.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 10:10:17 +01:00
Thomas Gleixner 2c7eea4c62 tracing: replace the crude (unsigned) -1 hackery
Impact: cleanup

The command line recorder uses (unsigned) -1 to mark non mapped
entries in the pid to command line maps. The validity check is
completely unintuitive: idx >= SAVED_CMDLINES

There is no need for such casting games. Use a constant to mark
unmapped entries and check for that constant to make the code readable
and understandable.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 10:10:17 +01:00
Thomas Gleixner 18aecd362a tracing: stop command line recording when tracing is disabled
Impact: prevent overwrite of command line entries

When the tracer is stopped the command line recording continues to
record. The check for tracing_is_on() is not sufficient here as the
ringbuffer status is not affected by setting
debug/tracing/tracing_enabled to 0. On a non idle system this can
result in the loss of the command line information for the stopped
trace, which makes the trace harder to read and analyse.

Check tracer_enabled to allow further recording.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 10:10:16 +01:00
Ingo Molnar 327019b01e Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace 2009-03-18 06:59:56 +01:00
Steven Rostedt af4617bdba tracing: add global-clock option to provide cross CPU clock to traces
Impact: feature to allow better serialized clock

This patch adds an option called "global-clock" that will allow
the tracer to switch to a slower but more accurate (across CPUs)
clock.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-17 23:10:35 -04:00
Steven Rostedt 37886f6a9f ring-buffer: add api to allow a tracer to change clock source
This patch adds a new function called ring_buffer_set_clock that
allows a tracer to assign its own clock source to the buffer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-17 23:06:31 -04:00
Ingo Molnar 4176935b58 Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace 2009-03-17 10:37:37 +01:00
Steven Rostedt 6adaad14d7 tracing: stop comm recording on tracing off
Impact: fix for losing comms in trace

The command lines of tasks are cached at sched switch to not need
to record them at every trace point.  Disabling the tracing on stops
the recording of traces, but does not stop the caching of command lines.
When the tracing is off the cache may overflow and cause the tracing
to show incorrect tasks matching the PIDs.

This patch disables prevents updates to the comm cache when the ring buffer
is off.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-16 23:31:15 -04:00
Steven Rostedt 4ca5308523 tracing: protect reader of cmdline output
Impact: fix to one cause of incorrect comm outputs in trace

The spinlock only protected the creation of a comm <=> pid pair.
But it was possible that a reader could look up a pid, and get the
wrong comm because it had no locking.

This also required changing trace_find_cmdline to copy the comm cache
and not just send back a pointer to it.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-16 23:27:06 -04:00
Frederic Weisbecker 2fc1dfbe17 tracing/core: fix early free of cpumasks
Impact: fix crashes when tracing cpumasks

While ring-buffer allocation, the cpumasks are allocated too,
including the tracing cpumask and the per-cpu file mask handler.
But these cpumasks are freed accidentally just after.
Fix it.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237164303-11476-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-16 09:13:18 +01:00
Frederic Weisbecker 59f586db98 tracing/core: fix missing mutex unlock on tracing_set_tracer()
Impact: fix possible locking imbalance

In case of ring buffer resize failure, tracing_set_tracer forgot to
release trace_types_lock. Fix it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237151439-6755-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-16 09:13:17 +01:00
Steven Rostedt 7f96f93f02 tracing: move binary buffers into per cpu directory
The binary_buffers directory in /debugfs/tracing held the files
to read the trace buffers in a binary format. This held one file
per CPU buffer. But we also have a per_cpu directory that holds
a way to read the pretty-print formats.

This patch moves the binary buffers into the per_cpu_directory:

 # ls /debug/tracing/per_cpu/cpu1/
trace  trace_pipe  trace_pipe_raw

The new name is called "trace_pipe_raw". The binary buffers always
acted similar to trace_pipe, except that they produce raw data.

Requested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-13 00:37:42 -04:00
Frederic Weisbecker 48ead02030 tracing/core: bring back raw trace_printk for dynamic formats strings
Impact: fix callsites with dynamic format strings

Since its new binary implementation, trace_printk() internally uses static
containers for the format strings on each callsites. But the value is
assigned once at build time, which means that it can't take dynamic
formats.

So this patch unearthes the raw trace_printk implementation for the callers
that will need trace_printk to be able to carry these dynamic format
strings. The trace_printk() macro will use the appropriate implementation
for each callsite. Most of the time however, the binary implementation will
still be used.

The other impact of this patch is that mmiotrace_printk() will use the old
implementation because it calls the low level trace_vprintk and we can't
guess here whether the format passed in it is dynamic or not.

Some parts of this patch have been written by Steven Rostedt (most notably
the part that chooses the appropriate implementation for each callsites).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-12 21:15:00 -04:00
Steven Rostedt db526ca329 tracing: show that buffer size is not expanded
Impact: do not confuse user on small trace buffer sizes

When the system boots up, the trace buffer is small to conserve memory.
It is only two pages per online CPU. When the tracer is used, it expands
to the default value.

This can confuse the user if they look at the buffer size and see only
7, but then later they see 1408.

 # cat /debug/tracing/buffer_size_kb
7

 # echo sched_switch > /debug/tracing/current_tracer

 # cat /debug/tracing/buffer_size_kb
1408

This patch tries to help remove this confustion by showing that the
buffer has not been expanded.

 # cat /debug/tracing/buffer_size_kb
7 (expanded: 1408)

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-12 21:14:59 -04:00
Steven Rostedt 1027fcb206 tracing: protect ring_buffer_expanded with trace_types_lock
Impact: prevent races with ring_buffer_expanded

This patch places the expanding of the tracing buffer under the
protection of the trace_types_lock mutex. It is highly unlikely
that there would be any contention, but better safe than sorry.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-12 21:14:58 -04:00
Steven Rostedt a123c52b46 tracing: fix comments about trace buffer resizing
Impact: cleanup

Some of the comments about the trace buffer resizing is gobbledygook.
And I wonder why people question if I'm a native English speaker.

This patch makes the comments make a bit more sense.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-12 21:14:58 -04:00
Steven Rostedt 51b643b404 Merge branch 'tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into trace/tip/tracing/ftrace-merge 2009-03-12 21:12:46 -04:00
Steven Rostedt 554f786e28 ring-buffer: only allocate buffers for online cpus
Impact: save on memory

Currently, a ring buffer was allocated for each "possible_cpus". On
some systems, this is the same as NR_CPUS. Thus, if a system defined
NR_CPUS = 64 but it only had 1 CPU, we could have possibly 63 useless
ring buffers taking up space. With a default buffer of 3 megs, this
could be quite drastic.

This patch changes the ring buffer code to only allocate ring buffers
for online CPUs.  If a CPU goes off line, we do not free the buffer.
This is because the user may still have trace data in that buffer
that they would like to look at.

Perhaps in the future we could add code to delete a ring buffer if
the CPU is offline and the ring buffer becomes empty.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-11 22:15:27 -04:00
Steven Rostedt 9aba60fe6e tracing: fix trace_wait to know to wait on all cpus or just one
Impact: fix to task live locking on reading trace_pipe on one CPU

The same code is used for both trace_pipe (all CPUS) and the per_cpu
trace_pipe file. When there is no data to read, it will check for
signals and wait on the trace wait queue.

The problem happens with the per_cpu wait. The trace_wait code checks
all CPUs. Thus, if there's data in another CPU buffer, then it will
exit the wait, without checking for signals or waiting on the wait queue.

It would then try to read the empty buffer, and since that will just
return nothing, then it will try to wait again. Unfortunately, that will
again fail due to there still being data in the other buffers. This
ends up with a live lock for the task.

This patch fixes the trace_wait to be aware that the iterator may only
be waiting on a single buffer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-11 22:15:25 -04:00
Steven Rostedt 1852fcce18 tracing: expand the ring buffers when an event is activated
To save memory, the tracer ring buffers are set to a minimum.
The activating of a trace expands the ring buffer size. This patch
adds this expanding, when an event is activated.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-11 22:15:24 -04:00
Steven Rostedt 73c5162aa3 tracing: keep ring buffer to minimum size till used
Impact: less memory impact on systems not using tracer

When the kernel boots up that has tracing configured, it allocates
the default size of the ring buffer. This currently happens to be
1.4Megs per possible CPU. This is quite a bit of wasted memory if
the system is never using the tracer.

The current solution is to keep the ring buffers to a minimum size
until the user uses them. Once a tracer is piped into the current_tracer
the ring buffer will be expanded to the default size. If the user
changes the size of the ring buffer, it will take the size given
by the user immediately.

If the user adds a "ftrace=" to the kernel command line, then the ring
buffers will be set to the default size on initialization.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-11 22:15:22 -04:00
Ingo Molnar aecfcde920 Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace 2009-03-11 20:47:23 +01:00
Ingo Molnar e2b8b28085 Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace 2009-03-10 22:55:31 +01:00
Steven Rostedt 80370cb758 tracing: use raw spinlocks for trace_vprintk
Impact: prevent locking up by lockdep tracer

The lockdep tracer uses trace_vprintk and thus trace_vprintk can not
call back into lockdep without locking up.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-10 17:16:35 -04:00
Steven Rostedt ef18012b24 tracing: remove funky whitespace in the trace code
Impact: clean up

There existed a lot of <space><tab>'s in the tracing code. This
patch removes them.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-10 14:13:14 -04:00
KOSAKI Motohiro 888b55dc31 ftrace: tracing header should put '#' at the beginning of a line
In a recent discussion, Andrew Morton pointed out that tracing header
should put '#' at the beginning of a line.

Then, we can easily filtered the header by following grep usage:

  cat trace | grep -v '^#'

Wakeup trace also has the same header problem.

Comparison of headers displayed:

before this patch:

 # tracer: wakeup
 #
 wakeup latency trace v1.1.5 on 2.6.29-rc7-tip-tip
 --------------------------------------------------------------------
  latency: 19059 us, #21277/21277, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
     -----------------
     | task: kondemand/1-1644 (uid:0 nice:-5 policy:0 rt_prio:0)
     -----------------

 #                  _------=> CPU#
 #                 / _-----=> irqs-off
 #                | / _----=> need-resched
 #                || / _---=> hardirq/softirq
 #                ||| / _--=> preempt-depth
 #                |||| /
 #                |||||     delay
 #  cmd     pid   ||||| time  |   caller
 #     \   /      |||||   \   |   /
 irqbalan-1887    1d.s.    0us :   1887:120:R   + [001]  1644:115:S kondemand/1
 irqbalan-1887    1d.s.    1us : default_wake_function <-autoremove_wake_function
 irqbalan-1887    1d.s.    2us : check_preempt_wakeup <-try_to_wake_up

after this patch:

 # tracer: wakeup
 #
 # wakeup latency trace v1.1.5 on 2.6.29-rc7-tip-tip
 # --------------------------------------------------------------------
 # latency: 529 us, #530/530, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
 #    -----------------
 #    | task: kondemand/0-1641 (uid:0 nice:-5 policy:0 rt_prio:0)
 #    -----------------
 #
 #                  _------=> CPU#
 #                 / _-----=> irqs-off
 #                | / _----=> need-resched
 #                || / _---=> hardirq/softirq
 #                ||| / _--=> preempt-depth
 #                |||| /
 #                |||||     delay
 #  cmd     pid   ||||| time  |   caller
 #     \   /      |||||   \   |   /
     sshd-2496    0d.s.    0us :   2496:120:R   + [000]  1641:115:S kondemand/0
     sshd-2496    0d.s.    1us : default_wake_function <-autoremove_wake_function
     sshd-2496    0d.s.    1us : check_preempt_wakeup <-try_to_wake_up

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20090308124421.23C3.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-08 16:57:22 +01:00
Frederic Weisbecker 769b0441f4 tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk()
Impact: faster and lighter tracing

Now that we have trace_bprintk() which is faster and consume lesser
memory than trace_printk() and has the same purpose, we can now drop
the old implementation in favour of the binary one from trace_bprintk(),
which means we move all the implementation of trace_bprintk() to
trace_printk(), so the Api doesn't change except that we must now use
trace_seq_bprintk() to print the TRACE_PRINT entries.

Some changes result of this:

- Previously, trace_bprintk depended of a single tracer and couldn't
  work without. This tracer has been dropped and the whole implementation
  of trace_printk() (like the module formats management) is now integrated
  in the tracing core (comes with CONFIG_TRACING), though we keep the file
  trace_printk (previously trace_bprintk.c) where we can find the module
  management. Thus we don't overflow trace.c

- changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.

- change a bit trace_printk/trace_vprintk macros to support non-builtin formats
  constants, and fix 'const' qualifiers warnings. But this is all transparent for
  developers.

- etc...

V2:

- Rebase against last changes
- Fix mispell on the changelog

V3:

- Rebase against last changes (moving trace_printk() to kernel.h)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 17:59:12 +01:00
Lai Jiangshan 1ba28e02a1 tracing: add trace_bprintk()
Impact: add a generic printk() for tracing, like trace_printk()

trace_bprintk() uses the infrastructure to record events on ring_buffer.

[ fweisbec@gmail.com: ported to latest -tip, made it work if
  !CONFIG_MODULES, never free the format strings from modules
  because we can't keep track of them and conditionnaly create
  the ftrace format strings section (reported by Steven Rostedt) ]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 17:59:11 +01:00
Lai Jiangshan 1427cdf059 tracing: infrastructure for supporting binary record
Impact: save on memory for tracing

Current tracers are typically using a struct(like struct ftrace_entry,
struct ctx_switch_entry, struct special_entr etc...)to record a binary
event. These structs can only record a their own kind of events.
A new kind of tracer need a new struct and a lot of code too handle it.

So we need a generic binary record for events. This infrastructure
is for this purpose.

[fweisbec@gmail.com: rebase against latest -tip, make it safe while sched
tracing as reported by Steven Rostedt]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 17:59:11 +01:00
Steven Rostedt 5e2336a0d4 tracing: make all file_operations const
Impact: cleanup

All file_operations structures should be constant. No one is going to
change them.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-05 21:46:40 -05:00
Ingo Molnar 5e1607a00b tracing: rename ftrace_printk() => trace_printk()
Impact: cleanup

Use a more generic name - this also allows the prototype to move
to kernel.h and be generally available to kernel developers who
want to do some quick tracing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 10:24:48 +01:00
Steven Rostedt 27d48be844 tracing: consolidate print_lat_fmt and print_trace_fmt
Impact: clean up

Both print_lat_fmt and print_trace_fmt do pretty much the same thing
except for one different function call. This patch consolidates the
two functions and adds an if statement to perform the difference.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-04 21:57:29 -05:00
Steven Rostedt 5fd73f8624 tracing: remove extra latency_trace method from trace structure
Impact: clean up

The trace and latency_trace function pointers are identical for
every tracer but the function tracer. The differences in the function
tracer are trivial (latency output puts paranthesis around parent).

This patch removes the latency_trace pointer and all prints will
now just use the trace output function pointer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-04 21:42:04 -05:00
Steven Rostedt c032ef64d6 tracing: add latency output format option
With the removal of the latency_trace file, we lost the ability
to see some of the finer details in a trace. Like the state of
interrupts enabled, the preempt count, need resched, and if we
are in an interrupt handler, softirq handler or not.

This patch simply creates an option to bring back the old format.
This also removes the warning about an unused variable that held
the latency_trace file operations.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-04 20:34:24 -05:00
Steven Rostedt e74da5235c tracing: fix seq read from trace files
The buffer used by trace_seq was updated incorrectly. Instead
of consuming what was actually read, it consumed the rest of the
buffer on reads.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-04 20:31:11 -05:00
Steven Rostedt 2dc5d12b1f tracing: do not return EFAULT if read copied anything
Impact: fix trace read to conform to standards

Andrew Morton, Theodore Tso and H. Peter Anvin brought to my attention
that a userspace read should not return -EFAULT if it succeeded in
copying anything. It should only return -EFAULT if it failed to copy
at all.

This patch modifies the check of copy_from_user and updates the return
code appropriately.

I also used H. Peter Anvin's short cut rule to just test ret == count.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-04 19:10:05 -05:00
Steven Rostedt e543ad7691 tracing: add cpu_file intialization for ftrace_dump
Impact: fix to ftrace_dump output corruption

The commit: b04cc6b1f6
  tracing/core: introduce per cpu tracing files

added a new field to the iterator called cpu_file. This was a handle
to differentiate between the per cpu trace output files and the
all cpu "trace" file. The all cpu "trace" file required setting this
to TRACE_PIPE_ALL_CPU.

The problem is that the ftrace_dump sets up its own iterator but was
not updated to handle this change. The result was only CPU 0 printing
out on crash and a lot of "<0>"'s also being printed.

Reported-by: Thomas Gleixner <tglx@linuxtronix.de>
Tested-by: Darren Hart <dvhtc@us.ibm.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-04 18:32:28 -05:00
Peter Zijlstra efed792d67 tracing: add lockdep tracepoints for lock acquire/release
Augment the traces with lock names when lockdep is available:

 1)               |  down_read_trylock() {
 1)               |    _spin_lock_irqsave() {
 1)               |      /* lock_acquire: &sem->wait_lock */
 1)   4.201 us    |    }
 1)               |    _spin_unlock_irqrestore() {
 1)               |      /* lock_release: &sem->wait_lock */
 1)   3.523 us    |    }
 1)               |  /* lock_acquire: try read &mm->mmap_sem */
 1) + 13.386 us   |  }
 1)   1.635 us    |  find_vma();
 1)               |  handle_mm_fault() {
 1)               |    __do_fault() {
 1)               |      filemap_fault() {
 1)               |        find_lock_page() {
 1)               |          find_get_page() {
 1)               |            /* lock_acquire: read rcu_read_lock */
 1)               |            /* lock_release: rcu_read_lock */
 1)   5.697 us    |          }
 1)   8.158 us    |        }
 1) + 11.079 us   |      }
 1)               |      _spin_lock() {
 1)               |        /* lock_acquire: __pte_lockptr(page) */
 1)   3.949 us    |      }
 1)   1.460 us    |      page_add_file_rmap();
 1)               |      _spin_unlock() {
 1)               |        /* lock_release: __pte_lockptr(page) */
 1)   3.115 us    |      }
 1)               |      unlock_page() {
 1)   1.421 us    |        page_waitqueue();
 1)   1.220 us    |        __wake_up_bit();
 1)   6.519 us    |      }
 1) + 34.328 us   |    }
 1) + 37.452 us   |  }
 1)               |  up_read() {
 1)               |  /* lock_release: &mm->mmap_sem */
 1)               |    _spin_lock_irqsave() {
 1)               |      /* lock_acquire: &sem->wait_lock */
 1)   3.865 us    |    }
 1)               |    _spin_unlock_irqrestore() {
 1)               |      /* lock_release: &sem->wait_lock */
 1)   8.562 us    |    }
 1) + 17.370 us   |  }

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: =?ISO-8859-1?Q?T=F6r=F6k?= Edwin <edwintorok@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1236166375.5330.7209.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 18:49:58 +01:00
Steven Rostedt 2cadf9135e tracing: add binary buffer files for use with splice
Impact: new feature

This patch creates a directory of files that correspond to the
per CPU ring buffers. These are binary files and are made to
be used with splice. This is the fastest way to extract data from
the ftrace ring buffers.

Thanks to Jiaying Zhang for pushing me to get this code fixed,
 and to Eduard - Gabriel Munteanu for his splice code that helped
 me debug my code.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-03 21:01:55 -05:00
Steven Rostedt f9520750c4 tracing: make trace_seq_reset global and rename to trace_seq_init
Impact: clean up

The trace_seq functions may be used separately outside of the ftrace
iterator. The trace_seq_reset is needed for these operations.

This patch also renames trace_seq_reset to the more appropriate
trace_seq_init.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-02 14:08:51 -05:00
Steven Rostedt ef5580d0ff tracing: add interface to write into current tracer buffer
Right now all tracers must manage their own trace buffers. This was
to enforce tracers to be independent in case we finally decide to
allow each tracer to have their own trace buffer.

But now we are adding event tracing that writes to the current tracer's
buffer. This adds an interface to allow events to write to the current
tracer buffer without having to manage its own. Since event tracing
has no "tracer", and is just a way to hook into any other tracer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-28 03:06:44 -05:00
Steven Rostedt 0cfe82451d tracing: replace kzalloc with kcalloc
Impact: clean up

kcalloc is a better approach to allocate a NULL array.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-27 10:51:10 -05:00
Steven Rostedt 5c6a3ae1b4 tracing: use newline separator for trace options list
Impact: clean up

Instead of listing the trace options like:

 # cat /debug/tracing/trace_options
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch annotate nouserstacktrace nosym-userobj

We now list them like:

 # cat /debug/tracing/trace_options
print-parent
nosym-offset
nosym-addr
noverbose
noraw
nohex
nobin
noblock
nostacktrace
nosched-tree
ftrace_printk
noftrace_preempt
nobranch
annotate
nouserstacktrace
nosym-userobj

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-27 00:22:21 -05:00
Steven Rostedt 85a2f9b46f tracing: use pointer error returns for __tracing_open
Impact: fix compile warning and clean up

When I first wrote __tracing_open, instead of passing the error
code via the ERR_PTR macros, I lazily used a separate parameter
to hold the return for errors.

When Frederic Weisbecker updated that function, he used the Linux
kernel ERR_PTR for the returns. This caused the parameter return
to possibly not be initialized on error. gcc correctly pointed this
out with a warning.

This patch converts the entire function to use the Linux kernel
ERR_PTR macro methods.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-27 00:12:38 -05:00
Steven Rostedt d8e83d26b5 tracing: add protection around open use of current_tracer
Impact: fix to possible race conditions

There's some uses of current_tracer that is not protected by the
trace_types_lock. There is a small chance that a sysadmin changes
the tracer while the current_tracer is being referenced.

If the race is hit, it is unlikely to cause any harm since the
tracers are constant and are not freed. But some strang side
effects may occur.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-26 23:55:58 -05:00
Steven Rostedt 577b785f55 tracing: add tracer dependent options to options directory
This patch adds the tracer dependent options dynamically to the
options directory when the tracer is activated. These options are
removed when the tracer is deactivated.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-26 23:43:05 -05:00
Steven Rostedt a825907507 tracing: add options directory and core option files
This patch creates an options directory in the debugfs, that contains
the available tracing options. These files contain 1 or 0, where 1
is the option is enabled and 0 it is disabled.

Simply echoing in 1 will enable the option and 0 will disable it.
This patch only contains the core options, not the tracer options.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-26 22:22:46 -05:00
Frederic Weisbecker 8656e7a2fa tracing/core: make the per cpu trace files in per cpu directories
Impact: restructure the VFS layout of per CPU trace buffers

The per cpu trace files are all in a single directory:
/debug/tracing/per_cpu. In case of a large number of cpu, the
content of this directory becomes messy so we create now one
directory per cpu inside /debug/tracing/per_cpu which contain
each their own trace_pipe and trace files.

Ie:

 /debug/tracing$ ls -R per_cpu
 per_cpu:
 cpu0  cpu1

 per_cpu/cpu0:
 trace  trace_pipe

 per_cpu/cpu1:
 trace  trace_pipe

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 14:04:08 +01:00
Frederic Weisbecker d7350c3f45 tracing/core: make the read callbacks reentrants
Now that several per-cpu files can be read or spliced at the
same, we want the read/splice callbacks for tracing files to be
reentrants.

Until now, a single global mutex (trace_types_lock) serialized
the access to tracing_read_pipe(), tracing_splice_read_pipe(),
and the seq helpers.

Ie: it means that if a user tries to read trace_pipe0 and
trace_pipe1 at the same time, the access to the function
tracing_read_pipe() is contended and one reader must wait for
the other to finish its read call.

The trace_type_lock mutex is mostly here to serialize the access
to the global current tracer (current_trace), which can be
changed concurrently. Although the iter struct keeps a private
pointer to this tracer, its callbacks can be changed by another
function.

The method used here is to not keep anymore private reference to
the tracer inside the iterator but to make a copy of it inside
the iterator. Then it checks on subsequents read calls if the
tracer has changed. This is not costly because the current
tracer is not expected to be changed often, so we use a branch
prediction for that.

Moreover, we add a private mutex to the iterator (there is one
iterator per file descriptor) to serialize the accesses in case
of multiple consumers per file descriptor (which would be a
silly idea from the user). Note that this is not to protect the
ring buffer, since the ring buffer already serializes the
readers accesses. This is to prevent from traces weirdness in
case of concurrent consumers. But these mutexes can be dropped
anyway, that would not result in any crash. Just tell me what
you think about it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:40:58 +01:00
Frederic Weisbecker b04cc6b1f6 tracing/core: introduce per cpu tracing files
Impact: split up tracing output per cpu

Currently, on the tracing debugfs directory, three files are
available to the user to let him extracting the trace output:

- trace is an iterator through the ring-buffer. It's a reader
  but not a consumer It doesn't block when no more traces are
  available.

- trace pretty similar to the former, except that it adds more
  informations such as prempt count, irq flag, ...

- trace_pipe is a reader and a consumer, it will also block
  waiting for traces if necessary (heh, yes it's a pipe).

The traces coming from different cpus are curretly mixed up
inside these files. Sometimes it messes up the informations,
sometimes it's useful, depending on what does the tracer
capture.

The tracing_cpumask file is useful to filter the output and
select only the traces captured a custom defined set of cpus.
But still it is not enough powerful to extract at the same time
one trace buffer per cpu.

So this patch creates a new directory: /debug/tracing/per_cpu/.

Inside this directory, you will now find one trace_pipe file and
one trace file per cpu.

Which means if you have two cpus, you will have:

 trace0
 trace1
 trace_pipe0
 trace_pipe1

And of course, reading these files will have the same effect
than with the usual tracing files, except that you will only see
the traces from the given cpu.

The original all-in-one cpu trace file are still available on
their original place.

Until now, only one consumer was allowed on trace_pipe to avoid
racy consuming on the ring-buffer. Now the approach changed a
bit, you can have only one consumer per cpu.

Which means you are allowed to read concurrently trace_pipe0 and
trace_pipe1 But you can't have two readers on trace_pipe0 or
trace_pipe1.

Following the same logic, if there is one reader on the common
trace_pipe, you can not have at the same time another reader on
trace_pipe0 or in trace_pipe1. Because in trace_pipe is already
a consumer in all cpu buffers in essence.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:40:58 +01:00
Ingo Molnar 886b5b73d7 tracing: remove /debug/tracing/latency_trace
Impact: remove old debug/tracing API

/debug/tracing/latency_trace is an old legacy format we kept from
the old latency tracer. Remove the file for now. If there's any
useful bit missing then we'll propagate any useful output bits into
the /debug/tracing/trace output.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 11:05:34 +01:00