Commit Graph

2897 Commits

Author SHA1 Message Date
Linus Torvalds fc335c1b68 This contains a fix for two long standing bugs. Both of which are
rarely ever hit, and requires the user to do something that users rarely
 do. It took a few special test cases to even trigger this bug,
 and one of them was just one test in the process of finishing up as another
 one started.
 
 Both bugs have to do with the ring buffer iterator rb_iter_peek(), but one
 is more indirect than the other.
 
 The fist bug fix is simply an increase in the safety net loop counter.
 The counter makes sure that the rb_iter_peek() only iterates the number
 of times we expect it can, and no more. Well, there was one way it could
 iterate one more than we expected, and that caused the ring buffer
 to shutdown with a nasty warning. The fix was simply to up that counter by
 one.
 
 The other bug has to be with rb_iter_reset() (called by rb_iter_peek()).
 This happens when a user reads both the trace_pipe and trace files.
 The trace_pipe is a consuming read and does not use the ring buffer
 iterator, but the trace file is not a consuming read and does use the
 ring buffer iterator. When the trace file is being read, if it detects
 that a consuming read occurred, it resets the iterator and starts over.
 But the reset code that does this (rb_iter_reset()), checks if the
 reader_page is linked to the ring buffer or not, and will look into
 the ring buffer itself if it is not. This is wrong, as it should always
 try to read the reader page first. Not to mention, the code that looked
 into the ring buffer did it wrong, and used the header_page "read" offset
 to start reading on that page. That offset is bogus for pages in the
 writable ring buffer, and was corrupting the iterator, and it would start
 returning bogus events.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT44tRAAoJEKQekfcNnQGuMVIH/3evbjKT+w6Kon4S0LfLSejl
 YDsXYkeO/lGiO3MnUveqq1jfw2+yHtyBVUipvfG0A61urMUhyUvjveph8Ec2cQ4Q
 qHl0J28vDfF5tpKiYzygRN01wHD6GXYh+XZSChkA4ItzzD8K51lsZT1Yd+I2pTy2
 DgH01EEEYiwYJcih+T4vlbKqYju6pwgxqKNCTv0RdVXUPya/tG9X2Nf8VGQTbmKS
 FIO73qObYR+P9iXGIuPfyOxk2EvBiWS15WownZmfhZZxOIKx9IrDYwTsiV1+AJw+
 sJFER1ulobYGDtGDa9yyrNJQr6wgbrfCDELnNKmdLUTlSwgZjLXpE2HEmlelY/s=
 =5mQl
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull trace file read iterator fixes from Steven Rostedt:
 "This contains a fix for two long standing bugs.  Both of which are
  rarely ever hit, and requires the user to do something that users
  rarely do.  It took a few special test cases to even trigger this bug,
  and one of them was just one test in the process of finishing up as
  another one started.

  Both bugs have to do with the ring buffer iterator rb_iter_peek(), but
  one is more indirect than the other.

  The fist bug fix is simply an increase in the safety net loop counter.
  The counter makes sure that the rb_iter_peek() only iterates the
  number of times we expect it can, and no more.  Well, there was one
  way it could iterate one more than we expected, and that caused the
  ring buffer to shutdown with a nasty warning.  The fix was simply to
  up that counter by one.

  The other bug has to be with rb_iter_reset() (called by
  rb_iter_peek()).  This happens when a user reads both the trace_pipe
  and trace files.  The trace_pipe is a consuming read and does not use
  the ring buffer iterator, but the trace file is not a consuming read
  and does use the ring buffer iterator.  When the trace file is being
  read, if it detects that a consuming read occurred, it resets the
  iterator and starts over.  But the reset code that does this
  (rb_iter_reset()), checks if the reader_page is linked to the ring
  buffer or not, and will look into the ring buffer itself if it is not.
  This is wrong, as it should always try to read the reader page first.
  Not to mention, the code that looked into the ring buffer did it
  wrong, and used the header_page "read" offset to start reading on that
  page.  That offset is bogus for pages in the writable ring buffer, and
  was corrupting the iterator, and it would start returning bogus
  events"

* tag 'trace-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Always reset iterator to reader page
  ring-buffer: Up rb_iter_peek() loop count to 3
2014-08-09 17:29:36 -07:00
Steven Rostedt (Red Hat) 651e22f270 ring-buffer: Always reset iterator to reader page
When performing a consuming read, the ring buffer swaps out a
page from the ring buffer with a empty page and this page that
was swapped out becomes the new reader page. The reader page
is owned by the reader and since it was swapped out of the ring
buffer, writers do not have access to it (there's an exception
to that rule, but it's out of scope for this commit).

When reading the "trace" file, it is a non consuming read, which
means that the data in the ring buffer will not be modified.
When the trace file is opened, a ring buffer iterator is allocated
and writes to the ring buffer are disabled, such that the iterator
will not have issues iterating over the data.

Although the ring buffer disabled writes, it does not disable other
reads, or even consuming reads. If a consuming read happens, then
the iterator is reset and starts reading from the beginning again.

My tests would sometimes trigger this bug on my i386 box:

WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
Modules linked in:
CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
 f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M
Call Trace:
 [<c18796b3>] dump_stack+0x4b/0x75
 [<c103a0e3>] warn_slowpath_common+0x7e/0x95
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c103a185>] warn_slowpath_fmt+0x33/0x35
 [<c10bd85a>] __trace_find_cmdline+0x66/0xaa^M
 [<c10bed04>] trace_find_cmdline+0x40/0x64
 [<c10c3c16>] trace_print_context+0x27/0xec
 [<c10c4360>] ? trace_seq_printf+0x37/0x5b
 [<c10c0b15>] print_trace_line+0x319/0x39b
 [<c10ba3fb>] ? ring_buffer_read+0x47/0x50
 [<c10c13b1>] s_show+0x192/0x1ab
 [<c10bfd9a>] ? s_next+0x5a/0x7c
 [<c112e76e>] seq_read+0x267/0x34c
 [<c1115a25>] vfs_read+0x8c/0xef
 [<c112e507>] ? seq_lseek+0x154/0x154
 [<c1115ba2>] SyS_read+0x54/0x7f
 [<c188488e>] syscall_call+0x7/0xb
---[ end trace 3f507febd6b4cc83 ]---
>>>> ##### CPU 1 buffer started ####

Which was the __trace_find_cmdline() function complaining about the pid
in the event record being negative.

After adding more test cases, this would trigger more often. Strangely
enough, it would never trigger on a single test, but instead would trigger
only when running all the tests. I believe that was the case because it
required one of the tests to be shutting down via delayed instances while
a new test started up.

After spending several days debugging this, I found that it was caused by
the iterator becoming corrupted. Debugging further, I found out why
the iterator became corrupted. It happened with the rb_iter_reset().

As consuming reads may not read the full reader page, and only part
of it, there's a "read" field to know where the last read took place.
The iterator, must also start at the read position. In the rb_iter_reset()
code, if the reader page was disconnected from the ring buffer, the iterator
would start at the head page within the ring buffer (where writes still
happen). But the mistake there was that it still used the "read" field
to start the iterator on the head page, where it should always start
at zero because readers never read from within the ring buffer where
writes occur.

I originally wrote a patch to have it set the iter->head to 0 instead
of iter->head_page->read, but then I questioned why it wasn't always
setting the iter to point to the reader page, as the reader page is
still valid.  The list_empty(reader_page->list) just means that it was
successful in swapping out. But the reader_page may still have data.

There was a bug report a long time ago that was not reproducible that
had something about trace_pipe (consuming read) not matching trace
(iterator read). This may explain why that happened.

Anyway, the correct answer to this bug is to always use the reader page
an not reset the iterator to inside the writable ring buffer.

Cc: stable@vger.kernel.org # 2.6.28+
Fixes: d769041f86 "ring_buffer: implement new locking"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-06 16:06:34 -04:00
Steven Rostedt (Red Hat) 021de3d904 ring-buffer: Up rb_iter_peek() loop count to 3
After writting a test to try to trigger the bug that caused the
ring buffer iterator to become corrupted, I hit another bug:

 WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238()
 Modules linked in: ipt_MASQUERADE sunrpc [...]
 CPU: 1 PID: 5281 Comm: grep Tainted: G        W     3.16.0-rc3-test+ #143
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
  0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000
  ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010
  ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003
 Call Trace:
  [<ffffffff81503fb0>] ? dump_stack+0x4a/0x75
  [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
  [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
  [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
  [<ffffffff810c14df>] ? ring_buffer_iter_peek+0x2d/0x5c
  [<ffffffff810c6f73>] ? tracing_iter_reset+0x6e/0x96
  [<ffffffff810c74a3>] ? s_start+0xd7/0x17b
  [<ffffffff8112b13e>] ? kmem_cache_alloc_trace+0xda/0xea
  [<ffffffff8114cf94>] ? seq_read+0x148/0x361
  [<ffffffff81132d98>] ? vfs_read+0x93/0xf1
  [<ffffffff81132f1b>] ? SyS_read+0x60/0x8e
  [<ffffffff8150bf9f>] ? tracesys+0xdd/0xe2

Debugging this bug, which triggers when the rb_iter_peek() loops too
many times (more than 2 times), I discovered there's a case that can
cause that function to legitimately loop 3 times!

rb_iter_peek() is different than rb_buffer_peek() as the rb_buffer_peek()
only deals with the reader page (it's for consuming reads). The
rb_iter_peek() is for traversing the buffer without consuming it, and as
such, it can loop for one more reason. That is, if we hit the end of
the reader page or any page, it will go to the next page and try again.

That is, we have this:

 1. iter->head > iter->head_page->page->commit
    (rb_inc_iter() which moves the iter to the next page)
    try again

 2. event = rb_iter_head_event()
    event->type_len == RINGBUF_TYPE_TIME_EXTEND
    rb_advance_iter()
    try again

 3. read the event.

But we never get to 3, because the count is greater than 2 and we
cause the WARNING and return NULL.

Up the counter to 3.

Cc: stable@vger.kernel.org # 2.6.37+
Fixes: 69d1b839f7 "ring-buffer: Bind time extend and data events together"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-06 16:06:33 -04:00
Linus Torvalds e7fda6c4c3 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and time updates from Thomas Gleixner:
 "A rather large update of timers, timekeeping & co

   - Core timekeeping code is year-2038 safe now for 32bit machines.
     Now we just need to fix all in kernel users and the gazillion of
     user space interfaces which rely on timespec/timeval :)

   - Better cache layout for the timekeeping internal data structures.

   - Proper nanosecond based interfaces for in kernel users.

   - Tree wide cleanup of code which wants nanoseconds but does hoops
     and loops to convert back and forth from timespecs.  Some of it
     definitely belongs into the ugly code museum.

   - Consolidation of the timekeeping interface zoo.

   - A fast NMI safe accessor to clock monotonic for tracing.  This is a
     long standing request to support correlated user/kernel space
     traces.  With proper NTP frequency correction it's also suitable
     for correlation of traces accross separate machines.

   - Checkpoint/restart support for timerfd.

   - A few NOHZ[_FULL] improvements in the [hr]timer code.

   - Code move from kernel to kernel/time of all time* related code.

   - New clocksource/event drivers from the ARM universe.  I'm really
     impressed that despite an architected timer in the newer chips SoC
     manufacturers insist on inventing new and differently broken SoC
     specific timers.

[ Ed. "Impressed"? I don't think that word means what you think it means ]

   - Another round of code move from arch to drivers.  Looks like most
     of the legacy mess in ARM regarding timers is sorted out except for
     a few obnoxious strongholds.

   - The usual updates and fixlets all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  timekeeping: Fixup typo in update_vsyscall_old definition
  clocksource: document some basic timekeeping concepts
  timekeeping: Use cached ntp_tick_length when accumulating error
  timekeeping: Rework frequency adjustments to work better w/ nohz
  timekeeping: Minor fixup for timespec64->timespec assignment
  ftrace: Provide trace clocks monotonic
  timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
  seqcount: Add raw_write_seqcount_latch()
  seqcount: Provide raw_read_seqcount()
  timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
  timekeeping: Create struct tk_read_base and use it in struct timekeeper
  timekeeping: Restructure the timekeeper some more
  clocksource: Get rid of cycle_last
  clocksource: Move cycle_last validation to core code
  clocksource: Make delta calculation a function
  wireless: ath9k: Get rid of timespec conversions
  drm: vmwgfx: Use nsec based interfaces
  drm: i915: Use nsec based interfaces
  timekeeping: Provide ktime_get_raw()
  hangcheck-timer: Use ktime_get_ns()
  ...
2014-08-05 17:46:42 -07:00
Linus Torvalds ef35ad26f8 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Kernel side changes:

   - Consolidate the PMU interrupt-disabled code amongst architectures
     (Vince Weaver)

   - misc fixes

  Tooling changes (new features, user visible changes):

   - Add support for pagefault tracing in 'trace', please see multiple
     examples in the changeset messages (Stanislav Fomichev).

   - Add pagefault statistics in 'trace' (Stanislav Fomichev)

   - Add header for columns in 'top' and 'report' TUI browsers (Jiri
     Olsa)

   - Add pagefault statistics in 'trace' (Stanislav Fomichev)

   - Add IO mode into timechart command (Stanislav Fomichev)

   - Fallback to syscalls:* when raw_syscalls:* is not available in the
     perl and python perf scripts.  (Daniel Bristot de Oliveira)

   - Add --repeat global option to 'perf bench' to be used in benchmarks
     such as the existing 'futex' one, that was modified to use it
     instead of a local option.  (Davidlohr Bueso)

   - Fix fd -> pathname resolution in 'trace', be it using /proc or a
     vfs_getname probe point.  (Arnaldo Carvalho de Melo)

   - Add suggestion of how to set perf_event_paranoid sysctl, to help
     non-root users trying tools like 'trace' to get a working
     environment.  (Arnaldo Carvalho de Melo)

   - Updates from trace-cmd for traceevent plugin_kvm plus args cleanup
     (Steven Rostedt, Jan Kiszka)

   - Support S/390 in 'perf kvm stat' (Alexander Yarygin)

  Tooling infrastructure changes:

   - Allow reserving a row for header purposes in the hists browser
     (Arnaldo Carvalho de Melo)

   - Various fixes and prep work related to supporting Intel PT (Adrian
     Hunter)

   - Introduce multiple debug variables control (Jiri Olsa)

   - Add callchain and additional sample information for python scripts
     (Joseph Schuchart)

   - More prep work to support Intel PT: (Adrian Hunter)
     - Polishing 'script' BTS output
     - 'inject' can specify --kallsym
     - VDSO is per machine, not a global var
     - Expose data addr lookup functions previously private to 'script'
     - Large mmap fixes in events processing

   - Include standard stringify macros in power pc code (Sukadev
     Bhattiprolu)

  Tooling cleanups:

   - Convert open coded equivalents to asprintf() (Andy Shevchenko)

   - Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo)

   - Cache the is_exit syscall test in 'trace) (Arnaldo Carvalho de
     Melo)

   - No need to reimplement err() in 'perf bench sched-messaging', drop
     barf().  (Davidlohr Bueso).

   - Remove ev_name argument from perf_evsel__hists_browse, can be
     obtained from the other parameters.  (Jiri Olsa)

  Tooling fixes:

   - Fix memory leak in the 'sched-messaging' perf bench test.
     (Davidlohr Bueso)

   - The -o and -n 'perf bench mem' options are mutually exclusive, emit
     error when both are specified.  (Davidlohr Bueso)

   - Fix scrollbar refresh row index in the ui browser, problem exposed
     now that headers will be added and will be allowed to be switched
     on/off.  (Jiri Olsa)

   - Handle the num array type in python properly (Sebastian Andrzej
     Siewior)

   - Fix wrong condition for allocation failure (Jiri Olsa)

   - Adjust callchain based on DWARF debug info on powerpc (Sukadev
     Bhattiprolu)

   - Fix a risk for doing free on uninitialized pointer in traceevent
     lib (Rickard Strandqvist)

   - Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa)

   - Enable close-on-exec flag on perf file descriptor (Yann Droneaud)

   - Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo)

   - Event ordering fixes (Jiri Olsa)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits)
  Revert "perf tools: Fix jump label always changing during tracing"
  perf tools: Fix perf usage string leftover
  perf: Check permission only for parent tracepoint event
  perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds
  perf record: Always force PERF_RECORD_FINISHED_ROUND event
  perf inject: Add --kallsyms parameter
  perf tools: Expose 'addr' functions so they can be reused
  perf session: Fix accounting of ordered samples queue
  perf powerpc: Include util/util.h and remove stringify macros
  perf tools: Fix build on gcc 4.4.7
  perf tools: Add thread parameter to vdso__dso_findnew()
  perf tools: Add dso__type()
  perf tools: Separate the VDSO map name from the VDSO dso name
  perf tools: Add vdso__new()
  perf machine: Fix the lifetime of the VDSO temporary file
  perf tools: Group VDSO global variables into a structure
  perf session: Add ability to skip 4GiB or more
  perf session: Add ability to 'skip' a non-piped event stream
  perf tools: Pass machine to vdso__dso_findnew()
  perf tools: Add dso__data_size()
  ...
2014-08-04 16:09:53 -07:00
Linus Torvalds c9b88e9581 Oleg Nesterov did several clean ups with the tracing filter code.
As he found some small bugs that went into 3.16, and these changes were
 based on that, I had to apply his changes to a separate branch than
 my main development branch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT3533AAoJEKQekfcNnQGuaNkIALM9uf6gvIXgvyAlhK1qfjJL
 0pxAlgppukA7hZT9gcubGuStPxKnwtVSvd/GwGI/mc7Ls57vOvdA61TEHN48l2MW
 SUev1Jeu1Nx3Wh4B2ivRq+dJadrwU9NwG507yvuXn2L8Gqzwal133dg3AHnsTYgz
 yu7oe8baxMniFFYlAmSELFp2cA3kli/WrRuQB7weudbT12jZlpOMqz7OUN9o7dfo
 E4XfwkkDUuo9EiKD1QwVMk2cIkCaHFI8lJDYJbrM3L5XJKfnKzZNRMWfwDG2L7hS
 DGjMl6T0SjU+ZIoTidHdbyF/Q+I0o9kKTnUrErGu88+2Z1gfEKN5O8zovNMbPUM=
 =dT/x
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing filter cleanups from Steven Rostedt:
 "Oleg Nesterov did several clean ups with the tracing filter code.  As
  he found some small bugs that went into 3.16, and these changes were
  based on that, I had to apply his changes to a separate branch than my
  main development branch.

  This was based on work that was already pulled into 3.16, and is a
  separate pull request to keep from having local merges in my pull
  request"

* tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Kill "filter_string" arg of replace_preds()
  tracing: Change apply_subsystem_event_filter() paths to check file->system == dir
  tracing: Kill ftrace_event_call->files
  tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic
  tracing: Kill call_filter_disable()
  tracing: Kill destroy_call_preds()
  tracing: Kill destroy_preds() and destroy_file_preds()
2014-08-04 12:02:48 -07:00
Linus Torvalds b8c0aa46b3 This pull request has a lot of work done. The main thing is the changes
to the ftrace function callback infrastructure. It's introducing a
 way to allow different functions to call directly different trampolines
 instead of all calling the same "mcount" one.
 
 The only user of this for now is the function graph tracer, which always
 had a different trampoline, but the function tracer trampoline was called
 and did basically nothing, and then the function graph tracer trampoline
 was called. The difference now, is that the function graph tracer
 trampoline can be called directly if a function is only being traced by
 the function graph trampoline. If function tracing is also happening on
 the same function, the old way is still done.
 
 The accounting for this takes up more memory when function graph tracing
 is activated, as it needs to keep track of which functions it uses.
 I have a new way that wont take as much memory, but it's not ready yet
 for this merge window, and will have to wait for the next one.
 
 Another big change was the removal of the ftrace_start/stop() calls that
 were used by the suspend/resume code that stopped function tracing when
 entering into suspend and resume paths. The stop of ftrace was done
 because there was some function that would crash the system if one called
 smp_processor_id()! The stop/start was a big hammer to solve the issue
 at the time, which was when ftrace was first introduced into Linux.
 Now ftrace has better infrastructure to debug such issues, and I found
 the problem function and labeled it with "notrace" and function tracing
 can now safely be activated all the way down into the guts of suspend
 and resume.
 
 Other changes include clean ups of uprobe code.
 Clean up of the trace_seq() code.
 And other various small fixes and clean ups to ftrace and tracing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT35zXAAoJEKQekfcNnQGuOz0H/38zqM0nLFhrgvz3EPk2UOjn
 xqpX8qyb2V7TJZL+IqeXU2a5cQZl5ba0D4WtBGpxbTae3CJYiuQ87iKUNFoH0om5
 FDpn80igb368k8V3qRdRsziKVCCf0XBd/NkHJXc0ZkfXGyzB2Ga4bBxALxp2gj9y
 bnO+vKo6+tWYKG4hyQb4P3LRXUrK8/LWEsPr39cH2QH1Rdj69Lx9CgrCdUVJmwcb
 Bj8hEiLXL/RYCFNn79A3wNTUvW0rG/AOIf4SLqXtasSRZ0ToaU0ZyDnrNv+0Ol47
 rX8tSk+LfXchL9hpIvjCf1vlAYq3pO02favteR/jip3lx/dTjEDE4RJ9qtJzZ4Q=
 =fwQY
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This pull request has a lot of work done.  The main thing is the
  changes to the ftrace function callback infrastructure.  It's
  introducing a way to allow different functions to call directly
  different trampolines instead of all calling the same "mcount" one.

  The only user of this for now is the function graph tracer, which
  always had a different trampoline, but the function tracer trampoline
  was called and did basically nothing, and then the function graph
  tracer trampoline was called.  The difference now, is that the
  function graph tracer trampoline can be called directly if a function
  is only being traced by the function graph trampoline.  If function
  tracing is also happening on the same function, the old way is still
  done.

  The accounting for this takes up more memory when function graph
  tracing is activated, as it needs to keep track of which functions it
  uses.  I have a new way that wont take as much memory, but it's not
  ready yet for this merge window, and will have to wait for the next
  one.

  Another big change was the removal of the ftrace_start/stop() calls
  that were used by the suspend/resume code that stopped function
  tracing when entering into suspend and resume paths.  The stop of
  ftrace was done because there was some function that would crash the
  system if one called smp_processor_id()! The stop/start was a big
  hammer to solve the issue at the time, which was when ftrace was first
  introduced into Linux.  Now ftrace has better infrastructure to debug
  such issues, and I found the problem function and labeled it with
  "notrace" and function tracing can now safely be activated all the way
  down into the guts of suspend and resume

  Other changes include clean ups of uprobe code, clean up of the
  trace_seq() code, and other various small fixes and clean ups to
  ftrace and tracing"

* tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits)
  ftrace: Add warning if tramp hash does not match nr_trampolines
  ftrace: Fix trampoline hash update check on rec->flags
  ring-buffer: Use rb_page_size() instead of open coded head_page size
  ftrace: Rename ftrace_ops field from trampolines to nr_trampolines
  tracing: Convert local function_graph functions to static
  ftrace: Do not copy old hash when resetting
  tracing: let user specify tracing_thresh after selecting function_graph
  ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on()
  tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST
  s390/ftrace: remove check of obsolete variable function_trace_stop
  arm64, ftrace: Remove check of obsolete variable function_trace_stop
  Blackfin: ftrace: Remove check of obsolete variable function_trace_stop
  metag: ftrace: Remove check of obsolete variable function_trace_stop
  microblaze: ftrace: Remove check of obsolete variable function_trace_stop
  MIPS: ftrace: Remove check of obsolete variable function_trace_stop
  parisc: ftrace: Remove check of obsolete variable function_trace_stop
  sh: ftrace: Remove check of obsolete variable function_trace_stop
  sparc64,ftrace: Remove check of obsolete variable function_trace_stop
  tile: ftrace: Remove check of obsolete variable function_trace_stop
  ftrace: x86: Remove check of obsolete variable function_trace_stop
  ...
2014-08-04 11:50:00 -07:00
Jiri Olsa f4be073db8 perf: Check permission only for parent tracepoint event
There's no need to check cloned event's permission once the
parent was already checked.

Also the code is checking 'current' process permissions, which
is not owner process for cloned events, thus could end up with
wrong permission check result.

Reported-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Tested-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1405079782-8139-1-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28 10:01:38 +02:00
Steven Rostedt (Red Hat) dc6f03f26f ftrace: Add warning if tramp hash does not match nr_trampolines
After adding all the records to the tramp_hash, add a check that makes
sure that the number of records added matches the number of records
expected to match and do a WARN_ON and disable ftrace if they do
not match.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-24 11:26:11 -04:00
Steven Rostedt (Red Hat) 2a0343baa4 ftrace: Fix trampoline hash update check on rec->flags
In the loop of ftrace_save_ops_tramp_hash(), it adds all the recs
to the ops hash if the rec has only one callback attached and the
ops is connected to the rec. It gives a nasty warning and shuts down
ftrace if the rec doesn't have a trampoline set for it. But this
can happen with the following scenario:

  # cd /sys/kernel/debug/tracing
  # echo schedule do_IRQ > set_ftrace_filter
  # mkdir instances/foo
  # echo schedule > instances/foo/set_ftrace_filter
  # echo function_graph > current_function
  # echo function > instances/foo/current_function
  # echo nop > instances/foo/current_function

The above would then trigger the following warning and disable
ftrace:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 3145 at kernel/trace/ftrace.c:2212 ftrace_run_update_code+0xe4/0x15b()
 Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ip [...]
 CPU: 1 PID: 3145 Comm: bash Not tainted 3.16.0-rc3-test+ #136
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
  0000000000000000 ffffffff81808a88 ffffffff81502130 0000000000000000
  ffffffff81040ca1 ffff880077c08000 ffffffff810bd286 0000000000000001
  ffffffff81a56830 ffff88007a041be0 ffff88007a872d60 00000000000001be
 Call Trace:
  [<ffffffff81502130>] ? dump_stack+0x4a/0x75
  [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
  [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
  [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
  [<ffffffff810bda1a>] ? ftrace_shutdown+0x11c/0x16b
  [<ffffffff810bda87>] ? unregister_ftrace_function+0x1e/0x38
  [<ffffffff810cc7e1>] ? function_trace_reset+0x1a/0x28
  [<ffffffff810c924f>] ? tracing_set_tracer+0xc1/0x276
  [<ffffffff810c9477>] ? tracing_set_trace_write+0x73/0x91
  [<ffffffff81132383>] ? __sb_start_write+0x9a/0xcc
  [<ffffffff8120478f>] ? security_file_permission+0x1b/0x31
  [<ffffffff81130e49>] ? vfs_write+0xac/0x11c
  [<ffffffff8113115d>] ? SyS_write+0x60/0x8e
  [<ffffffff81508112>] ? system_call_fastpath+0x16/0x1b
 ---[ end trace 938c4415cbc7dc96 ]---
 ------------[ cut here ]------------

Link: http://lkml.kernel.org/r/20140723120805.GB21376@redhat.com

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-24 10:06:41 -04:00
Steven Rostedt (Red Hat) 10e83fd01c ring-buffer: Use rb_page_size() instead of open coded head_page size
There's a helper function to get a ring buffer page size (the number
of bytes of data recorded on the page), called rb_page_size().
Use that instead of open coding it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-23 19:45:12 -04:00
Thomas Gleixner 1b3e5c0936 ftrace: Provide trace clocks monotonic
Expose the new NMI safe accessor to clock monotonic to the tracer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-07-23 15:01:55 -07:00
Steven Rostedt (Red Hat) 0162d621dd ftrace: Rename ftrace_ops field from trampolines to nr_trampolines
Having two fields within the same struct that is off by one character
can be confusing and error prone. Rename the counter "trampolines"
to "nr_trampolines" to explicitly show it is a counter and not to
be confused by the "trampoline" field.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-23 15:03:00 -04:00
Tony Luck 58d4e21e50 tracing: Fix wraparound problems in "uptime" trace clock
The "uptime" trace clock added in:

    commit 8aacf017b0
    tracing: Add "uptime" trace clock that uses jiffies

has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
        (u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds.  An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

Avoid these problems by using jiffies_64 as our basis, and
not converting to nanoseconds (we do convert to clock_t because
user facing API must not be dependent on internal kernel
HZ values).

Link: http://lkml.kernel.org/p/99d63c5bfe9b320a3b428d773825a37095bf6a51.1405708254.git.tony.luck@intel.com

Cc: stable@vger.kernel.org # 3.10+
Fixes: 8aacf017b0 "tracing: Add "uptime" trace clock that uses jiffies"
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-21 09:56:12 -04:00
Steven Rostedt (Red Hat) ba1afef6a4 tracing: Convert local function_graph functions to static
Local functions should be static.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18 21:16:06 -04:00
Wang Nan b972cc58ce ftrace: Do not copy old hash when resetting
Do not waste time copying the old hash if the hash is going to be
reset. Just allocate a new hash and free the old one, as that is
the same result as copying te old one and then resetting it.

Link: http://lkml.kernel.org/p/1405384820-48837-1-git-send-email-wangnan0@huawei.com

Signed-off-by: Wang Nan <wangnan0@huawei.com>
[ SDR: Removed unused ftrace_filter_reset() function ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18 17:47:04 -04:00
Stanislav Fomichev 6508fa761c tracing: let user specify tracing_thresh after selecting function_graph
Currently, tracing_thresh works only if we specify it before selecting
function_graph tracer. If we do the opposite, tracing_thresh will change
it's value, but it will not be applied.
To fix it, we add update_thresh callback which is called whenever
tracing_thresh is updated and for function_graph tracer we register
handler which reinitializes tracer depending on tracing_thresh.

Link: http://lkml.kernel.org/p/20140718111727.GA3206@stfomichev-desktop.yandex.net

Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18 15:48:52 -04:00
Corey Minyard 021c5b3445 ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on()
The code for resizing the trace ring buffers has to run the per-cpu
resize on the CPU itself.  The code was using preempt_off() and
running the code for the current CPU directly, otherwise calling
schedule_work_on().

At least on RT this could result in the following:

|BUG: sleeping function called from invalid context at kernel/rtmutex.c:673
|in_atomic(): 1, irqs_disabled(): 0, pid: 607, name: bash
|3 locks held by bash/607:
|CPU: 0 PID: 607 Comm: bash Not tainted 3.12.15-rt25+ #124
|(rt_spin_lock+0x28/0x68)
|(free_hot_cold_page+0x84/0x3b8)
|(free_buffer_page+0x14/0x20)
|(rb_update_pages+0x280/0x338)
|(ring_buffer_resize+0x32c/0x3dc)
|(free_snapshot+0x18/0x38)
|(tracing_set_tracer+0x27c/0x2ac)

probably via
|cd /sys/kernel/debug/tracing/
|echo 1 > events/enable ; sleep 2
|echo 1024 > buffer_size_kb

If we just always use schedule_work_on(), there's no need for the
preempt_off().  So do that.

Link: http://lkml.kernel.org/p/1405537633-31518-1-git-send-email-cminyard@mvista.com

Reported-by: Stanislav Meduna <stano@meduna.org>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18 13:58:12 -04:00
Steven Rostedt (Red Hat) 3a636388ba tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST
All users of function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST have
been removed. We can safely remove them from the kernel.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18 13:58:12 -04:00
Steven Rostedt (Red Hat) 1d48d5960f ftrace: Remove function_trace_stop check from list func
function_trace_stop is no longer used to stop function tracing.
Remove the check from __ftrace_ops_list_func().

Also, call FTRACE_WARN_ON() instead of setting function_trace_stop
if a ops has no func to call.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18 13:57:00 -04:00
Steven Rostedt (Red Hat) 1820122a76 ftrace: Do no disable function tracing on enabling function tracing
When function tracing is being updated function_trace_stop is set to
keep from tracing the updates. This was fine when function tracing
was done from stop machine. But it is no longer done that way and
this can cause real tracing to be missed.

Remove it.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18 13:56:59 -04:00
Steven Rostedt (Red Hat) 545d47b8f3 ftrace-graph: Remove usage of ftrace_stop() in ftrace_graph_stop()
All archs now use ftrace_graph_is_dead() to stop function graph
tracing. Remove the usage of ftrace_stop() as that is no longer
needed.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18 13:56:58 -04:00
Steven Rostedt (Red Hat) 1b2f121c14 ftrace-graph: Remove dependency of ftrace_stop() from ftrace_graph_stop()
ftrace_stop() is going away as it disables parts of function tracing
that affects users that should not be affected. But ftrace_graph_stop()
is built on ftrace_stop(). Here's another example of killing all of
function tracing because something went wrong with function graph
tracing.

Instead of disabling all users of function tracing on function graph
error, disable only function graph tracing.

A new function is created called ftrace_graph_is_dead(). This is called
in strategic paths to prevent function graph from doing more harm and
allowing at least a warning to be printed before the system crashes.

NOTE: ftrace_stop() is still used until all the archs are converted over
to use ftrace_graph_is_dead(). After that, ftrace_stop() will be removed.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-17 09:45:07 -04:00
Oleg Nesterov 6355d54438 tracing: Kill "filter_string" arg of replace_preds()
Cosmetic, but replace_preds() doesn't need/use "char *filter_string".
Remove it to microsimplify the code.

Link: http://lkml.kernel.org/p/20140715184832.GA20519@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:58:53 -04:00
Oleg Nesterov bb9ef1cb7d tracing: Change apply_subsystem_event_filter() paths to check file->system == dir
filter_free_subsystem_preds(), filter_free_subsystem_filters() and
replace_system_preds() can simply check file->system->subsystem and
avoid strcmp(call->class->system).

Better yet, we can pass "struct ftrace_subsystem_dir *dir" instead of
event_subsystem and just check file->system == dir.

Thanks to Namhyung Kim who pointed out that replace_system_preds() can
be changed too.

Link: http://lkml.kernel.org/p/20140715184829.GA20516@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:33:35 -04:00
Oleg Nesterov ede392a750 tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic
alloc_trace_uprobe() sets TRACE_EVENT_FL_USE_CALL_FILTER for unknown
reason and this is simply wrong. Fortunately this has no effect because
register_uprobe_event() clears call->flags after that.

Kill both. This trace_uprobe was kzalloc'ed and we rely on this fact
anyway.

Link: http://lkml.kernel.org/p/20140715184824.GA20505@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:25:19 -04:00
Oleg Nesterov b5d09db5ac tracing: Kill call_filter_disable()
It seems that the only purpose of call_filter_disable() is to
make filter_disable() less clear and symmetrical, remove it.

Link: http://lkml.kernel.org/p/20140715184821.GA20498@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:23:12 -04:00
Oleg Nesterov 57375747b6 tracing: Kill destroy_call_preds()
Remove destroy_call_preds(). Its only caller, __trace_remove_event_call(),
can use free_event_filter() and nullify ->filter by hand.

Perhaps we could keep this trivial helper although imo it is pointless, but
then it should be static in trace_events.c.

Link: http://lkml.kernel.org/p/20140715184816.GA20495@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:22:32 -04:00
Oleg Nesterov 3e5454d656 tracing: Kill destroy_preds() and destroy_file_preds()
destroy_preds() makes no sense.

The only caller, event_remove(), actually wants destroy_file_preds().
__trace_remove_event_call() does destroy_call_preds() which takes care
of call->filter.

And after the previous change we can simply remove destroy_preds() from
event_remove(), we are going to call remove_event_from_tracers() which
in turn calls remove_event_file_dir()->free_event_filter().

Link: http://lkml.kernel.org/p/20140715184813.GA20488@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:19:55 -04:00
Steven Rostedt (Red Hat) 646d7043ad ftrace: Allow archs to specify if they need a separate function graph trampoline
Currently if an arch supports function graph tracing, the core code will
just assign the function graph trampoline to the function graph addr that
gets called.

But as the old method for function graph tracing always calls the function
trampoline first and that calls the function graph trampoline, some
archs may have the function graph trampoline dependent on operations that
were done in the function trampoline. This causes function graph tracer
to break on those archs.

Instead of having the default be to set the function graph ftrace_ops
to the function graph trampoline, have it instead just set it to zero
which will keep it from jumping to a trampoline that is not set up
to be jumped directly too.

Link: http://lkml.kernel.org/r/53BED155.9040607@nvidia.com

Reported-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Tested-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 11:01:24 -04:00
Martin Lau 97b8ee8453 ring-buffer: Fix polling on trace_pipe
ring_buffer_poll_wait() should always put the poll_table to its wait_queue
even there is immediate data available.  Otherwise, the following epoll and
read sequence will eventually hang forever:

1. Put some data to make the trace_pipe ring_buffer read ready first
2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
3. epoll_wait()
4. read(trace_pipe_fd) till EAGAIN
5. Add some more data to the trace_pipe ring_buffer
6. epoll_wait() -> this epoll_wait() will block forever

~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
  ring_buffer_poll_wait() returns immediately without adding poll_table,
  which has poll_table->_qproc pointing to ep_poll_callback(), to its
  wait_queue.
~ During the epoll_wait() call in step 3 and step 6,
  ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
  because the poll_table->_qproc is NULL and it is how epoll works.
~ When there is new data available in step 6, ring_buffer does not know
  it has to call ep_poll_callback() because it is not in its wait queue.
  Hence, block forever.

Other poll implementation seems to call poll_wait() unconditionally as the very
first thing to do.  For example, tcp_poll() in tcp.c.

Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com

Cc: stable@vger.kernel.org # 2.6.27
Fixes: 2a2cc8f7c4 "ftrace: allow the event pipe to be polled"
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Martin Lau <kafai@fb.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15 17:07:47 -04:00
zhangwei(Jovi) f0160a5a29 tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs
The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing,
so add it, to be consistent with __trace_printk/__trace_bprintk.
Those functions are all called by the same function: trace_printk().

Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com

Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15 11:58:40 -04:00
Steven Rostedt (Red Hat) 5f8bf2d263 tracing: Fix graph tracer with stack tracer on other archs
Running my ftrace tests on PowerPC, it failed the test that checks
if function_graph tracer is affected by the stack tracer. It was.
Looking into this, I found that the update_function_graph_func()
must be called even if the trampoline function is not changed.
This is because archs like PowerPC do not support ftrace_ops being
passed by assembly and instead uses a helper function (what the
trampoline function points to). Since this function is not changed
even when multiple ftrace_ops are added to the code, the test that
falls out before calling update_function_graph_func() will miss that
the update must still be done.

Call update_function_graph_function() for all calls to
update_ftrace_function()

Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15 11:10:25 -04:00
zhangwei(Jovi) 8abfb8727f tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs
Currently trace option stacktrace is not applicable for
trace_printk with constant string argument, the reason is
in __trace_puts/__trace_bputs ftrace_trace_stack is missing.

In contrast, when using trace_printk with non constant string
argument(will call into __trace_printk/__trace_bprintk), then
trace option stacktrace is workable, this inconstant result
will confuses users a lot.

Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15 11:04:40 -04:00
Oleg Nesterov 2448e3493c tracing: instance_rmdir() leaks ftrace_event_file->filter
instance_rmdir() path destroys the event files but forgets to free
file->filter. Change remove_event_file_dir() to free_event_filter().

Link: http://lkml.kernel.org/p/20140711190638.GA19517@redhat.com

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com>
Cc: stable@vger.kernel.org # 3.11+
Fixes: f6a84bdc75 "tracing: Introduce remove_event_file_dir()"
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-14 14:41:13 -04:00
Steven Rostedt (Red Hat) ca65ef1ab6 Merge branch 'trace/ftrace/urgent' into trace/ftrace/core
Needed 099ed15167 "tracing: Remove ftrace_stop/start() from
 reading the trace file" for the removal of ftrace_start/stop().
2014-07-09 11:02:34 -04:00
Steven Rostedt (Red Hat) 099ed15167 tracing: Remove ftrace_stop/start() from reading the trace file
Disabling reading and writing to the trace file should not be able to
disable all function tracing callbacks. There's other users today
(like kprobes and perf). Reading a trace file should not stop those
from happening.

Cc: stable@vger.kernel.org # 3.0+
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 12:45:54 -04:00
Namhyung Kim d048a8c7b5 tracing: Add description of set_graph_notrace to tracing/README
It was missing the description of set_graph_notrace file.  Add it.

Link: http://lkml.kernel.org/p/1402590233-22321-5-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:45 -04:00
Namhyung Kim 8c006cf7a2 tracing: Improve message of empty set_ftrace_notrace file
When there's no entry in set_ftrace_notrace, it'll print nothing, but
it's better to print something like below like set_graph_notrace does:

  #### no functions disabled ####

Link: http://lkml.kernel.org/p/1402644246-4649-1-git-send-email-namhyung@kernel.org

Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:44 -04:00
Namhyung Kim 280d1429b6 tracing: Improve message of empty set_graph_notrace file
When there's no entry in set_graph_notrace, it'll print below message

  #### all functions enabled ####

While this is technically correct, it's better to print like below:

  #### no functions disabled ####

Link: http://lkml.kernel.org/p/1402590233-22321-3-git-send-email-namhyung@kernel.org

Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:44 -04:00
Namhyung Kim 0d7d9a16ce tracing: Add ftrace_graph_notrace boot parameter
The ftrace_graph_notrace option is for specifying notrace filter for
function graph tracer at boot time.  It can be altered after boot
using set_graph_notrace file on the debugfs.

Link: http://lkml.kernel.org/p/1402590233-22321-2-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:43 -04:00
Fabian Frederick 3448bac329 tracing: Convert pr_warning() to pr_warn() in trace_events.c
Convert pr_warning to standard pr_warn
Define pr_fmt(fmt) fmt to avoid any future default fmt definition

Link: http://lkml.kernel.org/p/1402141388-21144-1-git-send-email-fabf@skynet.be

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:42 -04:00
Namhyung Kim ef2fbe16ac ftrace: Do not copy hash if O_TRUNC is set
When a filter file is open for writing and O_TRUNC is set, there's no
need to copy and free the filter entries.

Link: http://lkml.kernel.org/p/1402474014-28655-2-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:41 -04:00
Namhyung Kim 1f61be007e ftrace: Fix memory leak on failure path in ftrace_allocate_pages()
As struct ftrace_page is managed in a single linked list, it should
free from the start page.

Link: http://lkml.kernel.org/p/1402474014-28655-1-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:40 -04:00
Namhyung Kim a737e6dd7b ftrace: Get rid of obsolete global_start_up variable
It seems like it's a leftover from commit 4104d326b6 ("ftrace:
Remove global function list and call function directly").  As it
isn't updated at all, checking its value is meaningless.

Let's get rid of it.

Link: http://lkml.kernel.org/p/1402584972-17824-1-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:40 -04:00
Steven Rostedt (Red Hat) 7b039cb4c5 tracing: Add trace_seq_buffer_ptr() helper function
There's several locations in the kernel that open code the calculation
of the next location in the trace_seq buffer. This is usually done with

  p->buffer + p->len

Instead of having this open coded, supply a helper function in the
header to do it for them. This function is called trace_seq_buffer_ptr().

Link: http://lkml.kernel.org/p/20140626220129.452783019@goodmis.org

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:39 -04:00
Fabian Frederick 3f4d8f78a0 tracing: Remove unnecessary null test before debugfs_remove()
This fixes checkpatch warning:

"WARNING: debugfs_remove(NULL) is safe this check is probably not required"
Link: http://lkml.kernel.org/p/1403802871-8599-1-git-send-email-fabf@skynet.be

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:38 -04:00
Steven Rostedt (Red Hat) 9096032fbc tracing: Remove trace_seq_reserve()
trace_seq_reserve() has no users in the kernel, it just wastes space.
Remove it.

Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:37 -04:00
Steven Rostedt (Red Hat) 6d2289f3fa tracing: Make trace_seq_putmem_hex() more robust
Currently trace_seq_putmem_hex() can only take as a parameter a pointer
to something that is 8 bytes or less, otherwise it will overflow the
buffer. This is protected by a macro that encompasses the call to
trace_seq_putmem_hex() that has a BUILD_BUG_ON() for the variable before
it is passed in. This is not very robust and if trace_seq_putmem_hex() ever
gets used outside that macro it will cause issues.

Instead of only being able to produce a hex output of memory that is for
a single word, change it to be more robust and allow any size input.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:37 -04:00
Steven Rostedt (Red Hat) 36aabfff50 tracing: Clean up trace_seq.c
For using trace_seq_*() functions in NMI context, I posted a patch to move
it to the lib/ directory. This caused Andrew Morton to take a look at the code.
He went through and gave a lot of comments about missing kernel doc,
inconsistent types for the save variable, mix match of EXPORT_SYMBOL_GPL()
and EXPORT_SYMBOL() as well as missing EXPORT_SYMBOL*()s. There were
a few comments about the way variables were being compared (int vs uint).

All these were good review comments and should be implemented regardless of
if trace_seq.c should be moved to lib/ or not.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:36 -04:00
Steven Rostedt (Red Hat) 12306276fa tracing: Move the trace_seq_* functions into its own trace_seq.c file
The trace_seq_*() functions are a nice utility that allows users to manipulate
buffers with printf() like formats. It has its own trace_seq.h header in
include/linux and should be in its own file. Being tied with trace_output.c
is rather awkward.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:35 -04:00
Masami Hiramatsu 5c27c775d5 ftrace: Simplify ftrace_hash_disable/enable path in ftrace_hash_move
Simplify ftrace_hash_disable/enable path in ftrace_hash_move
for hardening the process if the memory allocation failed.

Link: http://lkml.kernel.org/p/20140617110442.15167.81076.stgit@kbuild-fedora.novalocal

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:34 -04:00
Steven Rostedt (Red Hat) 9674b2fada ftrace: Add trampolines to enabled_functions debug file
The enabled_functions is used to help debug the dynamic function tracing.
Adding what trampolines are attached to files is useful for debugging.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:32 -04:00
Steven Rostedt (Red Hat) 79922b8009 ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller
and we only have one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registeres.

The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.

If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.

This requires adding new flags to the dyn_ftrace records:

  FTRACE_FL_TRAMP
  FTRACE_FL_TRAMP_EN

The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.

In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.

Without this patch perf showed:

  0.05%  hackbench  [kernel.kallsyms]  [k] ftrace_caller
  0.05%  hackbench  [kernel.kallsyms]  [k] arch_local_irq_save
  0.05%  hackbench  [kernel.kallsyms]  [k] native_sched_clock
  0.04%  hackbench  [kernel.kallsyms]  [k] __buffer_unlock_commit
  0.04%  hackbench  [kernel.kallsyms]  [k] preempt_trace
  0.04%  hackbench  [kernel.kallsyms]  [k] prepare_ftrace_return
  0.04%  hackbench  [kernel.kallsyms]  [k] __this_cpu_preempt_check
  0.04%  hackbench  [kernel.kallsyms]  [k] ftrace_graph_caller

See that the ftrace_caller took up more time than the ftrace_graph_caller
did.

With this patch:

  0.05%  hackbench  [kernel.kallsyms]  [k] __buffer_unlock_commit
  0.04%  hackbench  [kernel.kallsyms]  [k] call_filter_check_discard
  0.04%  hackbench  [kernel.kallsyms]  [k] ftrace_graph_caller
  0.04%  hackbench  [kernel.kallsyms]  [k] sched_clock

The ftrace_caller is no where to be found and ftrace_graph_caller still
takes up the same percentage.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:31 -04:00
Oleg Nesterov fb6bab6a5a tracing/uprobes: Fix the usage of uprobe_buffer_enable() in probe_event_enable()
The usage of uprobe_buffer_enable() added by dcad1a20 is very wrong,

1. uprobe_buffer_enable() and uprobe_buffer_disable() are not balanced,
   _enable() should be called only if !enabled.

2. If uprobe_buffer_enable() fails probe_event_enable() should clear
   tp.flags and free event_file_link.

3. If uprobe_register() fails it should do uprobe_buffer_disable().

Link: http://lkml.kernel.org/p/20140627170146.GA18332@redhat.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Fixes: dcad1a204f "tracing/uprobes: Fetch args before reserving a ring buffer"
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-30 13:22:33 -04:00
Oleg Nesterov f786106e80 tracing/uprobes: Kill the bogus UPROBE_HANDLER_REMOVE code in uprobe_dispatcher()
I do not know why dd9fa555d7 "tracing/uprobes: Move argument fetching
to uprobe_dispatcher()" added the UPROBE_HANDLER_REMOVE, but it looks
wrong.

OK, perhaps it makes sense to avoid store_trace_args() if the tracee is
nacked by uprobe_perf_filter(). But then we should kill the same code
in uprobe_perf_func() and unify the TRACE/PROFILE filtering (we need to
do this anyway to mix perf/ftrace). Until then this code actually adds
the pessimization because uprobe_perf_filter() will be called twice and
return T in likely case.

Link: http://lkml.kernel.org/p/20140627170143.GA18329@redhat.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-30 13:22:23 -04:00
Oleg Nesterov 4821254206 tracing/uprobes: Revert "Support mix of ftrace and perf"
This reverts commit 43fe98913c.

This patch is very wrong. Firstly, this change leads to unbalanced
uprobe_unregister(). Just for example,

	# perf probe -x /lib/libc.so.6 syscall
	# echo 1 >> /sys/kernel/debug/tracing/events/probe_libc/enable
	# perf record -e probe_libc:syscall whatever

after that uprobe is dead (unregistered) but the user of ftrace/perf
can't know this, and it looks as if nobody hits this probe.

This would be easy to fix, but there are other reasons why it is not
simple to mix ftrace and perf. If nothing else, they can't share the
same ->consumer.filter. This is fixable too, but probably we need to
fix the poorly designed uprobe_register() interface first. At least
"register" and "apply" should be clearly separated.

Link: http://lkml.kernel.org/p/20140627170136.GA18319@redhat.com

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com>
Cc: stable@vger.kernel.org # v3.14
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-30 13:21:58 -04:00
Steven Rostedt (Red Hat) 0376bde11b ftrace: Add ftrace_rec_counter() macro to simplify the code
The ftrace dynamic record has a flags element that also has a counter.
Instead of hard coding "rec->flags & ~FTRACE_FL_MASK" all over the
place. Use a macro instead.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-30 10:09:56 -04:00
Steven Rostedt (Red Hat) 4fbb48cb11 ftrace: Allow no regs if no more callbacks require it
When registering a function callback for the function tracer, the ops
can specify if it wants to save full regs (like an interrupt would)
for each function that it traces, or if it does not care about regs
and just wants to have the fastest return possible.

Once a ops has registered a function, if other ops register that
function they all will receive the regs too. That's because it does
the work once, it does it for everyone.

Now if the ops wanting regs unregisters the function so that there's
only ops left that do not care about regs, those ops will still
continue getting regs and going through the work for it on that
function. This is because the disabling of the rec counter only
sees the ops registered, and does not see the ops that are still
attached, and does not know if the current ops that are still attached
want regs or not. To play it safe, it just keeps regs being processed
until no function is registered anymore.

Instead of doing that, check the ops that are still registered for that
function and if none want regs for it anymore, then disable the
processing of regs.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-30 10:09:53 -04:00
Linus Torvalds 8841c8b3c4 One bug fix that goes back to 3.10. Accessing a non existent buffer
if "possible cpus" is greater than actual CPUs (including offline CPUs).
 
 Namhyung Kim did some reviews of the patches I sent this merge window and
 found a memory leak and had a few clean ups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTmcYfAAoJEKQekfcNnQGusVIH/2wvfh1D4Mu+qCUsG7HqMLWM
 wHhTHcJsiE5rBpfcvc+XoLLBMGn9IKCeClGG59KYJGzznbJHHLmk1dy4qdSqNAel
 POTEtGh+AUX0wFBZtVLl2AesFZRsbtaxoNqD0ZBLhkHCV9FBEKbsm8uDCtJIf8wR
 vDDz27nPXkiFWX+wM9Z+tFSL7rohxvwREKlQjfk5Z9plyUcURE8GDbrWl870ZSJX
 uYzSYzioCyYdy/cpasDuTX22/I+nNUxA4dWTeyciigYC+6IqLJRIwqEXJ4RjYmfm
 v9+hf6KkqF8CY6mDjObLkKmy0gubPBQ8Op1G/3ChluUrjkwdffgb6oiLb352yHY=
 =XD1A
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing cleanups and bugfixes from Steven Rostedt:
 "One bug fix that goes back to 3.10.  Accessing a non existent buffer
  if "possible cpus" is greater than actual CPUs (including offline
  CPUs).

  Namhyung Kim did some reviews of the patches I sent this merge window
  and found a memory leak and had a few clean ups"

* tag 'trace-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix check of ftrace_trace_arrays list_empty() check
  tracing: Fix leak of per cpu max data in instances
  tracing: Cleanup saved_cmdlines_size changes
  ring-buffer: Check if buffer exists before polling
2014-06-12 21:07:25 -07:00
Linus Torvalds 3737a12761 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Ingo Molnar:
 "A second round of perf updates:

   - wide reaching kprobes sanitization and robustization, with the hope
     of fixing all 'probe this function crashes the kernel' bugs, by
     Masami Hiramatsu.

   - uprobes updates from Oleg Nesterov: tmpfs support, corner case
     fixes and robustization work.

   - perf tooling updates and fixes from Jiri Olsa, Namhyung Ki, Arnaldo
     et al:
        * Add support to accumulate hist periods (Namhyung Kim)
        * various fixes, refactorings and enhancements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
  perf: Differentiate exec() and non-exec() comm events
  perf: Fix perf_event_comm() vs. exec() assumption
  uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates
  perf/documentation: Add description for conditional branch filter
  perf/x86: Add conditional branch filtering support
  perf/tool: Add conditional branch filter 'cond' to perf record
  perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND'
  uprobes: Teach copy_insn() to support tmpfs
  uprobes: Shift ->readpage check from __copy_insn() to uprobe_register()
  perf/x86: Use common PMU interrupt disabled code
  perf/ARM: Use common PMU interrupt disabled code
  perf: Disable sampled events if no PMU interrupt
  perf: Fix use after free in perf_remove_from_context()
  perf tools: Fix 'make help' message error
  perf record: Fix poll return value propagation
  perf tools: Move elide bool into perf_hpp_fmt struct
  perf tools: Remove elide setup for SORT_MODE__MEMORY mode
  perf tools: Fix "==" into "=" in ui_browser__warning assignment
  perf tools: Allow overriding sysfs and proc finding with env var
  perf tools: Consider header files outside perf directory in tags target
  ...
2014-06-12 19:18:49 -07:00
Steven Rostedt (Red Hat) da9c3413a2 tracing: Fix check of ftrace_trace_arrays list_empty() check
The check that tests if ftrace_trace_arrays is empty in
top_trace_array(), uses the .prev pointer:

  if (list_empty(ftrace_trace_arrays.prev))

instead of testing the variable itself:

  if (list_empty(&ftrace_trace_arrays))

Although it is technically correct, it is awkward and confusing.
Use the proper method.

Link: http://lkml.kernel.org/r/87oay1bas8.fsf@sejong.aot.lge.com

Reported-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-10 13:53:50 -04:00
Steven Rostedt (Red Hat) f0b70cc48c tracing: Fix leak of per cpu max data in instances
The freeing of an instance, if max data is configured, there will be
per cpu data structures created. But these are not freed when the instance
is deleted, which causes a memory leak.

A new helper function is added that frees the individual buffers within a
trace array, instead of duplicating the code. This way changes made for one
are applied to the other (normal buffer vs max buffer).

Link: http://lkml.kernel.org/r/87k38pbake.fsf@sejong.aot.lge.com

Reported-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-10 12:06:30 -04:00
Namhyung Kim a6af8fbf17 tracing: Cleanup saved_cmdlines_size changes
The recent addition of saved_cmdlines_size file had some remaining
(minor - mostly coding style) issues.  Fix them by passing pointer
name to sizeof() and using scnprintf().

Link: http://lkml.kernel.org/p/1402384295-23680-1-git-send-email-namhyung@kernel.org

Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-10 09:51:10 -04:00
Steven Rostedt (Red Hat) 8b8b36834d ring-buffer: Check if buffer exists before polling
The per_cpu buffers are created one per possible CPU. But these do
not mean that those CPUs are online, nor do they even exist.

With the addition of the ring buffer polling, it assumes that the
caller polls on an existing buffer. But this is not the case if
the user reads trace_pipe from a CPU that does not exist, and this
causes the kernel to crash.

Simple fix is to check the cpu against buffer bitmask against to see
if the buffer was allocated or not and return -ENODEV if it is
not.

More updates were done to pass the -ENODEV back up to userspace.

Link: http://lkml.kernel.org/r/5393DB61.6060707@oracle.com

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-10 09:46:00 -04:00
Linus Torvalds 214b931320 Lots of tweaks, small fixes, optimizations, and some helper functions
to help out the rest of the kernel to ease their use of trace events.
 
 The big change for this release is the allowing of other tracers,
 such as the latency tracers, to be used in the trace instances and allow
 for function or function graph tracing to be in the top level
 simultaneously.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTlbUqAAoJEKQekfcNnQGuP+8H+wTBG06beHsqe6XcaeXcKNkt
 Mimm0O04oQdw89CBWeJvXyOwRTtiN4M/4hxHXBTDtChxM9oUyWw1o0IpSMMuQ16O
 w9r3DfC8e1air+ufEuYWM0QNtyzHi8EfDSNia55ON5jvtkCZTXOEKZD+n8M9w28p
 I7PVgr0PDztsCpethCpg0M8beK9zuQPWMzsHAQCsKI06Xl5z33kPIJR15Exh+Kr1
 uVVTZW7JFVAPuSnteLSIx9pN6OjsVGzOZCljg+O+9/v/02u5nkMiS2nURxae86kg
 RTSiRYT6Hvl/MCBhdss/w5kgSk6BYiZ0hXbLtwetvre+vQrOR5CnDw2DxZ7e+gU=
 =oudH
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "Lots of tweaks, small fixes, optimizations, and some helper functions
  to help out the rest of the kernel to ease their use of trace events.

  The big change for this release is the allowing of other tracers, such
  as the latency tracers, to be used in the trace instances and allow
  for function or function graph tracing to be in the top level
  simultaneously"

* tag 'trace-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
  tracing: Fix memory leak on instance deletion
  tracing: Fix leak of ring buffer data when new instances creation fails
  tracing/kprobes: Avoid self tests if tracing is disabled on boot up
  tracing: Return error if ftrace_trace_arrays list is empty
  tracing: Only calculate stats of tracepoint benchmarks for 2^32 times
  tracing: Convert stddev into u64 in tracepoint benchmark
  tracing: Introduce saved_cmdlines_size file
  tracing: Add __get_dynamic_array_len() macro for trace events
  tracing: Remove unused variable in trace_benchmark
  tracing: Eliminate double free on failure of allocation on boot up
  ftrace/x86: Call text_ip_addr() instead of the duplicated code
  tracing: Print max callstack on stacktrace bug
  tracing: Move locking of trace_cmdline_lock into start/stop seq calls
  tracing: Try again for saved cmdline if failed due to locking
  tracing: Have saved_cmdlines use the seq_read infrastructure
  tracing: Add tracepoint benchmark tracepoint
  tracing: Print nasty banner when trace_printk() is in use
  tracing: Add funcgraph_tail option to print function name after closing braces
  tracing: Eliminate duplicate TRACE_GRAPH_PRINT_xx defines
  tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks
  ...
2014-06-09 16:39:15 -07:00
Steven Rostedt (Red Hat) a9fcaaac37 tracing: Fix memory leak on instance deletion
When an instance is created, it also gets a snapshot ring buffer
allocated (with minimum of pages). But when it is deleted the snapshot
buffer is not. There was a helper function added to match the allocation
of these ring buffers to a way to free them, but it wasn't used by
the deletion of an instance. Using that helper function solves this
memory leak.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-06 23:17:28 -04:00
Steven Rostedt (Red Hat) 23aaa3c18e tracing: Fix leak of ring buffer data when new instances creation fails
Yoshihiro Yunomae reported that the ring buffer data for a trace
instance does not get properly cleaned up when it fails. He proposed
a patch that manually cleaned the data up and addad a bunch of labels.
The labels are not needed because all trace array is allocated with
a kzalloc which initializes it to 0 and all kfree()s can take a NULL
pointer and will ignore it.

Adding a new helper function free_trace_buffers() that can also take
null buffers to free the buffers that were allocated by
allocate_trace_buffers().

Link: http://lkml.kernel.org/r/20140605223522.32311.31664.stgit@yunodevel

Reported-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-06 04:53:40 -04:00
Yoshihiro YUNOMAE 748ec3a20e tracing/kprobes: Avoid self tests if tracing is disabled on boot up
If tracing is disabled on boot up, the kernel should not execute tracing
self tests. The kernel should check whether tracing is disabled or not
before executing any of the tracing self tests.

Link: http://lkml.kernel.org/p/20140605223520.32311.56097.stgit@yunodevel

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-06 04:53:39 -04:00
Yoshihiro YUNOMAE dc81e5e3ab tracing: Return error if ftrace_trace_arrays list is empty
ftrace_trace_arrays links global_trace.list. However, global_trace
is not added to ftrace_trace_arrays if trace_alloc_buffers() failed.
As the result, ftrace_trace_arrays becomes an empty list. If
ftrace_trace_arrays is an empty list, current top_trace_array() returns
an invalid pointer. As the result, the kernel can induce memory corruption
or panic.

Current implementation does not check whether ftrace_trace_arrays is empty
list or not. So, in this patch, if ftrace_trace_arrays is empty list,
top_trace_array() returns NULL. Moreover, this patch makes all functions
calling top_trace_array() handle it appropriately.

Link: http://lkml.kernel.org/p/20140605223517.32311.99233.stgit@yunodevel

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-06 04:47:46 -04:00
Steven Rostedt (Red Hat) 34839f5a69 tracing: Only calculate stats of tracepoint benchmarks for 2^32 times
When calculating the average and standard deviation, it is required that
the count be less than UINT_MAX, otherwise the do_div() will get
undefined results. After 2^32 counts of data, the average and standard
deviation should pretty much be set anyway.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-06 00:41:38 -04:00
Steven Rostedt (Red Hat) 72e2fe38ea tracing: Convert stddev into u64 in tracepoint benchmark
I've been told that do_div() expects an unsigned 64 bit number, and
is undefined if a signed is used. This gave a warning on the MIPS
build. I'm not sure if a signed 64 bit dividend is really an issue
or not, but the calculation this is used for is standard deviation,
and that isn't going to be negative. We can just convert it to
unsigned and be safe.

Reported-by: David Daney <ddaney.cavm@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-05 20:35:30 -04:00
Yoshihiro YUNOMAE 939c7a4f04 tracing: Introduce saved_cmdlines_size file
Introduce saved_cmdlines_size file for changing the number of saved pid-comms.
saved_cmdlines currently stores 128 command names using SAVED_CMDLINES, but
'no-existing processes' names are often lost in saved_cmdlines when we
read the trace data. So, by introducing saved_cmdlines_size file, we can
now change the 128 command names saved to something much larger if needed.

When we write a value to saved_cmdlines_size, the number of the value will
be stored in pid-comm list:

	# echo 1024 > /sys/kernel/debug/tracing/saved_cmdlines_size

Here, 1024 command names can be stored. The default number is 128 and the maximum
number is PID_MAX_DEFAULT (=32768 if CONFIG_BASE_SMALL is not set). So, if we
want to avoid losing any command names, we need to set 32768 to
saved_cmdlines_size.

We can read the maximum number of the list:

	# cat /sys/kernel/debug/tracing/saved_cmdlines_size
	128

Link: http://lkml.kernel.org/p/20140605012427.22115.16173.stgit@yunodevel

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-05 12:35:49 -04:00
Ingo Molnar 10b0256496 Merge branch 'perf/kprobes' into perf/core
Conflicts:
	arch/x86/kernel/traps.c

The kprobes enhancements are fully cooked, ship them upstream.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 12:26:50 +02:00
Ingo Molnar c56d34064b Merge branch 'perf/uprobes' into perf/core
These bits from Oleg are fully cooked, ship them to Linus.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 12:26:27 +02:00
Steven Rostedt (Red Hat) 195d280201 tracing: Remove unused variable in trace_benchmark
Somehow this unused variable warning sneaked past my warnings check
(probably due to it depending on a new config).

kernel/trace/trace_benchmark.c: In function 'trace_do_benchmark':
kernel/trace/trace_benchmark.c:38:6: warning: unused variable 'seedsq' [-Wunused-variable]
  u64 seedsq;
      ^

Link: http://lkml.kernel.org/r/20140604160921.4f4e69c4@canb.auug.org.au

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-04 13:36:29 -04:00
Yoshihiro YUNOMAE 198376cd88 tracing: Eliminate double free on failure of allocation on boot up
If allocation of the max_buffer fails on boot up, the error path will
free both per_cpu data structures from the buffers. With the new redesign
of the code, those structures are freed if allocations failed. That is,
the helper function that allocates the buffers will free the per cpu data
on failure. No need to do it again. In fact, the second free will cause
a bug as the code can not handle a double free.

Link: http://lkml.kernel.org/p/20140603042803.27308.30956.stgit@yunodevel

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-03 19:58:31 -04:00
Minchan Kim e317218194 tracing: Print max callstack on stacktrace bug
While I played with my own feature(ex, something on the way to reclaim),
the kernel would easily oops. I guessed that the reason had to do with
stack overflow and wanted to prove it.

I discovered the stack tracer which proved to be very useful for me but
the kernel would oops before my user program gather the information via
"watch cat /sys/kernel/debug/tracing/stack_trace" so I couldn't get any
message from that. What I needed was to have the stack tracer emit the
kernel stack usage before it does the oops so I could find what was
hogging the stack.

This patch shows the callstack of max stack usage right before an oops so
we can find a culprit.

So, the result is as follows.

[ 1116.522206] init: lightdm main process (1246) terminated with status 1
[ 1119.922916] init: failsafe-x main process (1272) terminated with status 1
[ 3887.728131] kworker/u24:1 (6637) used greatest stack depth: 256 bytes left
[ 6397.629227] cc1 (9554) used greatest stack depth: 128 bytes left
[ 7174.467392]         Depth    Size   Location    (47 entries)
[ 7174.467392]         -----    ----   --------
[ 7174.467785]   0)     7248     256   get_page_from_freelist+0xa7/0x920
[ 7174.468506]   1)     6992     352   __alloc_pages_nodemask+0x1cd/0xb20
[ 7174.469224]   2)     6640       8   alloc_pages_current+0x10f/0x1f0
[ 7174.469413]   3)     6632     168   new_slab+0x2c5/0x370
[ 7174.469413]   4)     6464       8   __slab_alloc+0x3a9/0x501
[ 7174.469413]   5)     6456      80   __kmalloc+0x1cb/0x200
[ 7174.469413]   6)     6376     376   vring_add_indirect+0x36/0x200
[ 7174.469413]   7)     6000     144   virtqueue_add_sgs+0x2e2/0x320
[ 7174.469413]   8)     5856     288   __virtblk_add_req+0xda/0x1b0
[ 7174.469413]   9)     5568      96   virtio_queue_rq+0xd3/0x1d0
[ 7174.469413]  10)     5472     128   __blk_mq_run_hw_queue+0x1ef/0x440
[ 7174.469413]  11)     5344      16   blk_mq_run_hw_queue+0x35/0x40
[ 7174.469413]  12)     5328      96   blk_mq_insert_requests+0xdb/0x160
[ 7174.469413]  13)     5232     112   blk_mq_flush_plug_list+0x12b/0x140
[ 7174.469413]  14)     5120     112   blk_flush_plug_list+0xc7/0x220
[ 7174.469413]  15)     5008      64   io_schedule_timeout+0x88/0x100
[ 7174.469413]  16)     4944     128   mempool_alloc+0x145/0x170
[ 7174.469413]  17)     4816      96   bio_alloc_bioset+0x10b/0x1d0
[ 7174.469413]  18)     4720      48   get_swap_bio+0x30/0x90
[ 7174.469413]  19)     4672     160   __swap_writepage+0x150/0x230
[ 7174.469413]  20)     4512      32   swap_writepage+0x42/0x90
[ 7174.469413]  21)     4480     320   shrink_page_list+0x676/0xa80
[ 7174.469413]  22)     4160     208   shrink_inactive_list+0x262/0x4e0
[ 7174.469413]  23)     3952     304   shrink_lruvec+0x3e1/0x6a0
[ 7174.469413]  24)     3648      80   shrink_zone+0x3f/0x110
[ 7174.469413]  25)     3568     128   do_try_to_free_pages+0x156/0x4c0
[ 7174.469413]  26)     3440     208   try_to_free_pages+0xf7/0x1e0
[ 7174.469413]  27)     3232     352   __alloc_pages_nodemask+0x783/0xb20
[ 7174.469413]  28)     2880       8   alloc_pages_current+0x10f/0x1f0
[ 7174.469413]  29)     2872     200   __page_cache_alloc+0x13f/0x160
[ 7174.469413]  30)     2672      80   find_or_create_page+0x4c/0xb0
[ 7174.469413]  31)     2592      80   ext4_mb_load_buddy+0x1e9/0x370
[ 7174.469413]  32)     2512     176   ext4_mb_regular_allocator+0x1b7/0x460
[ 7174.469413]  33)     2336     128   ext4_mb_new_blocks+0x458/0x5f0
[ 7174.469413]  34)     2208     256   ext4_ext_map_blocks+0x70b/0x1010
[ 7174.469413]  35)     1952     160   ext4_map_blocks+0x325/0x530
[ 7174.469413]  36)     1792     384   ext4_writepages+0x6d1/0xce0
[ 7174.469413]  37)     1408      16   do_writepages+0x23/0x40
[ 7174.469413]  38)     1392      96   __writeback_single_inode+0x45/0x2e0
[ 7174.469413]  39)     1296     176   writeback_sb_inodes+0x2ad/0x500
[ 7174.469413]  40)     1120      80   __writeback_inodes_wb+0x9e/0xd0
[ 7174.469413]  41)     1040     160   wb_writeback+0x29b/0x350
[ 7174.469413]  42)      880     208   bdi_writeback_workfn+0x11c/0x480
[ 7174.469413]  43)      672     144   process_one_work+0x1d2/0x570
[ 7174.469413]  44)      528     112   worker_thread+0x116/0x370
[ 7174.469413]  45)      416     240   kthread+0xf3/0x110
[ 7174.469413]  46)      176     176   ret_from_fork+0x7c/0xb0
[ 7174.469413] ------------[ cut here ]------------
[ 7174.469413] kernel BUG at kernel/trace/trace_stack.c:174!
[ 7174.469413] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 7174.469413] Dumping ftrace buffer:
[ 7174.469413]    (ftrace buffer empty)
[ 7174.469413] Modules linked in:
[ 7174.469413] CPU: 0 PID: 440 Comm: kworker/u24:0 Not tainted 3.14.0+ #212
[ 7174.469413] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 7174.469413] Workqueue: writeback bdi_writeback_workfn (flush-253:0)
[ 7174.469413] task: ffff880034170000 ti: ffff880029518000 task.ti: ffff880029518000
[ 7174.469413] RIP: 0010:[<ffffffff8112336e>]  [<ffffffff8112336e>] stack_trace_call+0x2de/0x340
[ 7174.469413] RSP: 0000:ffff880029518290  EFLAGS: 00010046
[ 7174.469413] RAX: 0000000000000030 RBX: 000000000000002f RCX: 0000000000000000
[ 7174.469413] RDX: 0000000000000000 RSI: 000000000000002f RDI: ffffffff810b7159
[ 7174.469413] RBP: ffff8800295182f0 R08: ffffffffffffffff R09: 0000000000000000
[ 7174.469413] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff82768dfc
[ 7174.469413] R13: 000000000000f2e8 R14: ffff8800295182b8 R15: 00000000000000f8
[ 7174.469413] FS:  0000000000000000(0000) GS:ffff880037c00000(0000) knlGS:0000000000000000
[ 7174.469413] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7174.469413] CR2: 00002acd0b994000 CR3: 0000000001c0b000 CR4: 00000000000006f0
[ 7174.469413] Stack:
[ 7174.469413]  0000000000000000 ffffffff8114fdb7 0000000000000087 0000000000001c50
[ 7174.469413]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 7174.469413]  0000000000000002 ffff880034170000 ffff880034171028 0000000000000000
[ 7174.469413] Call Trace:
[ 7174.469413]  [<ffffffff8114fdb7>] ? get_page_from_freelist+0xa7/0x920
[ 7174.469413]  [<ffffffff816eee3f>] ftrace_call+0x5/0x2f
[ 7174.469413]  [<ffffffff81165065>] ? next_zones_zonelist+0x5/0x70
[ 7174.469413]  [<ffffffff810a23fa>] ? __bfs+0x11a/0x270
[ 7174.469413]  [<ffffffff81165065>] ? next_zones_zonelist+0x5/0x70
[ 7174.469413]  [<ffffffff8114fdb7>] ? get_page_from_freelist+0xa7/0x920
[ 7174.469413]  [<ffffffff8119092f>] ? alloc_pages_current+0x10f/0x1f0
[ 7174.469413]  [<ffffffff811507fd>] __alloc_pages_nodemask+0x1cd/0xb20
[ 7174.469413]  [<ffffffff810a4de6>] ? check_irq_usage+0x96/0xe0
[ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
[ 7174.469413]  [<ffffffff8119092f>] alloc_pages_current+0x10f/0x1f0
[ 7174.469413]  [<ffffffff81199cd5>] ? new_slab+0x2c5/0x370
[ 7174.469413]  [<ffffffff81199cd5>] new_slab+0x2c5/0x370
[ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
[ 7174.469413]  [<ffffffff816db002>] __slab_alloc+0x3a9/0x501
[ 7174.469413]  [<ffffffff8119af8b>] ? __kmalloc+0x1cb/0x200
[ 7174.469413]  [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
[ 7174.469413]  [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
[ 7174.469413]  [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
[ 7174.469413]  [<ffffffff8119af8b>] __kmalloc+0x1cb/0x200
[ 7174.469413]  [<ffffffff8141de10>] ? vring_add_indirect+0x200/0x200
[ 7174.469413]  [<ffffffff8141dc46>] vring_add_indirect+0x36/0x200
[ 7174.469413]  [<ffffffff8141e402>] virtqueue_add_sgs+0x2e2/0x320
[ 7174.469413]  [<ffffffff8148e35a>] __virtblk_add_req+0xda/0x1b0
[ 7174.469413]  [<ffffffff8148e503>] virtio_queue_rq+0xd3/0x1d0
[ 7174.469413]  [<ffffffff8134aa0f>] __blk_mq_run_hw_queue+0x1ef/0x440
[ 7174.469413]  [<ffffffff8134b0d5>] blk_mq_run_hw_queue+0x35/0x40
[ 7174.469413]  [<ffffffff8134b7bb>] blk_mq_insert_requests+0xdb/0x160
[ 7174.469413]  [<ffffffff8134be5b>] blk_mq_flush_plug_list+0x12b/0x140
[ 7174.469413]  [<ffffffff81342237>] blk_flush_plug_list+0xc7/0x220
[ 7174.469413]  [<ffffffff816e60ef>] ? _raw_spin_unlock_irqrestore+0x3f/0x70
[ 7174.469413]  [<ffffffff816e16e8>] io_schedule_timeout+0x88/0x100
[ 7174.469413]  [<ffffffff816e1665>] ? io_schedule_timeout+0x5/0x100
[ 7174.469413]  [<ffffffff81149415>] mempool_alloc+0x145/0x170
[ 7174.469413]  [<ffffffff8109baf0>] ? __init_waitqueue_head+0x60/0x60
[ 7174.469413]  [<ffffffff811e246b>] bio_alloc_bioset+0x10b/0x1d0
[ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
[ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
[ 7174.469413]  [<ffffffff81184110>] get_swap_bio+0x30/0x90
[ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
[ 7174.469413]  [<ffffffff81184660>] __swap_writepage+0x150/0x230
[ 7174.469413]  [<ffffffff810ab405>] ? do_raw_spin_unlock+0x5/0xa0
[ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
[ 7174.469413]  [<ffffffff81184515>] ? __swap_writepage+0x5/0x230
[ 7174.469413]  [<ffffffff81184782>] swap_writepage+0x42/0x90
[ 7174.469413]  [<ffffffff8115ae96>] shrink_page_list+0x676/0xa80
[ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
[ 7174.469413]  [<ffffffff8115b872>] shrink_inactive_list+0x262/0x4e0
[ 7174.469413]  [<ffffffff8115c1c1>] shrink_lruvec+0x3e1/0x6a0
[ 7174.469413]  [<ffffffff8115c4bf>] shrink_zone+0x3f/0x110
[ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
[ 7174.469413]  [<ffffffff8115c9e6>] do_try_to_free_pages+0x156/0x4c0
[ 7174.469413]  [<ffffffff8115cf47>] try_to_free_pages+0xf7/0x1e0
[ 7174.469413]  [<ffffffff81150db3>] __alloc_pages_nodemask+0x783/0xb20
[ 7174.469413]  [<ffffffff8119092f>] alloc_pages_current+0x10f/0x1f0
[ 7174.469413]  [<ffffffff81145c0f>] ? __page_cache_alloc+0x13f/0x160
[ 7174.469413]  [<ffffffff81145c0f>] __page_cache_alloc+0x13f/0x160
[ 7174.469413]  [<ffffffff81146c6c>] find_or_create_page+0x4c/0xb0
[ 7174.469413]  [<ffffffff811463e5>] ? find_get_page+0x5/0x130
[ 7174.469413]  [<ffffffff812837b9>] ext4_mb_load_buddy+0x1e9/0x370
[ 7174.469413]  [<ffffffff81284c07>] ext4_mb_regular_allocator+0x1b7/0x460
[ 7174.469413]  [<ffffffff81281070>] ? ext4_mb_use_preallocated+0x40/0x360
[ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
[ 7174.469413]  [<ffffffff81287eb8>] ext4_mb_new_blocks+0x458/0x5f0
[ 7174.469413]  [<ffffffff8127d83b>] ext4_ext_map_blocks+0x70b/0x1010
[ 7174.469413]  [<ffffffff8124e6d5>] ext4_map_blocks+0x325/0x530
[ 7174.469413]  [<ffffffff81253871>] ext4_writepages+0x6d1/0xce0
[ 7174.469413]  [<ffffffff812531a0>] ? ext4_journalled_write_end+0x330/0x330
[ 7174.469413]  [<ffffffff811539b3>] do_writepages+0x23/0x40
[ 7174.469413]  [<ffffffff811d2365>] __writeback_single_inode+0x45/0x2e0
[ 7174.469413]  [<ffffffff811d36ed>] writeback_sb_inodes+0x2ad/0x500
[ 7174.469413]  [<ffffffff811d39de>] __writeback_inodes_wb+0x9e/0xd0
[ 7174.469413]  [<ffffffff811d40bb>] wb_writeback+0x29b/0x350
[ 7174.469413]  [<ffffffff81057c3d>] ? __local_bh_enable_ip+0x6d/0xd0
[ 7174.469413]  [<ffffffff811d6e9c>] bdi_writeback_workfn+0x11c/0x480
[ 7174.469413]  [<ffffffff81070610>] ? process_one_work+0x170/0x570
[ 7174.469413]  [<ffffffff81070672>] process_one_work+0x1d2/0x570
[ 7174.469413]  [<ffffffff81070610>] ? process_one_work+0x170/0x570
[ 7174.469413]  [<ffffffff81071bb6>] worker_thread+0x116/0x370
[ 7174.469413]  [<ffffffff81071aa0>] ? manage_workers.isra.19+0x2e0/0x2e0
[ 7174.469413]  [<ffffffff81078e53>] kthread+0xf3/0x110
[ 7174.469413]  [<ffffffff81078d60>] ? flush_kthread_worker+0x150/0x150
[ 7174.469413]  [<ffffffff816ef0ec>] ret_from_fork+0x7c/0xb0
[ 7174.469413]  [<ffffffff81078d60>] ? flush_kthread_worker+0x150/0x150
[ 7174.469413] Code: c0 49 bc fc 8d 76 82 ff ff ff ff e8 44 5a 5b 00 31 f6 8b 05 95 2b b3 00 48 39 c6 7d 0e 4c 8b 04 f5 20 5f c5 81 49 83 f8 ff 75 11 <0f> 0b 48 63 05 71 5a 64 01 48 29 c3 e9 d0 fd ff ff 48 8d 5e 01
[ 7174.469413] RIP  [<ffffffff8112336e>] stack_trace_call+0x2de/0x340
[ 7174.469413]  RSP <ffff880029518290>
[ 7174.469413] ---[ end trace c97d325b36b718f3 ]---

Link: http://lkml.kernel.org/p/1401683592-1651-1-git-send-email-minchan@kernel.org

Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-02 16:43:49 -04:00
Steven Rostedt (Red Hat) 4c27e756bc tracing: Move locking of trace_cmdline_lock into start/stop seq calls
With the conversion of the saved_cmdlines output to use seq_read, there
is now a race between accessing the values of the saved_cmdlines and
the writing to them. The trace_cmdline_lock needs to be taken at
the start and stop of the seq calls.

A new __trace_find_cmdline() call is created to allow for the look up
to happen without taking the lock.

Fixes: 42584c81c5 tracing: Have saved_cmdlines use the seq_read infrastructure
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-30 13:03:40 -04:00
Steven Rostedt (Red Hat) 379cfdac37 tracing: Try again for saved cmdline if failed due to locking
In order to prevent the saved cmdline cache from being filled when
tracing is not active, the comms are only recorded after a trace event
is recorded.

The problem is, a comm can fail to be recorded if the trace_cmdline_lock
is held. That lock is taken via a trylock to allow it to happen from
any context (including NMI). If the lock fails to be taken, the comm
is skipped. No big deal, as we will try again later.

But! Because of the code that was added to only record after an event,
we may not try again later as the recording is made as a oneshot per
event per CPU.

Only disable the recording of the comm if the comm is actually recorded.

Fixes: 7ffbd48d5c "tracing: Cache comms only after an event occurred"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-30 09:42:39 -04:00
Yoshihiro YUNOMAE 42584c81c5 tracing: Have saved_cmdlines use the seq_read infrastructure
Current tracing_saved_cmdlines_read() implementation is naive; It allocates
a large buffer, constructs output data to that buffer for each read
operation, and then copies a portion of the buffer to the user space
buffer. This has several issues such as slow memory allocation, high
CPU usage, and even corruption of the output data.

The seq_read infrastructure is made to handle this type of work.
By converting it to use seq_read() the code becomes smaller, simplified,
as well as correct.

Link: http://lkml.kernel.org/p/20140220084431.3839.51793.stgit@yunodevel

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-29 23:08:07 -04:00
Steven Rostedt (Red Hat) 81dc9f0ef2 tracing: Add tracepoint benchmark tracepoint
In order to help benchmark the time tracepoints take, a new config
option is added called CONFIG_TRACEPOINT_BENCHMARK. When this option
is set a tracepoint is created called "benchmark:benchmark_event".
When the tracepoint is enabled, it kicks off a kernel thread that
goes into an infinite loop (calling cond_sched() to let other tasks
run), and calls the tracepoint. Each iteration will record the time
it took to write to the tracepoint and the next iteration that
data will be passed to the tracepoint itself. That is, the tracepoint
will report the time it took to do the previous tracepoint.
The string written to the tracepoint is a static string of 128 bytes
to keep the time the same. The initial string is simply a write of
"START". The second string records the cold cache time of the first
write which is not added to the rest of the calculations.

As it is a tight loop, it benchmarks as hot cache. That's fine because
we care most about hot paths that are probably in cache already.

An example of the output:

     START
     first=3672 [COLD CACHED]
     last=632 first=3672 max=632 min=632 avg=316 std=446 std^2=199712
     last=278 first=3672 max=632 min=278 avg=303 std=316 std^2=100337
     last=277 first=3672 max=632 min=277 avg=296 std=258 std^2=67064
     last=273 first=3672 max=632 min=273 avg=292 std=224 std^2=50411
     last=273 first=3672 max=632 min=273 avg=288 std=200 std^2=40389
     last=281 first=3672 max=632 min=273 avg=287 std=183 std^2=33666

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-29 22:49:54 -04:00
Steven Rostedt 2184db46e4 tracing: Print nasty banner when trace_printk() is in use
trace_printk() is used to debug fast paths within the kernel. Places
that gets called in any context (interrupt or NMI) or thousands of
times a second. Something you do not want to do with a printk().

In order to make it completely lockless as it needs a temporary buffer
to handle some of the string formatting, a page is created per cpu for
every context (four per cpu; normal, softirq, irq, NMI).

Since trace_printk() should only be used for debugging purposes,
there's no reason to waste memory on these buffers on a production
system. That means, trace_printk() should never be used unless a
developer is debugging their kernel. There's macro magic to allocate
the buffers if trace_printk() is used anywhere in the kernel.

To help enforce that trace_printk() isn't used outside of development,
when it is used, a nasty banner is displayed on bootup (or when a module
is loaded that uses trace_printk() and the kernel core does not).

Here's the banner:

 **********************************************************
 **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
 **                                                      **
 ** trace_printk() being used. Allocating extra memory.  **
 **                                                      **
 ** This means that this is a DEBUG kernel and it is     **
 ** unsafe for produciton use.                           **
 **                                                      **
 ** If you see this message and you are not debugging    **
 ** the kernel, report this immediately to your vendor!  **
 **                                                      **
 **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
 **********************************************************

That should hopefully keep developers from trying to sneak in a
trace_printk() or two.

Link: http://lkml.kernel.org/p/20140528131440.2283213c@gandalf.local.home

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-29 20:13:59 -04:00
Robert Elliott 607e3a2920 tracing: Add funcgraph_tail option to print function name after closing braces
In the function-graph tracer, add a funcgraph_tail option
to print the function name on all } lines, not just
functions whose first line is no longer in the trace
buffer.

If a function calls other traced functions, its total
time appears on its } line.  This change allows grep
to be used to determine the function for which the
line corresponds.

Update Documentation/trace/ftrace.txt to describe
this new option.

Link: http://lkml.kernel.org/p/20140520221041.8359.6782.stgit@beardog.cce.hp.com

Signed-off-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-20 23:29:32 -04:00
Robert Elliott ccdb594653 tracing: Eliminate duplicate TRACE_GRAPH_PRINT_xx defines
Eliminate duplicate TRACE_GRAPH_PRINT_xx defines
in trace_functions_graph.c that are already in
trace.h.

Add TRACE_GRAPH_PRINT_IRQS to trace.h, which is
the only one that is missing.

Link: http://lkml.kernel.org/p/20140520221031.8359.24733.stgit@beardog.cce.hp.com

Signed-off-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-20 23:28:34 -04:00
Steven Rostedt (Red Hat) 4449bf927b tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks
Being able to show a cpumask of events can be useful as some events
may affect only some CPUs. There is no standard way to record the
cpumask and converting it to a string is rather expensive during
the trace as traces happen in hotpaths. It would be better to record
the raw event mask and be able to parse it at print time.

The following macros were added for use with the TRACE_EVENT() macro:

  __bitmask()
  __assign_bitmask()
  __get_bitmask()

To test this, I added this to the sched_migrate_task event, which
looked like this:

TRACE_EVENT(sched_migrate_task,

	TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),

	TP_ARGS(p, dest_cpu, cpus),

	TP_STRUCT__entry(
		__array(	char,	comm,	TASK_COMM_LEN	)
		__field(	pid_t,	pid			)
		__field(	int,	prio			)
		__field(	int,	orig_cpu		)
		__field(	int,	dest_cpu		)
		__bitmask(	cpumask, num_possible_cpus()	)
	),

	TP_fast_assign(
		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
		__entry->pid		= p->pid;
		__entry->prio		= p->prio;
		__entry->orig_cpu	= task_cpu(p);
		__entry->dest_cpu	= dest_cpu;
		__assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
	),

	TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
		  __entry->comm, __entry->pid, __entry->prio,
		  __entry->orig_cpu, __entry->dest_cpu,
		  __get_bitmask(cpumask))
);

With the output of:

        ksmtuned-3613  [003] d..2   485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
     migration/1-13    [001] d..5   485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
             awk-3615  [002] d.H5   485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
     migration/2-18    [002] d..5   485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f

Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.home

Suggested-by: Javi Merino <javi.merino@arm.com>
Tested-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-15 11:29:37 -04:00
Steven Rostedt (Red Hat) f1b2f2bd58 ftrace: Remove FTRACE_UPDATE_MODIFY_CALL_REGS flag
As the decision to what needs to be done (converting a call to the
ftrace_caller to ftrace_caller_regs or to convert from ftrace_caller_regs
to ftrace_caller) can easily be determined from the rec->flags of
FTRACE_FL_REGS and FTRACE_FL_REGS_EN, there's no need to have the
ftrace_check_record() return either a UPDATE_MODIFY_CALL_REGS or a
UPDATE_MODIFY_CALL. Just he latter is enough. This added flag causes
more complexity than is required. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:30 -04:00
Steven Rostedt (Red Hat) 7c0868e03b ftrace: Use the ftrace_addr helper functions to find the ftrace_addr
With the moving of the functions that determine what the mcount call site
should be replaced with into the generic code, there is a few places
in the generic code that can use them instead of hard coding it as it
does.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:29 -04:00
Steven Rostedt (Red Hat) 7413af1fb7 ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global
Move and rename get_ftrace_addr() and get_ftrace_addr_old() to
ftrace_get_addr_new() and ftrace_get_addr_curr() respectively.

This moves these two helper functions in the generic code out from
the arch specific code, and renames them to have a better generic
name. This will allow other archs to use them as well as makes it
a bit easier to work on getting separate trampolines for different
functions.

ftrace_get_addr_new() returns the trampoline address that the mcount
call address will be converted to.

ftrace_get_addr_curr() returns the trampoline address of what the
mcount call address currently jumps to.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:29 -04:00
Steven Rostedt (Red Hat) 68f40969f0 ftrace: Always inline ftrace_hash_empty() helper function
The ftrace_hash_empty() function is a simple test:

	return !hash || !hash->count;

But gcc seems to want to make it a call. As this is in an extreme
hot path of the function tracer, there's no reason it needs to be
a call. I only wrote it to be a helper function anyway, otherwise
it would have been inlined manually.

Force gcc to inline it, as it could have also been a macro.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:28 -04:00
Steven Rostedt (Red Hat) 19eab4a472 ftrace: Write in missing comment from a very old commit
Back in 2011 Commit ed926f9b35 "ftrace: Use counters to enable
functions to trace" changed the way ftrace accounts for enabled
and disabled traced functions. There was a comment started as:

	/*
	 *
	 */

But never finished. Well, that's rather useless. I probably forgot
to save the file before committing it. And it passed review from all
this time.

Anyway, better late than never. I updated the comment to express what
is happening in that somewhat complex code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:27 -04:00
Steven Rostedt (Red Hat) 66209a5bd4 ftrace: Remove boolean of hash_enable and hash_disable
Commit 4104d326b6 "ftrace: Remove global function list and call
function directly" cleaned up the global_ops filtering and made
the code simpler, but it left a variable "hash_enable" that was used
to know if the hash functions should be updated or not. It was
updated if the global_ops did not override them. As the global_ops
are now no different than any other ftrace_ops, the hash always
gets updated and there's no reason to use the hash_enable boolean.

The same goes for hash_disable used in ftrace_shutdown().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:25 -04:00
Christoph Lameter bdffd893a0 tracing: Replace __get_cpu_var uses with this_cpu_ptr
Replace uses of &__get_cpu_var for address calculation with this_cpu_ptr.

Link: http://lkml.kernel.org/p/alpine.DEB.2.10.1404291415560.18364@gentwo.org

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-05 22:40:53 -04:00
Steven Rostedt (Red Hat) 561a4fe851 tracing: Use rcu_dereference_sched() for trace event triggers
As trace event triggers are now part of the mainline kernel, I added
my trace event trigger tests to my test suite I run on all my kernels.
Now these tests get run under different config options, and one of
those options is CONFIG_PROVE_RCU, which checks under lockdep that
the rcu locking primitives are being used correctly. This triggered
the following splat:

===============================
[ INFO: suspicious RCU usage. ]
3.15.0-rc2-test+ #11 Not tainted
-------------------------------
kernel/trace/trace_events_trigger.c:80 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
4 locks held by swapper/1/0:
 #0:  ((&(&j_cdbs->work)->timer)){..-...}, at: [<ffffffff8104d2cc>] call_timer_fn+0x5/0x1be
 #1:  (&(&pool->lock)->rlock){-.-...}, at: [<ffffffff81059856>] __queue_work+0x140/0x283
 #2:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8106e961>] try_to_wake_up+0x2e/0x1e8
 #3:  (&rq->lock){-.-.-.}, at: [<ffffffff8106ead3>] try_to_wake_up+0x1a0/0x1e8

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc2-test+ #11
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 0000000000000001 ffff88007e083b98 ffffffff819f53a5 0000000000000006
 ffff88007b0942c0 ffff88007e083bc8 ffffffff81081307 ffff88007ad96d20
 0000000000000000 ffff88007af2d840 ffff88007b2e701c ffff88007e083c18
Call Trace:
 <IRQ>  [<ffffffff819f53a5>] dump_stack+0x4f/0x7c
 [<ffffffff81081307>] lockdep_rcu_suspicious+0x107/0x110
 [<ffffffff810ee51c>] event_triggers_call+0x99/0x108
 [<ffffffff810e8174>] ftrace_event_buffer_commit+0x42/0xa4
 [<ffffffff8106aadc>] ftrace_raw_event_sched_wakeup_template+0x71/0x7c
 [<ffffffff8106bcbf>] ttwu_do_wakeup+0x7f/0xff
 [<ffffffff8106bd9b>] ttwu_do_activate.constprop.126+0x5c/0x61
 [<ffffffff8106eadf>] try_to_wake_up+0x1ac/0x1e8
 [<ffffffff8106eb77>] wake_up_process+0x36/0x3b
 [<ffffffff810575cc>] wake_up_worker+0x24/0x26
 [<ffffffff810578bc>] insert_work+0x5c/0x65
 [<ffffffff81059982>] __queue_work+0x26c/0x283
 [<ffffffff81059999>] ? __queue_work+0x283/0x283
 [<ffffffff810599b7>] delayed_work_timer_fn+0x1e/0x20
 [<ffffffff8104d3a6>] call_timer_fn+0xdf/0x1be^M
 [<ffffffff8104d2cc>] ? call_timer_fn+0x5/0x1be
 [<ffffffff81059999>] ? __queue_work+0x283/0x283
 [<ffffffff8104d823>] run_timer_softirq+0x1a4/0x22f^M
 [<ffffffff8104696d>] __do_softirq+0x17b/0x31b^M
 [<ffffffff81046d03>] irq_exit+0x42/0x97
 [<ffffffff81a08db6>] smp_apic_timer_interrupt+0x37/0x44
 [<ffffffff81a07a2f>] apic_timer_interrupt+0x6f/0x80
 <EOI>  [<ffffffff8100a5d8>] ? default_idle+0x21/0x32
 [<ffffffff8100a5d6>] ? default_idle+0x1f/0x32
 [<ffffffff8100ac10>] arch_cpu_idle+0xf/0x11
 [<ffffffff8107b3a4>] cpu_startup_entry+0x1a3/0x213
 [<ffffffff8102a23c>] start_secondary+0x212/0x219

The cause is that the triggers are protected by rcu_read_lock_sched() but
the data is dereferenced with rcu_dereference() which expects it to
be protected with rcu_read_lock(). The proper reference should be
rcu_dereference_sched().

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-02 23:12:42 -04:00
Steven Rostedt (Red Hat) fd06a54990 ftrace: Have function graph tracer use global_ops for filtering
Commit 4104d326b6 "ftrace: Remove global function list and call
function directly" cleaned up the global_ops filtering and made
the code simpler. But it left out function graph filtering which
also depended on that code. The function graph filtering still
needs to use global_ops as the filter otherwise it wont filter
at all.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-01 23:21:16 -04:00
Oleg Nesterov 927d687480 uprobes/tracing: Fix uprobe_perf_open() on uprobe_apply() failure
uprobe_perf_open()->uprobe_apply() can fail, but this error is wrongly
ignored. Change uprobe_perf_open() to do uprobe_perf_close() and return
the error code in this case.

Change uprobe_perf_close() to propogate the error from uprobe_apply()
as well, although it should not fail.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30 19:10:42 +02:00
Oleg Nesterov ce5f36a58f uprobes/tracing: Make uprobe_perf_close() visible to uprobe_perf_open()
Preparation. Move uprobe_perf_close() up before uprobe_perf_open() to
avoid the forward declaration in the next patch and make it readable.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30 19:10:42 +02:00
Steven Rostedt (Red Hat) b1169cc69b tracing: Remove mock up poll wait function
Now that the ring buffer has a built in way to wake up readers
when there's data, using irq_work such that it is safe to do it
in any context. But it was still using the old "poor man's"
wait polling that checks every 1/10 of a second to see if it
should wake up a waiter. This makes the latency for a wake up
excruciatingly long. No need to do that anymore.

Completely remove the different wait_poll types from the tracers
and have them all use the default one now.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-30 08:40:05 -04:00
Steven Rostedt (Red Hat) f487426104 tracing: Break out of tracing_wait_pipe() before wait_pipe() is called
When reading from trace_pipe, if tracing is off but nothing was read
it should block. If something is read and tracing is off, then EOF
is returned. If tracing is on and there's nothing to read, it will block.

But because the check of whether tracing is off and something was read
is done after the block on the pipe, it is hit or miss if the EOF is
returned or not leading to inconsistent behavior.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-29 16:07:28 -04:00
Steven Rostedt (Red Hat) a949ae560a ftrace/module: Hardcode ftrace_module_init() call into load_module()
A race exists between module loading and enabling of function tracer.

	CPU 1				CPU 2
	-----				-----
  load_module()
   module->state = MODULE_STATE_COMING

				register_ftrace_function()
				 mutex_lock(&ftrace_lock);
				 ftrace_startup()
				  update_ftrace_function();
				   ftrace_arch_code_modify_prepare()
				    set_all_module_text_rw();
				   <enables-ftrace>
				    ftrace_arch_code_modify_post_process()
				     set_all_module_text_ro();

				[ here all module text is set to RO,
				  including the module that is
				  loading!! ]

   blocking_notifier_call_chain(MODULE_STATE_COMING);
    ftrace_init_module()

     [ tries to modify code, but it's RO, and fails!
       ftrace_bug() is called]

When this race happens, ftrace_bug() will produces a nasty warning and
all of the function tracing features will be disabled until reboot.

The simple solution is to treate module load the same way the core
kernel is treated at boot. To hardcode the ftrace function modification
of converting calls to mcount into nops. This is done in init/main.c
there's no reason it could not be done in load_module(). This gives
a better control of the changes and doesn't tie the state of the
module to its notifiers as much. Ftrace is special, it needs to be
treated as such.

The reason this would work, is that the ftrace_module_init() would be
called while the module is in MODULE_STATE_UNFORMED, which is ignored
by the set_all_module_text_ro() call.

Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.com

Reported-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-28 10:37:21 -04:00
Jiaxing Wang 8d1b065d47 tracing: Fix documentation of ftrace_set_global_{filter,notrace}()
The functions ftrace_set_global_filter() and ftrace_set_global_notrace()
still have their old names in the kernel doc (ftrace_set_filter and
ftrace_set_notrace respectively). Replace these with the real names.

Link: http://lkml.kernel.org/p/1398006644-5935-3-git-send-email-wangjiaxing@insigma.com.cn

Signed-off-by: Jiaxing Wang <wangjiaxing@insigma.com.cn>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-24 13:38:01 -04:00
Jiaxing Wang 7eea4fce02 tracing/stack_trace: Skip 4 instead of 3 when using ftrace_ops_list_func
When using ftrace_ops_list_func, we should skip 4 instead of 3,
to avoid ftrace_call+0x5/0xb appearing in the stack trace:

        Depth    Size   Location    (110 entries)
        -----    ----   --------
  0)     2956       0   update_curr+0xe/0x1e0
  1)     2956      68   ftrace_call+0x5/0xb
  2)     2888      92   enqueue_entity+0x53/0xe80
  3)     2796      80   enqueue_task_fair+0x47/0x7e0
  4)     2716      28   enqueue_task+0x45/0x70
  5)     2688      12   activate_task+0x22/0x30

Add a function using_ftrace_ops_list_func() to test for this while keeping
ftrace_ops_list_func to remain static.

Link: http://lkml.kernel.org/p/1398006644-5935-2-git-send-email-wangjiaxing@insigma.com.cn

Signed-off-by: Jiaxing Wang <wangjiaxing@insigma.com.cn>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-24 13:36:03 -04:00
Masami Hiramatsu 3da0f18007 kprobes, ftrace: Use NOKPROBE_SYMBOL macro in ftrace
Use NOKPROBE_SYMBOL macro to protect functions from
kprobes instead of __kprobes annotation in ftrace.
This applies nokprobe_inline annotation for some cases,
because NOKPROBE_SYMBOL() will inhibit inlining by
referring the symbol address.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140417081828.26341.55152.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24 10:26:39 +02:00
Masami Hiramatsu fbc1963d2c kprobes, ftrace: Allow probing on some functions
There is no need to prohibit probing on the functions
used for preparation and uprobe only fetch functions.
Those are safely probed because those are not invoked
from kprobe's breakpoint/fault/debug handlers. So there
is no chance to cause recursive exceptions.

Following functions are now removed from the kprobes blacklist:

	update_bitfield_fetch_param
	free_bitfield_fetch_param
	kprobe_register
	FETCH_FUNC_NAME(stack, type) in trace_uprobe.c
	FETCH_FUNC_NAME(memory, type) in trace_uprobe.c
	FETCH_FUNC_NAME(memory, string) in trace_uprobe.c
	FETCH_FUNC_NAME(memory, string_size) in trace_uprobe.c
	FETCH_FUNC_NAME(file_offset, type) in trace_uprobe.c

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140417081800.26341.56504.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24 10:03:02 +02:00
Fabian Frederick ad1438a076 tracing: Add static to local functions
This patch adds static to the following functions:
-cycle_t buffer_ftrace_now
-void free_snapshot
-int trace_selftest_startup_dynamic_tracing

Link: http://lkml.kernel.org/p/20140417214442.d7abc7c0b0e4b90e7fedecc9@skynet.be

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-21 14:00:46 -04:00
Mathias Krause 8275f69f07 ftrace: Statically initialize pm notifier block
Instead of initializing the pm notifier block in register_ftrace_graph(),
initialize it statically. This safes us some code.

Found in the PaX patch, written by the PaX Team.

Link: http://lkml.kernel.org/p/1396186310-3156-1-git-send-email-minipli@googlemail.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-21 14:00:46 -04:00
Steven Rostedt (Red Hat) 02f2f7646f tracing: Allow irq/preempt tracers to be used by instances
The irqsoff, preemptoff and preemptirqsoff tracers can now be used by
instances. But they may only be used by one instance at a time (including
the top level directory). This allows multiple tracers to run while the
irqsoff (and friends) tracer is running simultaneously.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-21 13:59:29 -04:00
Steven Rostedt (Red Hat) 65daaca7c6 tracing: Allow wakeup tracers to be used by instances
The wakeup and wakeup_rt tracers can now be used by instances.
But they may only be used by one instance at a time (including the
top level directory). This allows multiple tracers to run while
the wakeup tracer is running simultaneously.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-21 13:59:28 -04:00
Steven Rostedt (Red Hat) 0b9b12c1b8 tracing: Move ftrace_max_lock into trace_array
In preparation for having tracers enabled in instances, the max_lock
should be unique as updating the max for one tracer is a separate
operation than updating it for another tracer using a different max.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-21 13:59:27 -04:00
Steven Rostedt (Red Hat) 6d9b3fa5e7 tracing: Move tracing_max_latency into trace_array
In preparation for letting the latency tracers be used by instances,
remove the global tracing_max_latency variable and add a max_latency
field to the trace_array that the latency tracers will now use.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-21 13:59:26 -04:00
Steven Rostedt (Red Hat) 4104d326b6 ftrace: Remove global function list and call function directly
Instead of having a list of global functions that are called,
as only one global function is allow to be enabled at a time, there's
no reason to have a list.

Instead, simply have all the users of the global ops, use the global ops
directly, instead of registering their own ftrace_ops. Just switch what
function is used before enabling the function tracer.

This removes a lot of code as well as the complexity involved with it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-21 13:59:25 -04:00
Linus Torvalds 7d77879bfd This contains two fixes.
The first is to remove a duplication of creating debugfs files that
 already exist and causes an error report to be printed due to the
 failure of the second creation.
 
 The second is a memory leak fix that was introduced in 3.14.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTUGZwAAoJEKQekfcNnQGu7W8IAIAMBVfrWdP6cmGle4tGfhVE
 sHcwqTH+07oANQJ3eFwFs5wBMb08s3hXwUHUxXcpjyq2Bs+AHr0vSL/nqCG4k8Ap
 2T4ntL7esC1BWKw2lVVVYD12FiL7grUXVlx/q0WE2NuhCzWzNRTyb8sKrPoCRUEB
 3o5rAt9+45PKUb2k/eqGBGhK8b4XDz2Wtk5Gj6YB3xttse/yjjcuw0gWMHN1JWfm
 eRuQUUBDDGUGkfF98k1aLrjPZooT3LIAV8L8md5C3ebEcXSC/h86hTYCGXv3oBDO
 8sxcT0zoQcLuFhjkYLL1J1lBW6gxaVh052jYmQwMppQMos+WID2un2E92Ccg49E=
 =BwLF
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "This contains two fixes.

  The first is to remove a duplication of creating debugfs files that
  already exist and causes an error report to be printed due to the
  failure of the second creation.

  The second is a memory leak fix that was introduced in 3.14"

* tag 'trace-fixes-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/uprobes: Fix uprobe_cpu_buffer memory leak
  tracing: Do not try to recreated toplevel set_ftrace_* files
2014-04-18 10:16:43 -07:00
zhangwei(Jovi) 6ea6215fe3 tracing/uprobes: Fix uprobe_cpu_buffer memory leak
Forgot to free uprobe_cpu_buffer percpu page in uprobe_buffer_disable().

Link: http://lkml.kernel.org/p/534F8B3F.1090407@huawei.com

Cc: stable@vger.kernel.org # v3.14+
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-17 10:44:42 -04:00
Steven Rostedt (Red Hat) 5d6c97c559 tracing: Do not try to recreated toplevel set_ftrace_* files
With the restructing of the function tracer working with instances, the
"top level" buffer is a bit special, as the function tracing is mapped
to the same set of filters. This is done by using a "global_ops" descriptor
and having the "set_ftrace_filter" and "set_ftrace_notrace" map to it.

When an instance is created, it creates the same files but its for the
local instance and not the global_ops.

The issues is that the local instance creation shares some code with
the global instance one and we end up trying to create th top level
"set_ftrace_*" files twice, and on boot up, we get an error like this:

 Could not create debugfs 'set_ftrace_filter' entry
 Could not create debugfs 'set_ftrace_notrace' entry

The reason they failed to be created was because they were created
twice, and the second time gives this error as you can not create the
same file twice.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-16 19:21:53 -04:00
Linus Torvalds 5166701b36 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "The first vfs pile, with deep apologies for being very late in this
  window.

  Assorted cleanups and fixes, plus a large preparatory part of iov_iter
  work.  There's a lot more of that, but it'll probably go into the next
  merge window - it *does* shape up nicely, removes a lot of
  boilerplate, gets rid of locking inconsistencie between aio_write and
  splice_write and I hope to get Kent's direct-io rewrite merged into
  the same queue, but some of the stuff after this point is having
  (mostly trivial) conflicts with the things already merged into
  mainline and with some I want more testing.

  This one passes LTP and xfstests without regressions, in addition to
  usual beating.  BTW, readahead02 in ltp syscalls testsuite has started
  giving failures since "mm/readahead.c: fix readahead failure for
  memoryless NUMA nodes and limit readahead pages" - might be a false
  positive, might be a real regression..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  missing bits of "splice: fix racy pipe->buffers uses"
  cifs: fix the race in cifs_writev()
  ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
  kill generic_file_buffered_write()
  ocfs2_file_aio_write(): switch to generic_perform_write()
  ceph_aio_write(): switch to generic_perform_write()
  xfs_file_buffered_aio_write(): switch to generic_perform_write()
  export generic_perform_write(), start getting rid of generic_file_buffer_write()
  generic_file_direct_write(): get rid of ppos argument
  btrfs_file_aio_write(): get rid of ppos
  kill the 5th argument of generic_file_buffered_write()
  kill the 4th argument of __generic_file_aio_write()
  lustre: don't open-code kernel_recvmsg()
  ocfs2: don't open-code kernel_recvmsg()
  drbd: don't open-code kernel_recvmsg()
  constify blk_rq_map_user_iov() and friends
  lustre: switch to kernel_sendmsg()
  ocfs2: don't open-code kernel_sendmsg()
  take iov_iter stuff to mm/iov_iter.c
  process_vm_access: tidy up a bit
  ...
2014-04-12 14:49:50 -07:00
Linus Torvalds 0a7418f5f5 This includes the final patch to clean up and fix the issue with the
design of tracepoints and how a user could register a tracepoint
 and have that tracepoint not be activated but no error was shown.
 
 The design was for an out of tree module but broke in tree users.
 The clean up was to remove the saving of the hash table of tracepoint
 names such that they can be enabled before they exist (enabling
 a module tracepoint before that module is loaded). This added more
 complexity than needed. The clean up was to remove that code and
 just enable tracepoints that exist or fail if they do not.
 
 This removed a lot of code as well as the complexity that it brought.
 As a side effect, instead of registering a tracepoint by its name,
 the tracepoint needs to be registered with the tracepoint descriptor.
 This removes having to duplicate the tracepoint names that are
 enabled.
 
 The second patch was added that simplified the way modules were
 searched for.
 
 This cleanup required changes that were in the 3.15 queue as well as
 some changes that were added late in the 3.14-rc cycle. This final
 change waited till the two were merged in upstream and then the
 change was added and full tests were run. Unfortunately, the
 test found some errors, but after it was already submitted to the
 for-next branch and not to be rebased. Sparse errors were detected
 by Fengguang Wu's bot tests, and my internal tests discovered that
 the anonymous union initialization triggered a bug in older gcc compilers.
 Luckily, there was a bugzilla for the gcc bug which gave a work around
 to the problem. The third and fourth patch handled the sparse error
 and the gcc bug respectively.
 
 A final patch was tagged along to fix a missing documentation for
 the README file.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTR+pwAAoJEKQekfcNnQGuvfoH/A4XZu4/1h2ZuKhzGi6lrrWr
 +zHUQ+JmGiAYRziQFwr2t/gqJ2vmDfHJnbDjKi6Emx8JcxesHas6CQOWps4zEic0
 dwYSQjvuGNGFIFt+7I0K1OxfVVdt2PQ2lVrB5WgYdbash5J4Bi+09QBv0RbUKheo
 37dKSeN3pbsuQsR70OTVP8laG3dA9IbHW7PsKnxIEB5zeIUHUBME/QdPPj/CuJwk
 wxZjXC2dbc3rdRlQjTVtWV3ZkGgZJB0k+JxjvZTA0N6u8Hj8LiFPuNawzf7ceBHx
 gc++57+WuMW0f0X/ar5/+3UPGFQKMSvKmdxIQCnWXQz5seTYYKDEx7mTH22fxgg=
 =OgeQ
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:
 "This includes the final patch to clean up and fix the issue with the
  design of tracepoints and how a user could register a tracepoint and
  have that tracepoint not be activated but no error was shown.

  The design was for an out of tree module but broke in tree users.  The
  clean up was to remove the saving of the hash table of tracepoint
  names such that they can be enabled before they exist (enabling a
  module tracepoint before that module is loaded).  This added more
  complexity than needed.  The clean up was to remove that code and just
  enable tracepoints that exist or fail if they do not.

  This removed a lot of code as well as the complexity that it brought.
  As a side effect, instead of registering a tracepoint by its name, the
  tracepoint needs to be registered with the tracepoint descriptor.
  This removes having to duplicate the tracepoint names that are
  enabled.

  The second patch was added that simplified the way modules were
  searched for.

  This cleanup required changes that were in the 3.15 queue as well as
  some changes that were added late in the 3.14-rc cycle.  This final
  change waited till the two were merged in upstream and then the change
  was added and full tests were run.  Unfortunately, the test found some
  errors, but after it was already submitted to the for-next branch and
  not to be rebased.  Sparse errors were detected by Fengguang Wu's bot
  tests, and my internal tests discovered that the anonymous union
  initialization triggered a bug in older gcc compilers.  Luckily, there
  was a bugzilla for the gcc bug which gave a work around to the
  problem.  The third and fourth patch handled the sparse error and the
  gcc bug respectively.

  A final patch was tagged along to fix a missing documentation for the
  README file"

* tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Add missing function triggers dump and cpudump to README
  tracing: Fix anonymous unions in struct ftrace_event_call
  tracepoint: Fix sparse warnings in tracepoint.c
  tracepoint: Simplify tracepoint module search
  tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints
2014-04-12 13:06:10 -07:00
Al Viro a786c06d9f missing bits of "splice: fix racy pipe->buffers uses"
that commit has fixed only the parts of that mess in fs/splice.c itself;
there had been more in several other ->splice_read() instances...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-12 07:04:19 -04:00
Steven Rostedt (Red Hat) 17a280ea81 tracing: Add missing function triggers dump and cpudump to README
The debugfs tracing README file lists all the function triggers except for
dump and cpudump. These should be added too.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-10 22:43:37 -04:00
Mathieu Desnoyers abb43f6998 tracing: Fix anonymous unions in struct ftrace_event_call
gcc <= 4.5.x has significant limitations with respect to initialization
of anonymous unions within structures. They need to be surrounded by
brackets, _and_ they need to be initialized in the same order in which
they appear in the structure declaration.

Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676
Link: http://lkml.kernel.org/r/1397077568-3156-1-git-send-email-mathieu.desnoyers@efficios.com

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-09 20:02:55 -04:00
Mathieu Desnoyers de7b297390 tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints
Register/unregister tracepoint probes with struct tracepoint pointer
rather than tracepoint name.

This change, which vastly simplifies tracepoint.c, has been proposed by
Steven Rostedt. It also removes 8.8kB (mostly of text) to the vmlinux
size.

From this point on, the tracers need to pass a struct tracepoint pointer
to probe register/unregister. A probe can now only be connected to a
tracepoint that exists. Moreover, tracers are responsible for
unregistering the probe before the module containing its associated
tracepoint is unloaded.

   text    data     bss     dec     hex filename
10443444        4282528 10391552        25117524        17f4354 vmlinux.orig
10434930        4282848 10391552        25109330        17f2352 vmlinux

Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com

CC: Ingo Molnar <mingo@kernel.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Frank Ch. Eigler <fche@redhat.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
[ SDR - fixed return val in void func in tracepoint_module_going() ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-08 20:43:28 -04:00
Linus Torvalds 26c12d9334 Merge branch 'akpm' (incoming from Andrew)
Merge second patch-bomb from Andrew Morton:
 - the rest of MM
 - zram updates
 - zswap updates
 - exit
 - procfs
 - exec
 - wait
 - crash dump
 - lib/idr
 - rapidio
 - adfs, affs, bfs, ufs
 - cris
 - Kconfig things
 - initramfs
 - small amount of IPC material
 - percpu enhancements
 - early ioremap support
 - various other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
  MAINTAINERS: update Intel C600 SAS driver maintainers
  fs/ufs: remove unused ufs_super_block_third pointer
  fs/ufs: remove unused ufs_super_block_second pointer
  fs/ufs: remove unused ufs_super_block_first pointer
  fs/ufs/super.c: add __init to init_inodecache()
  doc/kernel-parameters.txt: add early_ioremap_debug
  arm64: add early_ioremap support
  arm64: initialize pgprot info earlier in boot
  x86: use generic early_ioremap
  mm: create generic early_ioremap() support
  x86/mm: sparse warning fix for early_memremap
  lglock: map to spinlock when !CONFIG_SMP
  percpu: add preemption checks to __this_cpu ops
  vmstat: use raw_cpu_ops to avoid false positives on preemption checks
  slub: use raw_cpu_inc for incrementing statistics
  net: replace __this_cpu_inc in route.c with raw_cpu_inc
  modules: use raw_cpu_write for initialization of per cpu refcount.
  mm: use raw_cpu ops for determining current NUMA node
  percpu: add raw_cpu_ops
  slub: fix leak of 'name' in sysfs_slab_add
  ...
2014-04-07 16:38:06 -07:00
Gideon Israel Dsouza 52f5684c8e kernel: use macros from compiler.h instead of __attribute__((...))
To increase compiler portability there is <linux/compiler.h> which
provides convenience macros for various gcc constructs.  Eg: __weak for
__attribute__((weak)).  I've replaced all instances of gcc attributes
with the right macro in the kernel subsystem.

Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:11 -07:00
Linus Torvalds 467a9e1633 CPU hotplug notifiers registration fixes for 3.15-rc1
The purpose of this single series of commits from Srivatsa S Bhat (with
 a small piece from Gautham R Shenoy) touching multiple subsystems that use
 CPU hotplug notifiers is to provide a way to register them that will not
 lead to deadlocks with CPU online/offline operations as described in the
 changelog of commit 93ae4f978c (CPU hotplug: Provide lockless versions
 of callback registration functions).
 
 The first three commits in the series introduce the API and document it
 and the rest simply goes through the users of CPU hotplug notifiers and
 converts them to using the new method.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTQow2AAoJEILEb/54YlRxW4QQAJlYRDUzwFJzJzYhltQYuVR+
 4D74XMtvXgoJfg3cwdSWvMKKpJZnA9BVN0f7Hcx9wYmgdexYUuHeZJmMNyc3S2+g
 KjKBIsugvgmZhHbbLd6TJ6GBbhGT5JLt9VmSfL9zIkveInU1YHFUUqL/mxdHm4J0
 BSGKjk2rN3waRJgmY+xfliFLtQjDKFwJpMuvrgtoUyfas3f4sIV43UNbqdvA/weJ
 rzedxXOlKH/id4b56lj/4iIzcoL3mwvJJ7r6n0CEMsKv87z09kqR0O+69Tsq/cgs
 j17CsvoJOmZGk3QTeKVMQWBsvk6aPoDu3zK83gLbQMt+qjOpSTbJLz/3HZw4/TrW
 ss4nuZne1DLMGS+6hoxYbTP+6Ni//Kn+l/LrHc5jb7m1X3lMO4W2aV3IROtIE1rv
 lEP1IG01NU4u9YwkVj1dyhrkSp8tLPul4SrUK8W+oNweOC5crjJV7vJbIPJgmYiM
 IZN55wln0yVRtR4TX+rmvN0PixsInE8MeaVCmReApyF9pdzul/StxlBze5BKLSJD
 cqo1kNPpsmdxoDucqUpQ/gSvy+IOl2qnlisB5PpV93sk7De6TFDYrGHxjYIW7jMf
 StXwdCDDQhzd2Q8Kfpp895A1dbIl8rKtwA6bTU2eX+BfMVFzuMdT44cvosx1+UdQ
 sWl//rg76nb13dFjvF+q
 =SW7Q
 -----END PGP SIGNATURE-----

Merge tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull CPU hotplug notifiers registration fixes from Rafael Wysocki:
 "The purpose of this single series of commits from Srivatsa S Bhat
  (with a small piece from Gautham R Shenoy) touching multiple
  subsystems that use CPU hotplug notifiers is to provide a way to
  register them that will not lead to deadlocks with CPU online/offline
  operations as described in the changelog of commit 93ae4f978c ("CPU
  hotplug: Provide lockless versions of callback registration
  functions").

  The first three commits in the series introduce the API and document
  it and the rest simply goes through the users of CPU hotplug notifiers
  and converts them to using the new method"

* tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
  net/iucv/iucv.c: Fix CPU hotplug callback registration
  net/core/flow.c: Fix CPU hotplug callback registration
  mm, zswap: Fix CPU hotplug callback registration
  mm, vmstat: Fix CPU hotplug callback registration
  profile: Fix CPU hotplug callback registration
  trace, ring-buffer: Fix CPU hotplug callback registration
  xen, balloon: Fix CPU hotplug callback registration
  hwmon, via-cputemp: Fix CPU hotplug callback registration
  hwmon, coretemp: Fix CPU hotplug callback registration
  thermal, x86-pkg-temp: Fix CPU hotplug callback registration
  octeon, watchdog: Fix CPU hotplug callback registration
  oprofile, nmi-timer: Fix CPU hotplug callback registration
  intel-idle: Fix CPU hotplug callback registration
  clocksource, dummy-timer: Fix CPU hotplug callback registration
  drivers/base/topology.c: Fix CPU hotplug callback registration
  acpi-cpufreq: Fix CPU hotplug callback registration
  zsmalloc: Fix CPU hotplug callback registration
  scsi, fcoe: Fix CPU hotplug callback registration
  scsi, bnx2fc: Fix CPU hotplug callback registration
  scsi, bnx2i: Fix CPU hotplug callback registration
  ...
2014-04-07 14:55:46 -07:00
Linus Torvalds 2d1eb87ae1 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM changes from Russell King:

 - Perf updates from Will Deacon:
   - Support for Qualcomm Krait processors (run perf on your phone!)
   - Support for Cortex-A12 (run perf stat on your FPGA!)
   - Support for perf_sample_event_took, allowing us to automatically decrease
     the sample rate if we can't handle the PMU interrupts quickly enough
     (run perf record on your FPGA!).

 - Basic uprobes support from David Long:
     This patch series adds basic uprobes support to ARM. It is based on
     patches developed earlier by Rabin Vincent. That approach of adding
     hooks into the kprobes instruction parsing code was not well received.
     This approach separates the ARM instruction parsing code in kprobes out
     into a separate set of functions which can be used by both kprobes and
     uprobes. Both kprobes and uprobes then provide their own semantic action
     tables to process the results of the parsing.

 - ARMv7M (microcontroller) updates from Uwe Kleine-König

 - OMAP DMA updates (recently added Vinod's Ack even though they've been
   sitting in linux-next for a few months) to reduce the reliance of
   omap-dma on the code in arch/arm.

 - SA11x0 changes from Dmitry Eremin-Solenikov and Alexander Shiyan

 - Support for Cortex-A12 CPU

 - Align support for ARMv6 with ARMv7 so they can cooperate better in a
   single zImage.

 - Addition of first AT_HWCAP2 feature bits for ARMv8 crypto support.

 - Removal of IRQ_DISABLED from various ARM files

 - Improved efficiency of virt_to_page() for single zImage

 - Patch from Ulf Hansson to permit runtime PM callbacks to be available for
   AMBA devices for suspend/resume as well.

 - Finally kill asm/system.h on ARM.

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (89 commits)
  dmaengine: omap-dma: more consolidation of CCR register setup
  dmaengine: omap-dma: move IRQ handling to omap-dma
  dmaengine: omap-dma: move register read/writes into omap-dma.c
  ARM: omap: dma: get rid of 'p' allocation and clean up
  ARM: omap: move dma channel allocation into plat-omap code
  ARM: omap: dma: get rid of errata global
  ARM: omap: clean up DMA register accesses
  ARM: omap: remove almost-const variables
  ARM: omap: remove references to disable_irq_lch
  dmaengine: omap-dma: cleanup errata 3.3 handling
  dmaengine: omap-dma: provide register read/write functions
  dmaengine: omap-dma: use cached CCR value when enabling DMA
  dmaengine: omap-dma: move barrier to omap_dma_start_desc()
  dmaengine: omap-dma: move clnk_ctrl setting to preparation functions
  dmaengine: omap-dma: improve efficiency loading C.SA/C.EI/C.FI registers
  dmaengine: omap-dma: consolidate clearing channel status register
  dmaengine: omap-dma: move CCR buffering disable errata out of the fast path
  dmaengine: omap-dma: provide register definitions
  dmaengine: omap-dma: consolidate setup of CCR
  dmaengine: omap-dma: consolidate setup of CSDP
  ...
2014-04-05 13:20:43 -07:00
Linus Torvalds 68114e5eb8 Most of the changes were largely clean ups, and some documentation.
But there were a few features that were added.
 
 Uprobes now work with event triggers and multi buffers.
 Uprobes have support under ftrace and perf.
 
 The big feature is that the function tracer can now be used within the
 multi buffer instances. That is, you can now trace some functions
 in one buffer, others in another buffer, all functions in a third buffer
 and so on. They are basically agnostic from each other. This only
 works for the function tracer and not for the function graph trace,
 although you can have the function graph tracer running in the top level
 buffer (or any tracer for that matter) and have different function tracing
 going on in the sub buffers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTOthtAAoJEKQekfcNnQGu5c8H/Ana/U+0tmksp1dbHkRHsKSH
 +Fsv4Jeu8gf1NaFKHEhkUTcFtnzE6qAPV2VCrcJwXbhAhhwZm+LjrnWdoy3215S3
 cQW4LftLEonh2cM36Cos74TulMEYN6XmL6dQZV+CILKQkDrWU4qJjQ64okXEkqrd
 9iG3p/mSXyvJcmnyg61ALnMOhZDLsXY3djBhWBPhiTPGS6BRb9zh4Pmw6Zv0n2rJ
 U93Gt/3AQrv1ybu73dUxqP0abp60oXOiWoF/R2jcbKqIM+K9RPJX79unCV3jq3u9
 f+6jMlB9PgAMqQj6ihJdwxKDDuzwyrVdEPnsgvl4jarCBCtVVwhKedBaKN/KS8k=
 =HdXY
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "Most of the changes were largely clean ups, and some documentation.
  But there were a few features that were added:

  Uprobes now work with event triggers and multi buffers and have
  support under ftrace and perf.

  The big feature is that the function tracer can now be used within the
  multi buffer instances.  That is, you can now trace some functions in
  one buffer, others in another buffer, all functions in a third buffer
  and so on.  They are basically agnostic from each other.  This only
  works for the function tracer and not for the function graph trace,
  although you can have the function graph tracer running in the top
  level buffer (or any tracer for that matter) and have different
  function tracing going on in the sub buffers"

* tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
  tracing: Add BUG_ON when stack end location is over written
  tracepoint: Remove unused API functions
  Revert "tracing: Move event storage for array from macro to standalone function"
  ftrace: Constify ftrace_text_reserved
  tracepoints: API doc update to tracepoint_probe_register() return value
  tracepoints: API doc update to data argument
  ftrace: Fix compilation warning about control_ops_free
  ftrace/x86: BUG when ftrace recovery fails
  ftrace: Warn on error when modifying ftrace function
  ftrace: Remove freelist from struct dyn_ftrace
  ftrace: Do not pass data to ftrace_dyn_arch_init
  ftrace: Pass retval through return in ftrace_dyn_arch_init()
  ftrace: Inline the code from ftrace_dyn_table_alloc()
  ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt
  tracing: Evaluate len expression only once in __dynamic_array macro
  tracing: Correctly expand len expressions from __dynamic_array macro
  tracing/module: Replace include of tracepoint.h with jump_label.h in module.h
  tracing: Fix event header migrate.h to include tracepoint.h
  tracing: Fix event header writeback.h to include tracepoint.h
  tracing: Warn if a tracepoint is not set via debugfs
  ...
2014-04-03 10:26:31 -07:00
Al Viro fbb32750a6 pipe: kill ->map() and ->unmap()
all pipe_buffer_operations have the same instances of those...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:19 -04:00
Linus Torvalds 7a48837732 Merge branch 'for-3.15/core' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
 "This is the pull request for the core block IO bits for the 3.15
  kernel.  It's a smaller round this time, it contains:

   - Various little blk-mq fixes and additions from Christoph and
     myself.

   - Cleanup of the IPI usage from the block layer, and associated
     helper code.  From Frederic Weisbecker and Jan Kara.

   - Duplicate code cleanup in bio-integrity from Gu Zheng.  This will
     give you a merge conflict, but that should be easy to resolve.

   - blk-mq notify spinlock fix for RT from Mike Galbraith.

   - A blktrace partial accounting bug fix from Roman Pen.

   - Missing REQ_SYNC detection fix for blk-mq from Shaohua Li"

* 'for-3.15/core' of git://git.kernel.dk/linux-block: (25 commits)
  blk-mq: add REQ_SYNC early
  rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock
  blk-mq: support partial I/O completions
  blk-mq: merge blk_mq_insert_request and blk_mq_run_request
  blk-mq: remove blk_mq_alloc_rq
  blk-mq: don't dump CPU -> hw queue map on driver load
  blk-mq: fix wrong usage of hctx->state vs hctx->flags
  blk-mq: allow blk_mq_init_commands() to return failure
  block: remove old blk_iopoll_enabled variable
  blktrace: fix accounting of partially completed requests
  smp: Rename __smp_call_function_single() to smp_call_function_single_async()
  smp: Remove wait argument from __smp_call_function_single()
  watchdog: Simplify a little the IPI call
  smp: Move __smp_call_function_single() below its safe version
  smp: Consolidate the various smp_call_function_single() declensions
  smp: Teach __smp_call_function_single() to check for offline cpus
  smp: Remove unused list_head from csd
  smp: Iterate functions through llist_for_each_entry_safe()
  block: Stop abusing rq->csd.list in blk-softirq
  block: Remove useless IPI struct initialization
  ...
2014-04-01 19:19:15 -07:00
Linus Torvalds 176ab02d49 Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 LTO changes from Peter Anvin:
 "More infrastructure work in preparation for link-time optimization
  (LTO).  Most of these changes is to make sure symbols accessed from
  assembly code are properly marked as visible so the linker doesn't
  remove them.

  My understanding is that the changes to support LTO are still not
  upstream in binutils, but are on the way there.  This patchset should
  conclude the x86-specific changes, and remaining patches to actually
  enable LTO will be fed through the Kbuild tree (other than keeping up
  with changes to the x86 code base, of course), although not
  necessarily in this merge window"

* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  Kbuild, lto: Handle basic LTO in modpost
  Kbuild, lto: Disable LTO for asm-offsets.c
  Kbuild, lto: Add a gcc-ld script to let run gcc as ld
  Kbuild, lto: add ld-version and ld-ifversion macros
  Kbuild, lto: Drop .number postfixes in modpost
  Kbuild, lto, workaround: Don't warn for initcall_reference in modpost
  lto: Disable LTO for sys_ni
  lto: Handle LTO common symbols in module loader
  lto, workaround: Add workaround for initcall reordering
  lto: Make asmlinkage __visible
  x86, lto: Disable LTO for the x86 VDSO
  initconst, x86: Fix initconst mistake in ts5500 code
  initconst: Fix initconst mistake in dcdbas
  asmlinkage: Make trace_hardirqs_on/off_caller visible
  asmlinkage, x86: Fix 32bit memcpy for LTO
  asmlinkage Make __stack_chk_failed and memcmp visible
  asmlinkage: Mark rwsem functions that can be called from assembler asmlinkage
  asmlinkage: Make main_extable_sort_needed visible
  asmlinkage, mutex: Mark __visible
  asmlinkage: Make trace_hardirq visible
  ...
2014-03-31 14:13:25 -07:00
Linus Torvalds 971eae7c99 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "Bigger changes:

   - sched/idle restructuring: they are WIP preparation for deeper
     integration between the scheduler and idle state selection, by
     Nicolas Pitre.

   - add NUMA scheduling pseudo-interleaving, by Rik van Riel.

   - optimize cgroup context switches, by Peter Zijlstra.

   - RT scheduling enhancements, by Thomas Gleixner.

  The rest is smaller changes, non-urgnt fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits)
  sched: Clean up the task_hot() function
  sched: Remove double calculation in fix_small_imbalance()
  sched: Fix broken setscheduler()
  sparc64, sched: Remove unused sparc64_multi_core
  sched: Remove unused mc_capable() and smt_capable()
  sched/numa: Move task_numa_free() to __put_task_struct()
  sched/fair: Fix endless loop in idle_balance()
  sched/core: Fix endless loop in pick_next_task()
  sched/fair: Push down check for high priority class task into idle_balance()
  sched/rt: Fix picking RT and DL tasks from empty queue
  trace: Replace hardcoding of 19 with MAX_NICE
  sched: Guarantee task priority in pick_next_task()
  sched/idle: Remove stale old file
  sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED
  cpuidle/arm64: Remove redundant cpuidle_idle_call()
  cpuidle/powernv: Remove redundant cpuidle_idle_call()
  sched, nohz: Exclude isolated cores from load balancing
  sched: Fix select_task_rq_fair() description comments
  workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
  sys: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
  ...
2014-03-31 11:21:19 -07:00
Linus Torvalds 8c292f1174 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Main changes:

  Kernel side changes:

   - Add SNB/IVB/HSW client uncore memory controller support (Stephane
     Eranian)

   - Fix various x86/P4 PMU driver bugs (Don Zickus)

  Tooling, user visible changes:

   - Add several futex 'perf bench' microbenchmarks (Davidlohr Bueso)

   - Speed up thread map generation (Don Zickus)

   - Introduce 'perf kvm --list-cmds' command line option for use by
     scripts (Ramkumar Ramachandra)

   - Print the evsel name in the annotate stdio output, prep to fix
     support outputting annotation for multiple events, not just for the
     first one (Arnaldo Carvalho de Melo)

   - Allow setting preferred callchain method in .perfconfig (Jiri Olsa)

   - Show in what binaries/modules 'perf probe's are set (Masami
     Hiramatsu)

   - Support distro-style debuginfo for uprobe in 'perf probe' (Masami
     Hiramatsu)

  Tooling, internal changes and fixes:

   - Use tid in mmap/mmap2 events to find maps (Don Zickus)

   - Record the reason for filtering an address_location (Namhyung Kim)

   - Apply all filters to an addr_location (Namhyung Kim)

   - Merge al->filtered with hist_entry->filtered in report/hists
     (Namhyung Kim)

   - Fix memory leak when synthesizing thread records (Namhyung Kim)

   - Use ui__has_annotation() in 'report' (Namhyung Kim)

   - hists browser refactorings to reuse code accross UIs (Namhyung Kim)

   - Add support for the new DWARF unwinder library in elfutils (Jiri
     Olsa)

   - Fix build race in the generation of bison files (Jiri Olsa)

   - Further streamline the feature detection display, trimming it a bit
     to show just the libraries detected, using VF=1 gets a more verbose
     output, showing the less interesting feature checks as well (Jiri
     Olsa).

   - Check compatible symtab type before loading dso (Namhyung Kim)

   - Check return value of filename__read_debuglink() (Stephane Eranian)

   - Move some hashing and fs related code from tools/perf/util/ to
     tools/lib/ so that it can be used by more tools/ living utilities
     (Borislav Petkov)

   - Prepare DWARF unwinding code for using an elfutils alternative
     unwinding library (Jiri Olsa)

   - Fix DWARF unwind max_stack processing (Jiri Olsa)

   - Add dwarf unwind 'perf test' entry (Jiri Olsa)

   - 'perf probe' improvements including memory leak fixes, sharing the
     intlist class with other tools, uprobes/kprobes code sharing and
     use of ref_reloc_sym (Masami Hiramatsu)

   - Shorten sample symbol resolving by adding cpumode to struct
     addr_location (Arnaldo Carvalho de Melo)

   - Fix synthesizing mmaps for threads (Don Zickus)

   - Fix invalid output on event group stdio report (Namhyung Kim)

   - Fixup header alignment in 'perf sched latency' output (Ramkumar
     Ramachandra)

   - Fix off-by-one error in 'perf timechart record' argv handling
     (Ramkumar Ramachandra)

  Tooling, cleanups:

   - Remove unused thread__find_map function (Jiri Olsa)

   - Remove unused simple_strtoul() function (Ramkumar Ramachandra)

  Tooling, documentation updates:

   - Update function names in debug messages (Ramkumar Ramachandra)

   - Update some code references in design.txt (Ramkumar Ramachandra)

   - Clarify load-latency information in the 'perf mem' docs (Andi
     Kleen)

   - Clarify x86 register naming in 'perf probe' docs (Andi Kleen)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (96 commits)
  perf tools: Remove unused simple_strtoul() function
  perf tools: Update some code references in design.txt
  perf evsel: Update function names in debug messages
  perf tools: Remove thread__find_map function
  perf annotate: Print the evsel name in the stdio output
  perf report: Use ui__has_annotation()
  perf tools: Fix memory leak when synthesizing thread records
  perf tools: Use tid in mmap/mmap2 events to find maps
  perf report: Merge al->filtered with hist_entry->filtered
  perf symbols: Apply all filters to an addr_location
  perf symbols: Record the reason for filtering an address_location
  perf sched: Fixup header alignment in 'latency' output
  perf timechart: Fix off-by-one error in 'record' argv handling
  perf machine: Factor machine__find_thread to take tid argument
  perf tools: Speed up thread map generation
  perf kvm: introduce --list-cmds for use by scripts
  perf ui hists: Pass evsel to hpp->header/width functions explicitly
  perf symbols: Introduce thread__find_cpumode_addr_location
  perf session: Change header.misc dump from decimal to hex
  perf ui/tui: Reuse generic __hpp__fmt() code
  ...
2014-03-31 11:13:25 -07:00
Steven Rostedt (Red Hat) 2c4a33aba5 tracing: Fix traceon trigger condition to actually turn tracing on
While working on my tutorial for 2014 Linux Collaboration Summit
I found that the traceon trigger did not work when conditions were
used. The other triggers worked fine though. Looking into it, it
is because of the way the triggers use the ring buffer to store
the fields it will use for the condition. But if tracing is off, nothing
is stored in the buffer, and the tracepoint exits before calling the
trigger to test the condition. This is fine for all the triggers that
only work when tracing is on, but for traceon trigger that is to
work when tracing is off, nothing happens.

The fix is simple, just use a temp ring buffer to record the event
if tracing is off and the event has a trace event conditional trigger
enabled. The rest of the tracepoint code will work just fine, but
the tracepoint wont be recorded in the other buffers.

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-25 23:39:41 -04:00
Aaron Tomlin 3862807880 tracing: Add BUG_ON when stack end location is over written
It is difficult to detect a stack overrun when it
actually occurs.

We have observed that this type of corruption is often
silent and can go unnoticed. Once the corrupted region
is examined, the outcome is undefined and often
results in sporadic system crashes.

When the stack tracing feature is enabled, let's check
for this condition and take appropriate action.

Note: init_task doesn't get its stack end location
set to STACK_END_MAGIC.

Link: http://lkml.kernel.org/r/1395669837-30209-1-git-send-email-atomlin@redhat.com

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-24 10:39:11 -04:00
Steven Rostedt (Red Hat) bc4c426ee2 Revert "tracing: Move event storage for array from macro to standalone function"
I originally wrote commit 35bb4399bd to shrink the size of the overhead of
tracepoints by several kilobytes. Later, I received a patch from Vaibhav
Nagarnaik that fixed a bug in the same code that this commit touches. Not
only did it fix a bug, it also removed code and shrunk the size of the
overhead of trace events even more than this commit did.

Since this commit is scheduled for 3.15 and Vaibhav's patch is already in
mainline, I need to revert this patch in order to keep it from conflicting
with Vaibhav's patch. Not to mention, Vaibhav's patch makes this patch
obsolete.

Link: http://lkml.kernel.org/r/20140320225637.0226041b@gandalf.local.home

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-21 13:11:41 -04:00
Vaibhav Nagarnaik 87291347c4 tracing: Fix array size mismatch in format string
In event format strings, the array size is reported in two locations.
One in array subscript and then via the "size:" attribute. The values
reported there have a mismatch.

For e.g., in sched:sched_switch the prev_comm and next_comm character
arrays have subscript values as [32] where as the actual field size is
16.

name: sched_switch
ID: 301
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1;signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:char prev_comm[32];       offset:8;       size:16;        signed:1;
        field:pid_t prev_pid;   offset:24;      size:4; signed:1;
        field:int prev_prio;    offset:28;      size:4; signed:1;
        field:long prev_state;  offset:32;      size:8; signed:1;
        field:char next_comm[32];       offset:40;      size:16;        signed:1;
        field:pid_t next_pid;   offset:56;      size:4; signed:1;
        field:int next_prio;    offset:60;      size:4; signed:1;

After bisection, the following commit was blamed:
92edca0 tracing: Use direct field, type and system names

This commit removes the duplication of strings for field->name and
field->type assuming that all the strings passed in
__trace_define_field() are immutable. This is not true for arrays, where
the type string is created in event_storage variable and field->type for
all array fields points to event_storage.

Use __stringify() to create a string constant for the type string.

Also, get rid of event_storage and event_storage_mutex that are not
needed anymore.

also, an added benefit is that this reduces the overhead of events a bit more:

   text    data     bss     dec     hex filename
8424787 2036472 1302528 11763787         b3804b vmlinux
8420814 2036408 1302528 11759750         b37086 vmlinux.patched

Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com

Cc: Laurent Chavey <chavey@google.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-20 13:21:05 -04:00
Srivatsa S. Bhat d39ad278a3 trace, ring-buffer: Fix CPU hotplug callback registration
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* Note the use of the double underscored version of the API */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();

Fix the tracing ring-buffer code by using this latter form of callback
registration.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 13:43:48 +01:00
David A. Long 09294e31b1 uprobes: Kconfig dependency fix
Suggested change from Oleg Nesterov. Fixes incomplete dependencies
for uprobes feature.

Signed-off-by: David A. Long <dave.long@linaro.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
2014-03-18 16:39:33 -04:00
Sasha Levin d88471cb8b ftrace: Constify ftrace_text_reserved
Link: http://lkml.kernel.org/r/1357772960-4436-5-git-send-email-sasha.levin@oracle.com

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-11 22:52:43 -04:00
Jiri Slaby db0fbadcbd ftrace: Fix compilation warning about control_ops_free
With CONFIG_DYNAMIC_FTRACE=n, I see a warning:
kernel/trace/ftrace.c:240:13: warning: 'control_ops_free' defined but not used
 static void control_ops_free(struct ftrace_ops *ops)
             ^
Move that function around to an already existing #ifdef
CONFIG_DYNAMIC_FTRACE block as the function is used solely from the
dynamic function tracing functions.

Link: http://lkml.kernel.org/r/1394484131-5107-1-git-send-email-jslaby@suse.cz

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-11 19:38:20 -04:00
Jiri Olsa 63c45f4ba5 perf: Disallow user-space stack dumps for function trace events
Recent issues with user space callchains processing within
page fault handler tracing showed as Peter said 'there's
just too much fail surface'.

The user space stack dump is just another source of the this issue.

Related list discussions:
  http://marc.info/?t=139302086500001&r=1&w=2
  http://marc.info/?t=139301437300003&r=1&w=2

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1393775800-13524-3-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:57:58 +01:00
Jiri Olsa cfa77bc4af perf: Disallow user-space callchains for function trace events
Recent issues with user space callchains processing within
page fault handler tracing showed as Peter said 'there's
just too much fail surface'.

Related list discussions:

  http://marc.info/?t=139302086500001&r=1&w=2
  http://marc.info/?t=139301437300003&r=1&w=2

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1393775800-13524-2-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:57:57 +01:00
Ingo Molnar a02ed5e3e0 Merge branch 'sched/urgent' into sched/core
Pick up fixes before queueing up new changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:34:27 +01:00
Petr Mladek cd21067f69 ftrace: Warn on error when modifying ftrace function
We should print some warning and kill ftrace functionality when the ftrace
function is not set correctly. Otherwise, ftrace might do crazy things without
an explanation. The error value has been ignored so far.

Note that an error that happens during updating all the traced calls is handled
in ftrace_replace_code(). We print more details about the particular
failing address via ftrace_bug() there.

Link: http://lkml.kernel.org/r/1393258342-29978-3-git-send-email-pmladek@suse.cz

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:15 -05:00
Jiri Slaby 3a36cb11ca ftrace: Do not pass data to ftrace_dyn_arch_init
As the data parameter is not really used by any ftrace_dyn_arch_init,
remove that from ftrace_dyn_arch_init. This also removes the addr
local variable from ftrace_init which is now unused.

Note the documentation was imprecise as it did not suggest to set
(*data) to 0.

Link: http://lkml.kernel.org/r/1393268401-24379-4-git-send-email-jslaby@suse.cz

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:14 -05:00
Jiri Slaby af64a7cb09 ftrace: Pass retval through return in ftrace_dyn_arch_init()
No architecture uses the "data" parameter in ftrace_dyn_arch_init() in any
way, it just sets the value to 0. And this is used as a return value
in the caller -- ftrace_init, which just checks the retval against
zero.

Note there is also "return 0" in every ftrace_dyn_arch_init.  So it is
enough to check the retval and remove all the indirect sets of data on
all archs.

Link: http://lkml.kernel.org/r/1393268401-24379-3-git-send-email-jslaby@suse.cz

Cc: linux-arch@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:13 -05:00
Jiri Slaby c867ccd838 ftrace: Inline the code from ftrace_dyn_table_alloc()
The function used to do allocations some time ago. This no longer
happens and it only checks the count and prints some info. This patch
inlines the body to the only caller. There are two reasons:
* the name of the function was misleading
* it's clear what is going on in ftrace_init now

Link: http://lkml.kernel.org/r/1393268401-24379-2-git-send-email-jslaby@suse.cz

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:12 -05:00
Jiri Slaby 1dc43cf0be ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt
Some of them can be local to functions, so make them local and pass
them as parameters where needed:
* __start_mcount_loc+__stop_mcount_loc are local to ftrace_init
* ftrace_new_pgs -> new_pgs/start_pg
* ftrace_update_cnt -> local update_cnt in ftrace_update_code

Link: http://lkml.kernel.org/r/1393268401-24379-1-git-send-email-jslaby@suse.cz

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:12 -05:00
Steven Rostedt 3fd40d1ee6 tracing: Use helper functions in event assignment to shrink macro size
The functions that assign the contents for the ftrace events are
defined by the TRACE_EVENT() macros. Each event has its own unique
way to assign data to its buffer. When you have over 500 events,
that means there's 500 functions assigning data uniquely for each
event (not really that many, as DECLARE_EVENT_CLASS() and multiple
DEFINE_EVENT()s will only need a single function).

By making helper functions in the core kernel to do some of the work
instead, we can shrink the size of the kernel down a bit.

With a kernel configured with 502 events, the change in size was:

   text    data     bss     dec     hex filename
12987390        1913504 9785344 24686238        178ae9e /tmp/vmlinux
12959102        1913504 9785344 24657950        178401e /tmp/vmlinux.patched

That's a total of 28288 bytes, which comes down to 56 bytes per event.

Link: http://lkml.kernel.org/r/20120810034708.370808175@goodmis.org

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:07 -05:00
Steven Rostedt 35bb4399bd tracing: Move event storage for array from macro to standalone function
The code that shows array fields for events is defined for all events.
This can add up quite a bit when you have over 500 events.

By making helper functions in the core kernel to do the work
instead, we can shrink the size of the kernel down a bit.

With a kernel configured with 502 events, the change in size was:

   text    data     bss     dec     hex filename
12990946        1913568 9785344 24689858        178bcc2 /tmp/vmlinux
12987390        1913504 9785344 24686238        178ae9e /tmp/vmlinux.patched

That's a total of 3556 bytes, which comes down to 7 bytes per event.
Although it's not much, this code is just called at initialization of
the events.

Link: http://lkml.kernel.org/r/20120810034708.084036335@goodmis.org

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:06 -05:00
Steven Rostedt 1d6bae966e tracing: Move raw output code from macro to standalone function
The code for trace events to format the raw recorded event data
into human readable format in the 'trace' file is repeated for every
event in the system. When you have over 500 events, this can add up
quite a bit.

By making helper functions in the core kernel to do the work
instead, we can shrink the size of the kernel down a bit.

With a kernel configured with 502 events, the change in size was:

   text    data     bss     dec     hex filename
12991007        1913568 9785344 24689919        178bcff /tmp/vmlinux.orig
12990946        1913568 9785344 24689858        178bcc2 /tmp/vmlinux.patched

Note, this version does not save as much as the version of this patch
I had a few years ago. That is because in the mean time, commit
f71130de5c ("tracing: Add a helper function for event print functions")
did a lot of the work my original patch did. But this change helps
slightly, and is part of a larger clean up to reduce the size much further.

Link: http://lkml.kernel.org/r/20120810034707.378538034@goodmis.org

Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:05 -05:00
Roman Pen af5040da01 blktrace: fix accounting of partially completed requests
trace_block_rq_complete does not take into account that request can
be partially completed, so we can get the following incorrect output
of blkparser:

  C   R 232 + 240 [0]
  C   R 240 + 232 [0]
  C   R 248 + 224 [0]
  C   R 256 + 216 [0]

but should be:

  C   R 232 + 8 [0]
  C   R 240 + 8 [0]
  C   R 248 + 8 [0]
  C   R 256 + 8 [0]

Also, the whole output summary statistics of completed requests and
final throughput will be incorrect.

This patch takes into account real completion size of the request and
fixes wrong completion accounting.

Signed-off-by: Roman Pen <r.peniaev@gmail.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: linux-kernel@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-03-05 16:11:21 -07:00
Steven Rostedt (Red Hat) 45ab2813d4 tracing: Do not add event files for modules that fail tracepoints
If a module fails to add its tracepoints due to module tainting, do not
create the module event infrastructure in the debugfs directory. As the events
will not work and worse yet, they will silently fail, making the user wonder
why the events they enable do not display anything.

Having a warning on module load and the events not visible to the users
will make the cause of the problem much clearer.

Link: http://lkml.kernel.org/r/20140227154923.265882695@goodmis.org

Fixes: 6d723736e4 "tracing/events: add support for modules to TRACE_EVENT"
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org # 2.6.31+
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-03 21:11:05 -05:00
Dongsheng Yang 2b3942e4bb trace: Replace hardcoding of 19 with MAX_NICE
Use MAX_NICE instead of the value 19 for ring_buffer_benchmark.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1393251121-25534-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27 12:41:03 +01:00
Steven Rostedt (Red Hat) 1fcc155351 ftrace: Have static function trace clear ENABLED flag on unregister
The ENABLED flag needs to be cleared when a ftrace_ops is unregistered
otherwise it wont be able to be registered again.

This is only for static tracing and does not affect DYNAMIC_FTRACE at
all.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:32:55 -05:00
Steven Rostedt e1e232ca6b tracing: Add trace_clock=<clock> kernel parameter
Being able to change the trace clock at boot can be advantageous if
you need a better source of when things happen across CPUs. The default
trace clock is the fastest, but it uses local clocks which may not be
synced across CPUs and it does not let you know when events took place
with respect to events on other CPUs.

The global trace clock can help in this case, and if you do not care
about timings, the counter "clock" is the best, as that is just a  simple
atomic counter that is incremented for every event.

Usage is to add "trace_clock=counter" on the kernel command line. You
can replace counter with "global" or any of the clocks listed in
/sys/kernel/debug/tracing/trace_clock

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Appreciated-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:32:54 -05:00
Namhyung Kim 43fe98913c tracing/uprobes: Support mix of ftrace and perf
It seems there's no reason to prevent mixed used of ftrace and perf
for a single uprobe event.  At least the kprobes already support it.

Link: http://lkml.kernel.org/r/1389946120-19610-6-git-send-email-namhyung@kernel.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:30:11 -05:00
Namhyung Kim ca3b162021 tracing/uprobes: Support event triggering
Add support for event triggering to uprobes.  This is same as kprobes
support added by Tom (plus cleanup by Steven).

Link: http://lkml.kernel.org/r/1389946120-19610-5-git-send-email-namhyung@kernel.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:30:10 -05:00
zhangwei(Jovi) 70ed91c6ec tracing/uprobes: Support ftrace_event_file base multibuffer
Support multi-buffer on uprobe-based dynamic events by
using ftrace_event_file.

This patch is based kprobe-based dynamic events multibuffer
support work initially, commited by Masami(commit 41a7dd420c),
but revised as below:

Oleg changed the kprobe-based multibuffer design from
array-pointers of ftrace_event_file into simple list,
so this patch also change to the list design.

rcu_read_lock/unlock added into uprobe_trace_func/uretprobe_trace_func,
to synchronize with ftrace_event_file list add and delete.

Even though we allow multi-uprobes instances now,
but TP_FLAG_PROFILE/TP_FLAG_TRACE are still mutually exclusive
in probe_event_enable currently, this means we cannot allow
one user is using uprobe-tracer, and another user is using
perf-probe on same uprobe concurrently.
(Perhaps this will be fix in future, kprobe don't have this
limitation now)

Link: http://lkml.kernel.org/r/1389946120-19610-4-git-send-email-namhyung@kernel.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:30:09 -05:00
Namhyung Kim dd9fa555d7 tracing/uprobes: Move argument fetching to uprobe_dispatcher()
A single uprobe event might serve different users like ftrace and
perf.  And this is especially important for upcoming multi buffer
support.  But in this case it'll fetch (same) data from userspace
multiple times.  So move it to the beginning of the dispatcher
function and reuse it for each users.

Link: http://lkml.kernel.org/r/1389946120-19610-3-git-send-email-namhyung@kernel.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:30:09 -05:00
Namhyung Kim a43b970430 tracing/uprobes: Rename uprobe_{trace,perf}_print() functions
The uprobe_{trace,perf}_print functions are misnomers since what they
do is not printing.  There's also a real print function named
print_uprobe_event() so they'll only increase confusion IMHO.

Rename them with double underscores to follow convention of kprobe.

Link: http://lkml.kernel.org/r/1389946120-19610-2-git-send-email-namhyung@kernel.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:30:08 -05:00
Steven Rostedt (Red Hat) 591dffdade ftrace: Allow for function tracing instance to filter functions
Create a "set_ftrace_filter" and "set_ftrace_notrace" files in the instance
directories to let users filter of functions to trace for the given instance.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:29:07 -05:00
Steven Rostedt (Red Hat) e3b3e2e847 ftrace: Pass in global_ops for use with filtering files
In preparation for having the function tracing instances be able to
filter on functions, the generic filter functions must first be
converted to take in the global_ops as a parameter.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:13:19 -05:00
Steven Rostedt (Red Hat) f20a580627 ftrace: Allow instances to use function tracing
Allow instances (sub-buffers) to enable function tracing.
Each instance will have its own function tracing capability.
For now, instances will not have function stack tracing, or will
they be able to pick and choose what functions they can trace.

Picking and choosing their own functions will come later.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:13:18 -05:00
Steven Rostedt (Red Hat) 50512ab576 tracing: Convert tracer->enabled to counter
As tracers will soon be used by instances, the tracer enabled field
needs to be converted to a counter instead of a boolean.
This counter is protected by the trace_types_lock mutex.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:13:17 -05:00
Steven Rostedt (Red Hat) 6b450d2533 tracing: Disable tracers before deletion of instance
When an instance is about to be deleted, make sure the tracer
is set to nop. If it isn't reset the tracer and set it to the nop
tracer, otherwise memory leaks and bad pointers may result.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:13:16 -05:00
Steven Rostedt (Red Hat) e6435e96ec ftrace: Copy ops private to global_ops private
If global_ops function is being called directly, instead of the global_ops
list function, set the global_ops private to be the same as the ops private
that's being called directly.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:13:14 -05:00
Steven Rostedt (Red Hat) f1b21c9a40 tracing: Only let top level have option files
Currently, only the top level instance can have tracing options.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:13:11 -05:00
Steven Rostedt (Red Hat) 607e2ea167 tracing: Set up infrastructure to allow tracers for instances
Currently the tracers (function, function_graph, irqsoff, etc) can only
be used by the top level tracing directory (not for instances).

This sets up the infrastructure to allow instances to be able to
run a separate tracer apart from the what the top level tracing is
doing.

As tracers need to adapt for being used by instances, the tracers
must flag if they can be used by instances or not. Currently only the
'nop' tracer can be used by all instances.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:13:10 -05:00
Steven Rostedt (Red Hat) bf6065b5c7 tracing: Pass trace_array to flag_changed callback
As options (flags) may affect instances instead of being global
the flag_changed() callbacks need to receive the trace_array descriptor
of the instance they will be modifying.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:13:08 -05:00
Steven Rostedt (Red Hat) 8c1a49aedb tracing: Pass trace_array to set_flag callback
As options (flags) may affect instances instead of being global
the set_flag() callbacks need to receive the trace_array descriptor
of the instance they will be modifying.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20 12:13:07 -05:00
Andi Kleen 285c00adf6 asmlinkage: Make trace_hardirqs_on/off_caller visible
These functions are called from assembler, and thus need to be
__visible.

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1391845930-28580-12-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13 18:14:54 -08:00
Steven Rostedt (Red Hat) d651aa1d68 ring-buffer: Fix first commit on sub-buffer having non-zero delta
Each sub-buffer (buffer page) has a full 64 bit timestamp. The events on
that page use a 27 bit delta against that timestamp in order to save on
bits written to the ring buffer. If the time between events is larger than
what the 27 bits can hold, a "time extend" event is added to hold the
entire 64 bit timestamp again and the events after that hold a delta from
that timestamp.

As a "time extend" is always paired with an event, it is logical to just
allocate the event with the time extend, to make things a bit more efficient.

Unfortunately, when the pairing code was written, it removed the "delta = 0"
from the first commit on a page, causing the events on the page to be
slightly skewed.

Fixes: 69d1b839f7 "ring-buffer: Bind time extend and data events together"
Cc: stable@vger.kernel.org # 2.6.37+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-11 13:38:54 -05:00
Linus Torvalds f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Linus Torvalds ba635f8cd2 The first two patches fix the debugfs README file to reflect better
the new features added to 3.14.
 
 The third patch is a minor bugfix to the trace_puts() functions that
 will crash the system if a developer adds one before the tracing system
 is setup. It also affects trace_printk() if it has no arguments, as
 the code will convert it to a trace_puts() as well. Note, this bug
 will not affect unmodified kernels, as trace_printk() and trace_puts()
 should only be used by developers for testing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJS5oAkAAoJEKQekfcNnQGuOfkH/1ScwoRslERCCjhPEZu958bG
 VugNvliB/uDwq3qkmBcncMHwqkFBgJay9ieah+JFeK6x3G6mO/uf+UhN5cmLzVBh
 vA5dKR2zW6WTxKYSvXYapD7OzC7R+V6CLvpMl5WMZ1t3ESQhR6gUrpPdigecBWs9
 015rMVA2xXjNnHNDM1nem1PhnMba78A/N98lFErYGpvVBjkCpB8mSt8adds9bYp8
 a83P6tGUfXvsYO3sOqqBnOOqcfzCjjJbmr94v/F+SmLyuxuV0tVzBkIwXkKTNhEK
 bQZj9Pfe+1oIkldXxstn5jWaOvI6RlAGW4b4qXt30AgrGRSyEo5xpvfLfyNUif4=
 =N3kq
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "The first two patches fix the debugfs README file to reflect better
  the new features added to 3.14.

  The third patch is a minor bugfix to the trace_puts() functions that
  will crash the system if a developer adds one before the tracing
  system is setup.  It also affects trace_printk() if it has no
  arguments, as the code will convert it to a trace_puts() as well.

  Note, this bug will not affect unmodified kernels, as trace_printk()
  and trace_puts() should only be used by developers for testing"

* tag 'trace-fixes-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Check if tracing is enabled in trace_puts()
  tracing: Fix formatting of trace README file
  tracing/README: Add event file usage to tracing mini-HOWTO
2014-01-27 08:22:30 -08:00
Steven Rostedt (Red Hat) 3132e107d6 tracing: Check if tracing is enabled in trace_puts()
If trace_puts() is used very early in boot up, it can crash the machine
if it is called before the ring buffer is allocated. If a trace_printk()
is used with no arguments, then it will be converted into a trace_puts()
and suffer the same fate.

Cc: stable@vger.kernel.org # 3.10+
Fixes: 09ae72348e "tracing: Add trace_puts() for even faster trace_printk() tracing"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-23 12:27:59 -05:00
Steven Rostedt (Red Hat) 71485c4589 tracing: Fix formatting of trace README file
Fix the formatting of the README file in the trace debugfs to fit in
an 80 character window.

Also add a comment about the event trigger counter with regards to
traceon and traceoff.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-23 00:10:04 -05:00
Tom Zanussi 26f255646e tracing/README: Add event file usage to tracing mini-HOWTO
It would be useful to have a cheat-sheet for everything under
tracing/events/ alongside the existing text describing the other files
in the tracing/ dir.

Add short descriptions of the directories and files under events/
along with examples, similar to the existing text for the other files
in tracing/.

Also clean up a few minor alignment problems noticed when adding the
new text.

Link: http://lkml.kernel.org/r/1389993104.3040.445.camel@empanada

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-22 23:06:57 -05:00
Linus Torvalds 60eaa0190f This pull request has a new feature to ftrace, namely the trace event
triggers by Tom Zanussi. A trigger is a way to enable an action when an
 event is hit. The actions are:
 
  o  trace on/off - enable or disable tracing
  o  snapshot     - save the current trace buffer in the snapshot
  o  stacktrace   - dump the current stack trace to the ringbuffer
  o  enable/disable events - enable or disable another event
 
 Namhyung Kim added updates to the tracing uprobes code. Having the
 uprobes add support for fetch methods.
 
 The rest are various bug fixes with the new code, and minor ones for
 the old code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJS3Z9fAAoJEKQekfcNnQGuFf0H/0CteaN+BJjpif6Tnxia15Sp
 pcftzU0lgqfNzsfitmbjiVTgXWqCghoZo8UI9tQZvBZ9wmDIxeXQR73uoBgVlSCQ
 ovyBO/R8r+lq+7EsDCwntZvrLbcdn6s/jzoruRvt7r35ghK5pH81DNR1BOzTQBhW
 x+361Xtc13aok7N7JN8KR96VDUP9f8KU6PWqJ5lgS2Zl+wbVw6b0p8OV8IMCHczP
 MdYrx8y4Jv4QWW7rMShAAVBe9qJQ56JWiWA17ysa4kY8BkKQ7QtlEFr+r1YY0nX5
 67brXiL8u0NFzRx5y2VRpGc25BbImnVBFpoLQ5Itluq9OdZE3aOQubzXlY70R6g=
 =Hkho
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This pull request has a new feature to ftrace, namely the trace event
  triggers by Tom Zanussi.  A trigger is a way to enable an action when
  an event is hit.  The actions are:

   o  trace on/off - enable or disable tracing
   o  snapshot     - save the current trace buffer in the snapshot
   o  stacktrace   - dump the current stack trace to the ringbuffer
   o  enable/disable events - enable or disable another event

  Namhyung Kim added updates to the tracing uprobes code.  Having the
  uprobes add support for fetch methods.

  The rest are various bug fixes with the new code, and minor ones for
  the old code"

* tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (38 commits)
  tracing: Fix buggered tee(2) on tracing_pipe
  tracing: Have trace buffer point back to trace_array
  ftrace: Fix synchronization location disabling and freeing ftrace_ops
  ftrace: Have function graph only trace based on global_ops filters
  ftrace: Synchronize setting function_trace_op with ftrace_trace_function
  tracing: Show available event triggers when no trigger is set
  tracing: Consolidate event trigger code
  tracing: Fix counter for traceon/off event triggers
  tracing: Remove double-underscore naming in syscall trigger invocations
  tracing/kprobes: Add trace event trigger invocations
  tracing/probes: Fix build break on !CONFIG_KPROBE_EVENT
  tracing/uprobes: Add @+file_offset fetch method
  uprobes: Allocate ->utask before handler_chain() for tracing handlers
  tracing/uprobes: Add support for full argument access methods
  tracing/uprobes: Fetch args before reserving a ring buffer
  tracing/uprobes: Pass 'is_return' to traceprobe_parse_probe_arg()
  tracing/probes: Implement 'memory' fetch method for uprobes
  tracing/probes: Add fetch{,_size} member into deref fetch method
  tracing/probes: Move 'symbol' fetch method to kprobes
  tracing/probes: Implement 'stack' fetch method for uprobes
  ...
2014-01-22 16:35:21 -08:00
Al Viro 92fdd98cf8 tracing: Fix buggered tee(2) on tracing_pipe
In kernel/trace/trace.c we have this:
static void tracing_pipe_buf_release(struct pipe_inode_info *pipe,
                                     struct pipe_buffer *buf)
{
        __free_page(buf->page);
}
static const struct pipe_buf_operations tracing_pipe_buf_ops = {
        .can_merge              = 0,
        .map                    = generic_pipe_buf_map,
        .unmap                  = generic_pipe_buf_unmap,
        .confirm                = generic_pipe_buf_confirm,
        .release                = tracing_pipe_buf_release,
        .steal                  = generic_pipe_buf_steal,
        .get                    = generic_pipe_buf_get,
};
with
void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
{
        page_cache_get(buf->page);
}

and I don't see anything that would've prevented tee(2) called on the pipe
that got stuff spliced into it from that sucker.  ->ops->get() will be
called, then buf gets copied into target pipe's ->bufs[] and eventually
readers get to both copies of the buffer.  With
	get_page(page)
	look at that page
	__free_page(page)
	look at that page
	__free_page(page)
which is not a good thing, to put it mildly.  AFAICS, that ought to use
the normal generic_pipe_buf_release() (aka page_cache_release(buf->page)),
shouldn't it?

[
 SDR - As trace_pipe just allocates the page with alloc_page(GFP_KERNEL),
  and doesn't do anything special with it (no LRU logic). The __free_page()
  should be fine, as it wont actually free a page with reference count.
  Maybe there's a chance to leak memory? Anyway, This change is at a minimum
  good for being symmetric with generic_pipe_buf_get, it is fine to add.
]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ SDR - Removed no longer used tracing_pipe_buf_release ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-19 16:53:13 -05:00
Steven Rostedt (Red Hat) dced341b2d tracing: Have trace buffer point back to trace_array
The trace buffer has a descriptor pointer that goes back to the trace
array. But it was never assigned. Luckily, nothing uses it (yet), but
it will in the future.

Although nothing currently uses this, if any of the new features get
backported to older kernels, and because this is such a simple change,
I'm marking it for stable too.

Cc: stable@vger.kernel.org # v3.10+
Fixes: 12883efb67 "tracing: Consolidate max_tr into main trace_array structure"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-14 10:19:46 -05:00
Steven Rostedt (Red Hat) a4c35ed241 ftrace: Fix synchronization location disabling and freeing ftrace_ops
The synchronization needed after ftrace_ops are unregistered must happen
after the callback is disabled from becing called by functions.

The current location happens after the function is being removed from the
internal lists, but not after the function callbacks were disabled, leaving
the functions susceptible of being called after their callbacks are freed.

This affects perf and any externel users of function tracing (LTTng and
SystemTap).

Cc: stable@vger.kernel.org # 3.0+
Fixes: cdbe61bfe7 "ftrace: Allow dynamically allocated function tracers"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-13 12:56:21 -05:00
Steven Rostedt (Red Hat) 23a8e8441a ftrace: Have function graph only trace based on global_ops filters
Doing some different tests, I discovered that function graph tracing, when
filtered via the set_ftrace_filter and set_ftrace_notrace files, does
not always keep with them if another function ftrace_ops is registered
to trace functions.

The reason is that function graph just happens to trace all functions
that the function tracer enables. When there was only one user of
function tracing, the function graph tracer did not need to worry about
being called by functions that it did not want to trace. But now that there
are other users, this becomes a problem.

For example, one just needs to do the following:

 # cd /sys/kernel/debug/tracing
 # echo schedule > set_ftrace_filter
 # echo function_graph > current_tracer
 # cat trace
[..]
 0)               |  schedule() {
 ------------------------------------------
 0)    <idle>-0    =>   rcu_pre-7
 ------------------------------------------

 0) ! 2980.314 us |  }
 0)               |  schedule() {
 ------------------------------------------
 0)   rcu_pre-7    =>    <idle>-0
 ------------------------------------------

 0) + 20.701 us   |  }

 # echo 1 > /proc/sys/kernel/stack_tracer_enabled
 # cat trace
[..]
 1) + 20.825 us   |      }
 1) + 21.651 us   |    }
 1) + 30.924 us   |  } /* SyS_ioctl */
 1)               |  do_page_fault() {
 1)               |    __do_page_fault() {
 1)   0.274 us    |      down_read_trylock();
 1)   0.098 us    |      find_vma();
 1)               |      handle_mm_fault() {
 1)               |        _raw_spin_lock() {
 1)   0.102 us    |          preempt_count_add();
 1)   0.097 us    |          do_raw_spin_lock();
 1)   2.173 us    |        }
 1)               |        do_wp_page() {
 1)   0.079 us    |          vm_normal_page();
 1)   0.086 us    |          reuse_swap_page();
 1)   0.076 us    |          page_move_anon_rmap();
 1)               |          unlock_page() {
 1)   0.082 us    |            page_waitqueue();
 1)   0.086 us    |            __wake_up_bit();
 1)   1.801 us    |          }
 1)   0.075 us    |          ptep_set_access_flags();
 1)               |          _raw_spin_unlock() {
 1)   0.098 us    |            do_raw_spin_unlock();
 1)   0.105 us    |            preempt_count_sub();
 1)   1.884 us    |          }
 1)   9.149 us    |        }
 1) + 13.083 us   |      }
 1)   0.146 us    |      up_read();

When the stack tracer was enabled, it enabled all functions to be traced, which
now the function graph tracer also traces. This is a side effect that should
not occur.

To fix this a test is added when the function tracing is changed, as well as when
the graph tracer is enabled, to see if anything other than the ftrace global_ops
function tracer is enabled. If so, then the graph tracer calls a test trampoline
that will look at the function that is being traced and compare it with the
filters defined by the global_ops.

As an optimization, if there's no other function tracers registered, or if
the only registered function tracers also use the global ops, the function
graph infrastructure will call the registered function graph callback directly
and not go through the test trampoline.

Cc: stable@vger.kernel.org # 3.3+
Fixes: d2d45c7a03 "tracing: Have stack_tracer use a separate list of functions"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-13 10:52:58 -05:00
Peter Zijlstra 35af99e646 sched/clock, x86: Use a static_key for sched_clock_stable
In order to avoid the runtime condition and variable load turn
sched_clock_stable into a static_key.

Also provide a shorter implementation of local_clock() and
cpu_clock(int) when sched_clock_stable==1.

                        MAINLINE   PRE       POST

    sched_clock_stable: 1          1         1
    (cold) sched_clock: 329841     221876    215295
    (cold) local_clock: 301773     234692    220773
    (warm) sched_clock: 38375      25602     25659
    (warm) local_clock: 100371     33265     27242
    (warm) rdtsc:       27340      24214     24208
    sched_clock_stable: 0          0         0
    (cold) sched_clock: 382634     235941    237019
    (cold) local_clock: 396890     297017    294819
    (warm) sched_clock: 38194      25233     25609
    (warm) local_clock: 143452     71234     71232
    (warm) rdtsc:       27345      24245     24243

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:13:13 +01:00
Dario Faggioli 2d3d891d33 sched/deadline: Add SCHED_DEADLINE inheritance logic
Some method to deal with rt-mutexes and make sched_dl interact with
the current PI-coded is needed, raising all but trivial issues, that
needs (according to us) to be solved with some restructuring of
the pi-code (i.e., going toward a proxy execution-ish implementation).

This is under development, in the meanwhile, as a temporary solution,
what this commits does is:

 - ensure a pi-lock owner with waiters is never throttled down. Instead,
   when it runs out of runtime, it immediately gets replenished and it's
   deadline is postponed;

 - the scheduling parameters (relative deadline and default runtime)
   used for that replenishments --during the whole period it holds the
   pi-lock-- are the ones of the waiting task with earliest deadline.

Acting this way, we provide some kind of boosting to the lock-owner,
still by using the existing (actually, slightly modified by the previous
commit) pi-architecture.

We would stress the fact that this is only a surely needed, all but
clean solution to the problem. In the end it's only a way to re-start
discussion within the community. So, as always, comments, ideas, rants,
etc.. are welcome! :-)

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
[ Added !RT_MUTEXES build fix. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-11-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 13:42:56 +01:00
Dario Faggioli af6ace764d sched/deadline: Add latency tracing for SCHED_DEADLINE tasks
It is very likely that systems that wants/needs to use the new
SCHED_DEADLINE policy also want to have the scheduling latency of
the -deadline tasks under control.

For this reason a new version of the scheduling wakeup latency,
called "wakeup_dl", is introduced.

As a consequence of applying this patch there will be three wakeup
latency tracer:

 * "wakeup", that deals with all tasks in the system;
 * "wakeup_rt", that deals with -rt and -deadline tasks only;
 * "wakeup_dl", that deals with -deadline tasks only.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-9-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 13:41:11 +01:00
Steven Rostedt (Red Hat) 405e1d8348 ftrace: Synchronize setting function_trace_op with ftrace_trace_function
ftrace_trace_function is a variable that holds what function will be called
directly by the assembly code (mcount). If just a single function is
registered and it handles recursion itself, then the assembly will call that
function directly without any helper function. It also passes in the
ftrace_op that was registered with the callback. The ftrace_op to send is
stored in the function_trace_op variable.

The ftrace_trace_function and function_trace_op needs to be coordinated such
that the called callback wont be called with the wrong ftrace_op, otherwise
bad things can happen if it expected a different op. Luckily, there's no
callback that doesn't use the helper functions that requires this. But
there soon will be and this needs to be fixed.

Use a set_function_trace_op to store the ftrace_op to set the
function_trace_op to when it is safe to do so (during the update function
within the breakpoint or stop machine calls). Or if dynamic ftrace is not
being used (static tracing) then we have to do a bit more synchronization
when the ftrace_trace_function is set as that takes affect immediately
(as oppose to dynamic ftrace doing it with the modification of the trampoline).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-09 22:00:25 -05:00
Steven Rostedt (Red Hat) dd97b95438 tracing: Show available event triggers when no trigger is set
Currently there's no way to know what triggers exist on a kernel without
looking at the source of the kernel or randomly trying out triggers.
Instead of creating another file in the debugfs system, simply show
what available triggers are there when cat'ing the trigger file when
it has no events:

 [root /sys/kernel/debug/tracing]# cat events/sched/sched_switch/trigger
 # Available triggers:
 # traceon traceoff snapshot stacktrace enable_event disable_event

This stays consistent with other debugfs files where meta data like
this is always proceeded with a '#' at the start of the line so that
tools can strip these out.

Link: http://lkml.kernel.org/r/20140107103548.0a84536d@gandalf.local.home

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-09 21:20:32 -05:00
Steven Rostedt (Red Hat) 13a1e4aef5 tracing: Consolidate event trigger code
The event trigger code that checks for callback triggers before and
after recording of an event has lots of flags checks. This code is
duplicated throughout the ftrace events, kprobes and system calls.
They all do the exact same checks against the event flags.

Added helper functions ftrace_trigger_soft_disabled(),
event_trigger_unlock_commit() and event_trigger_unlock_commit_regs()
that consolidated the code and these are used instead.

Link: http://lkml.kernel.org/r/20140106222703.5e7dbba2@gandalf.local.home

Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-09 21:20:07 -05:00
Steven Rostedt (Red Hat) e8dc637152 tracing: Fix counter for traceon/off event triggers
The counters for the traceon and traceoff are only suppose to decrement
when the trigger enables or disables tracing. It is not suppose to decrement
every time the event is hit.

Only decrement the counter if the trigger actually did something.

Link: http://lkml.kernel.org/r/20140106223124.0e5fd0b4@gandalf.local.home

Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-09 21:19:44 -05:00
Tom Zanussi 4bf0566db1 tracing: Remove double-underscore naming in syscall trigger invocations
There's no reason to use double-underscores for any variable name in
ftrace_syscall_enter()/exit(), since those functions aren't generated
and there's no need to avoid namespace collisions as with the event
macros, which is where the original invocation code came from.

Link: http://lkml.kernel.org/r/0b489c9d1f7ee315cff60fa0e4c2b433ade8ae0d.1389036657.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-06 15:22:07 -05:00
Tom Zanussi 0641d368f2 tracing/kprobes: Add trace event trigger invocations
Add code to the kprobe/kretprobe event functions that will invoke any
event triggers associated with a probe's ftrace_event_file.

The code to do this is very similar to the invocation code already
used to invoke the triggers associated with static events and
essentially replaces the existing soft-disable checks with a superset
that preserves the original behavior but adds the bits needed to
support event triggers.

Link: http://lkml.kernel.org/r/f2d49f157b608070045fdb26c9564d5a05a5a7d0.1389036657.git.tom.zanussi@linux.intel.com

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-06 15:21:43 -05:00
Namhyung Kim e0d18fe063 tracing/probes: Fix build break on !CONFIG_KPROBE_EVENT
When kprobe-based dynamic event tracer is not enabled, it caused
following build error:

   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c8dd): undefined reference to `fetch_symbol_u8'
   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c8e9): undefined reference to `fetch_symbol_u16'
   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c8f5): undefined reference to `fetch_symbol_u32'
   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c901): undefined reference to `fetch_symbol_u64'
   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c909): undefined reference to `fetch_symbol_string'
   kernel/built-in.o: In function `traceprobe_update_arg':
   (.text+0x10c913): undefined reference to `fetch_symbol_string_size'
   ...

It was due to the fetch methods are referred from CHECK_FETCH_FUNCS
macro and since it was only defined in trace_kprobe.c.  Move NULL
definition of such fetch functions to the header file.

Note, it also requires CONFIG_BRANCH_PROFILING enabled to trigger
this failure as well. This is because the "fetch_symbol_*" variables
are referenced in a "else if" statement that will only call
update_symbol_cache(), which is a static inline stub function
when CONFIG_KPROBE_EVENT is not enabled. gcc is smart enough
to optimize this "else if" out and that also removes the code that
references the undefined variables.

But when BRANCH_PROFILING is enabled, it fools gcc into keeping
the if statement around and thus references the undefined symbols
and fails to build.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-03 15:27:18 -05:00
Namhyung Kim b7e0bf341f tracing/uprobes: Add @+file_offset fetch method
Enable to fetch data from a file offset.  Currently it only supports
fetching from same binary uprobe set.  It'll translate the file offset
to a proper virtual address in the process.

The syntax is "@+OFFSET" as it does similar to normal memory fetching
(@ADDR) which does no address translation.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 20:57:05 -05:00
Namhyung Kim b079d374fd tracing/uprobes: Add support for full argument access methods
Enable to fetch other types of argument for the uprobes.  IOW, we can
access stack, memory, deref, bitfield and retval from uprobes now.

The format for the argument types are same as kprobes (but @SYMBOL
type is not supported for uprobes), i.e:

  @ADDR   : Fetch memory at ADDR
  $stackN : Fetch Nth entry of stack (N >= 0)
  $stack  : Fetch stack address
  $retval : Fetch return value
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address

Note that the retval only can be used with uretprobes.

Original-patch-by: Hyeoncheol Lee <cheol.lee@lge.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Hyeoncheol Lee <cheol.lee@lge.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 20:56:21 -05:00
Namhyung Kim dcad1a204f tracing/uprobes: Fetch args before reserving a ring buffer
Fetching from user space should be done in a non-atomic context.  So
use a per-cpu buffer and copy its content to the ring buffer
atomically.  Note that we can migrate during accessing user memory
thus use a per-cpu mutex to protect concurrent accesses.

This is needed since we'll be able to fetch args from an user memory
which can be swapped out.  Before that uprobes could fetch args from
registers only which saved in a kernel space.

While at it, use __get_data_size() and store_trace_args() to reduce
code duplication.  And add struct uprobe_cpu_buffer and its helpers as
suggested by Oleg.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:44 -05:00
Namhyung Kim a4734145a4 tracing/uprobes: Pass 'is_return' to traceprobe_parse_probe_arg()
Currently uprobes don't pass is_return to the argument parser so that
it cannot make use of "$retval" fetch method since it only works for
return probes.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:43 -05:00
Namhyung Kim 5baaa59ef0 tracing/probes: Implement 'memory' fetch method for uprobes
Use separate method to fetch from memory.  Move existing functions to
trace_kprobe.c and make them static.  Also add new memory fetch
implementation for uprobes.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:43 -05:00
Hyeoncheol Lee 3925f4a5af tracing/probes: Add fetch{,_size} member into deref fetch method
The deref fetch methods access a memory region but it assumes that
it's a kernel memory since uprobes does not support them.

Add ->fetch and ->fetch_size member in order to provide a proper
access methods for supporting uprobes.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Hyeoncheol Lee <cheol.lee@lge.com>
[namhyung@kernel.org: Split original patch into pieces as requested]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:42 -05:00
Namhyung Kim 1301a44e77 tracing/probes: Move 'symbol' fetch method to kprobes
Move existing functions to trace_kprobe.c and add NULL entries to the
uprobes fetch type table.  I don't make them static since some generic
routines like update/free_XXX_fetch_param() require pointers to the
functions.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:41 -05:00
Namhyung Kim 3fd996a295 tracing/probes: Implement 'stack' fetch method for uprobes
Use separate method to fetch from stack.  Move existing functions to
trace_kprobe.c and make them static.  Also add new stack fetch
implementation for uprobes.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:40 -05:00
Namhyung Kim 34fee3a104 tracing/probes: Split [ku]probes_fetch_type_table
Use separate fetch_type_table for kprobes and uprobes.  It currently
shares all fetch methods but some of them will be implemented
differently later.

This is not to break build if [ku]probes is configured alone (like
!CONFIG_KPROBE_EVENT and CONFIG_UPROBE_EVENT).  So I added '__weak'
to the table declaration so that it can be safely omitted when it
configured out.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:39 -05:00
Namhyung Kim b26c74e116 tracing/probes: Move fetch function helpers to trace_probe.h
Move fetch function helper macros/functions to the header file and
make them external.  This is preparation of supporting uprobe fetch
table in next patch.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:38 -05:00
Namhyung Kim 5bf652aaf4 tracing/probes: Integrate duplicate set_print_fmt()
The set_print_fmt() functions are implemented almost same for
[ku]probes.  Move it to a common place and get rid of the duplication.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:38 -05:00
Namhyung Kim 2dc1018372 tracing/kprobes: Move common functions to trace_probe.h
The __get_data_size() and store_trace_args() will be used by uprobes
too.  Move them to a common location.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:37 -05:00
Namhyung Kim 14577c3992 tracing/uprobes: Convert to struct trace_probe
Convert struct trace_uprobe to make use of the common trace_probe
structure.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:36 -05:00
Namhyung Kim c31ffb3ff6 tracing/kprobes: Factor out struct trace_probe
There are functions that can be shared to both of kprobes and uprobes.
Separate common data structure to struct trace_probe and use it from
the shared functions.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:29 -05:00
Namhyung Kim 50eb2672ce tracing/probes: Fix basic print type functions
The print format of s32 type was "ld" and it's casted to "long".  So
it turned out to print 4294967295 for "-1" on 64-bit systems.  Not
sure whether it worked well on 32-bit systems.

Anyway, it doesn't need to have cast argument at all since it already
casted using type pointer - just get rid of it.  Thanks to Oleg for
pointing that out.

And print 0x prefix for unsigned type as it shows hex numbers.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:24 -05:00
Namhyung Kim 306cfe2025 tracing/uprobes: Fix documentation of uprobe registration syntax
The uprobe syntax requires an offset after a file path not a symbol.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02 16:17:23 -05:00
Steven Rostedt (Red Hat) d8a30f2034 tracing: Fix rcu handling of event_trigger_data filter field
The filter field of the event_trigger_data structure is protected under
RCU sched locks. It was not annotated as such, and after doing so,
sparse pointed out several locations that required fix ups.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-02 16:17:22 -05:00
Steven Rostedt (Red Hat) 098c879e1f tracing: Add generic tracing_lseek() function
Trace event triggers added a lseek that uses the ftrace_filter_lseek()
function. Unfortunately, when function tracing is not configured in
that function is not defined and the kernel fails to build.

This is the second time that function was added to a file ops and
it broke the build due to requiring special config dependencies.

Make a generic tracing_lseek() that all the tracing utilities may
use.

Also, modify the old ftrace_filter_lseek() to return 0 instead of
1 on WRONLY. Not sure why it was a 1 as that does not make sense.

This also changes the old tracing_seek() to modify the file pos
pointer on WRONLY as well.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-02 16:17:12 -05:00
Jens Axboe b28bc9b38c Linux 3.13-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJSwLfoAAoJEHm+PkMAQRiGi6QH/1U1B7lmHChDTw3jj1lfm9gA
 189Si4QJlnxFWCKHvKEL+pcaVuACU+aMGI8+KyMYK4/JfuWVjjj5fr/SvyHH2/8m
 LdSK8aHMhJ46uBS4WJ/l6v46qQa5e2vn8RKSBAyKm/h4vpt+hd6zJdoFrFai4th7
 k/TAwOAEHI5uzexUChwLlUBRTvbq4U8QUvDu+DeifC8cT63CGaaJ4qVzjOZrx1an
 eP6UXZrKDASZs7RU950i7xnFVDQu4PsjlZi25udsbeiKcZJgPqGgXz5ULf8ZH8RQ
 YCi1JOnTJRGGjyIOyLj7pyB01h7XiSM2+eMQ0S7g54F2s7gCJ58c2UwQX45vRWU=
 =/4/R
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc6' into for-3.14/core

Needed to bring blk-mq uptodate, since changes have been going in
since for-3.14/core was established.

Fixup merge issues related to the immutable biovec changes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-flush.c
	fs/btrfs/check-integrity.c
	fs/btrfs/extent_io.c
	fs/btrfs/scrub.c
	fs/logfs/dev_bdev.c
2013-12-31 09:51:02 -07:00
Tom Zanussi bac5fb97a1 tracing: Add and use generic set_trigger_filter() implementation
Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.

Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger.  For example, to add a filter to an
enable/disable_event command:

    echo 'enable_event:system:event if common_pid == 999' > \
              .../othersys/otherevent/trigger

The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.

As another example, to add a filter to a stacktrace command:

    echo 'stacktrace if common_pid == 999' > \
                   .../somesys/someevent/trigger

The above command will only trigger a stacktrace if the common_pid
field in the event is 999.

The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.

Because triggers can now use filters, the trigger-invoking logic needs
to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
trigger has a filter associated with it, the trigger invocation now
needs to happen after the { assign; } part of the call, in order for
the trigger condition to be tested.

There's still a SOFT_DISABLED-only check at the top of e.g. the
ftrace_raw_events function, so when an event is soft disabled but not
because of the presence of a trigger, the original SOFT_DISABLED
behavior remains unchanged.

There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers.  To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invoks the
trigger.  Once all commands have been either invoked or set their
return flag, event_triggers_call() returns.  The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().

To simplify the above and make it more efficient, the TRIGGER_COND bit
is introduced, which is set only if a soft-disabled trigger needs to
use the log record for filter testing or needs to wait until the
current log record is closed.

The syscall event invocation code is also changed in analogous ways.

Because event triggers need to be able to create and free filters,
this also adds a couple external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.

Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-21 22:02:17 -05:00
Steven Rostedt (Red Hat) 2875a08b2d tracing: Move ftrace_event_file() out of DYNAMIC_FTRACE ifdef
Now that event triggers use ftrace_event_file(), it needs to be outside
the #ifdef CONFIG_DYNAMIC_FTRACE, as it can now be used when that is
not defined.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-21 22:02:17 -05:00
Tom Zanussi 7862ad1846 tracing: Add 'enable_event' and 'disable_event' event trigger commands
Add 'enable_event' and 'disable_event' event_command commands.

enable_event and disable_event event triggers are added by the user
via these commands in a similar way and using practically the same
syntax as the analagous 'enable_event' and 'disable_event' ftrace
function commands, but instead of writing to the set_ftrace_filter
file, the enable_event and disable_event triggers are written to the
per-event 'trigger' files:

    echo 'enable_event:system:event' > .../othersys/otherevent/trigger
    echo 'disable_event:system:event' > .../othersys/otherevent/trigger

The above commands will enable or disable the 'system:event' trace
events whenever the othersys:otherevent events are hit.

This also adds a 'count' version that limits the number of times the
command will be invoked:

    echo 'enable_event:system:event:N' > .../othersys/otherevent/trigger
    echo 'disable_event:system:event:N' > .../othersys/otherevent/trigger

Where N is the number of times the command will be invoked.

The above commands will will enable or disable the 'system:event'
trace events whenever the othersys:otherevent events are hit, but only
N times.

This also makes the find_event_file() helper function extern, since
it's useful to use from other places, such as the event triggers code,
so make it accessible.

Link: http://lkml.kernel.org/r/f825f3048c3f6b026ee37ae5825f9fc373451828.1382622043.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-21 22:02:16 -05:00
Tom Zanussi f21ecbb35f tracing: Add 'stacktrace' event trigger command
Add 'stacktrace' event_command.  stacktrace event triggers are added
by the user via this command in a similar way and using practically
the same syntax as the analogous 'stacktrace' ftrace function command,
but instead of writing to the set_ftrace_filter file, the stacktrace
event trigger is written to the per-event 'trigger' files:

    echo 'stacktrace' > .../tracing/events/somesys/someevent/trigger

The above command will turn on stacktraces for someevent i.e. whenever
someevent is hit, a stacktrace will be logged.

This also adds a 'count' version that limits the number of times the
command will be invoked:

    echo 'stacktrace:N' > .../tracing/events/somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above command will log N stacktraces for someevent i.e. whenever
someevent is hit N times, a stacktrace will be logged.

Link: http://lkml.kernel.org/r/0c30c008a0828c660aa0e1bbd3255cf179ed5c30.1382622043.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-21 22:02:15 -05:00
Tom Zanussi 93e31ffbf4 tracing: Add 'snapshot' event trigger command
Add 'snapshot' event_command.  snapshot event triggers are added by
the user via this command in a similar way and using practically the
same syntax as the analogous 'snapshot' ftrace function command, but
instead of writing to the set_ftrace_filter file, the snapshot event
trigger is written to the per-event 'trigger' files:

    echo 'snapshot' > .../somesys/someevent/trigger

The above command will turn on snapshots for someevent i.e. whenever
someevent is hit, a snapshot will be done.

This also adds a 'count' version that limits the number of times the
command will be invoked:

    echo 'snapshot:N' > .../somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above command will snapshot N times for someevent i.e. whenever
someevent is hit N times, a snapshot will be done.

Also adds a new tracing_alloc_snapshot() function - the existing
tracing_snapshot_alloc() function is a special version of
tracing_snapshot() that also does the snapshot allocation - the
snapshot triggers would like to be able to do just the allocation but
not take a snapshot; the existing tracing_snapshot_alloc() in turn now
also calls tracing_alloc_snapshot() underneath to do that allocation.

Link: http://lkml.kernel.org/r/c9524dd07ce01f9dcbd59011290e0a8d5b47d7ad.1382622043.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
[ fix up from kbuild test robot <fengguang.wu@intel.com report ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-21 22:01:22 -05:00
Tom Zanussi 2a2df32115 tracing: Add 'traceon' and 'traceoff' event trigger commands
Add 'traceon' and 'traceoff' event_command commands.  traceon and
traceoff event triggers are added by the user via these commands in a
similar way and using practically the same syntax as the analagous
'traceon' and 'traceoff' ftrace function commands, but instead of
writing to the set_ftrace_filter file, the traceon and traceoff
triggers are written to the per-event 'trigger' files:

    echo 'traceon' > .../tracing/events/somesys/someevent/trigger
    echo 'traceoff' > .../tracing/events/somesys/someevent/trigger

The above command will turn tracing on or off whenever someevent is
hit.

This also adds a 'count' version that limits the number of times the
command will be invoked:

    echo 'traceon:N' > .../tracing/events/somesys/someevent/trigger
    echo 'traceoff:N' > .../tracing/events/somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above commands will will turn tracing on or off whenever someevent
is hit, but only N times.

Some common register/unregister_trigger() implementations of the
event_command reg()/unreg() callbacks are also provided, which add and
remove trigger instances to the per-event list of triggers, and
arm/disarm them as appropriate.  event_trigger_callback() is a
general-purpose event_command func() implementation that orchestrates
command parsing and registration for most normal commands.

Most event commands will use these, but some will override and
possibly reuse them.

The event_trigger_init(), event_trigger_free(), and
event_trigger_print() functions are meant to be common implementations
of the event_trigger_ops init(), free(), and print() ops,
respectively.

Most trigger_ops implementations will use these, but some will
override and possibly reuse them.

Link: http://lkml.kernel.org/r/00a52816703b98d2072947478dd6e2d70cde5197.1382622043.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-20 18:40:24 -05:00
Tom Zanussi 85f2b08268 tracing: Add basic event trigger framework
Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.

'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.

The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.

The event trigger functionality is built on top of SOFT_DISABLE
functionality.  It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires.  Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that.  Event trigger essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function.  Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.

The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.

The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.

The standard open, read, and release file operations are implemented
here.

The open() implementation sets up for the various open modes of the
'trigger' file.  It creates and attaches the trigger iterator and sets
up the command parser.  If opened for reading set up the trigger
seq_ops.

The read() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.

The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.

A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and
call to it from the trace event initialization code.

also added are a couple trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.

A couple structs consisting mostly of function meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations.  They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.

The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event.  It essentially coordinates adding a live trigger instance to
the event, and arming the triggering the event.

Every event_command func() implementation essentially does the
same thing for any command:

   - choose ops - use the value of param to choose either a number or
     count version of event_trigger_ops specific to the command
   - do the register or unregister of those ops
   - associate a filter, if specified, with the triggering event

The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized.  When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite.  The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.

Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.

The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions.  This allows func()
implementations to use command-specific blobs and supports code
re-use.

The event_trigger_ops.func() command corrsponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked.  The other functions are used to list the trigger when
needed, along with a couple mundane book-keeping functions.

This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.

Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Idea-by: Steve Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-20 18:40:22 -05:00
Linus Torvalds 5263f0a880 This fixes a long standing bug in the ftrace profiler.
The problem is that the profiler only initializes the online
 CPUs, and not possible CPUs. This causes issues if the user takes
 CPUs online or offline while the profiler is running.
 
 If we online a CPU after starting the profiler, we lose all the
 trace information on the CPU going online.
 
 If we offline a CPU after running a test and start a new test, it
 will not clear the old data from that CPU.
 
 This bug causes incorrect data to be reported to the user if they
 online or offline CPUs during the profiling.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSsNBHAAoJEKQekfcNnQGuKP8H/2mol/d7z2vANh7/FeNjTKIN
 VkRzDEwUIwoaJBsL75EDDXBFx7w8jjAsXyoTrqrvMRV4UNcsfm46mohQTPAmK39y
 muqodL1VnVXdKrUmtw/1nL7yDi2KltQH1UwOgvwXGuUFIq5cuCXNQxNK9/1fVVVn
 tIMNz5kEAG3XCwnqP0PgQxWCuA7s+aQR0ijTf4vPf1G3IJujPyG9VhJWcGS3dJTR
 t8TPyatd9D/S+7/r7iZ9hS8nWpaka3qJfhiWqk16SC9LiUXVA8oFOVMoN7n6Co5E
 6r2dNo01WOABlojCxi1t3afUtcV1bUjBnVkiDva5cSc84pQSxe1qRrIpjTmHk00=
 =MSZs
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fix from Steven Rostedt:
 "This fixes a long standing bug in the ftrace profiler.  The problem is
  that the profiler only initializes the online CPUs, and not possible
  CPUs.  This causes issues if the user takes CPUs online or offline
  while the profiler is running.

  If we online a CPU after starting the profiler, we lose all the trace
  information on the CPU going online.

  If we offline a CPU after running a test and start a new test, it will
  not clear the old data from that CPU.

  This bug causes incorrect data to be reported to the user if they
  online or offline CPUs during the profiling"

* tag 'trace-fixes-v3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Initialize the ftrace profiler for each possible cpu
2013-12-20 09:32:30 -08:00
Miao Xie c4602c1c81 ftrace: Initialize the ftrace profiler for each possible cpu
Ftrace currently initializes only the online CPUs. This implementation has
two problems:
- If we online a CPU after we enable the function profile, and then run the
  test, we will lose the trace information on that CPU.
  Steps to reproduce:
  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # cd <debugfs>/tracing/
  # echo <some function name> >> set_ftrace_filter
  # echo 1 > function_profile_enabled
  # echo 1 > /sys/devices/system/cpu/cpu1/online
  # run test
- If we offline a CPU before we enable the function profile, we will not clear
  the trace information when we enable the function profile. It will trouble
  the users.
  Steps to reproduce:
  # cd <debugfs>/tracing/
  # echo <some function name> >> set_ftrace_filter
  # echo 1 > function_profile_enabled
  # run test
  # cat trace_stat/function*
  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # echo 0 > function_profile_enabled
  # echo 1 > function_profile_enabled
  # cat trace_stat/function*
  # run test
  # cat trace_stat/function*

So it is better that we initialize the ftrace profiler for each possible cpu
every time we enable the function profile instead of just the online ones.

Link: http://lkml.kernel.org/r/1387178401-10619-1-git-send-email-miaox@cn.fujitsu.com

Cc: stable@vger.kernel.org # 2.6.31+
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-16 10:53:46 -05:00
Linus Torvalds 843f4f4bb1 A regression showed up that there's a large delay when enabling
all events. This was prevalent when FTRACE_SELFTEST was enabled which
 enables all events several times, and caused the system bootup to
 pause for over a minute.
 
 This was tracked down to an addition of a synchronize_sched() performed
 when system call tracepoints are unregistered.
 
 The synchronize_sched() is needed between the unregistering of the
 system call tracepoint and a deletion of a tracing instance buffer.
 But placing the synchronize_sched() in the unreg of *every* system call
 tracepoint is a bit overboard. A single synchronize_sched() before
 the deletion of the instance is sufficient.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSofYNAAoJEKQekfcNnQGuTUcIAJOx8745KlkFjN4VX+nCWNfP
 xrrpnLBymbGMA9lQ4fXk+kdiuhH8DjRYdKq9fU4T481MFYKkToUIZH6NeaLI5fr1
 0nBPPjVyAlJ+yt9JbOAYa1jEYnAr27ORDHEtdQnqb6OJSky3oh9jCQi+toxmh2qX
 Sv1tIYeAf3K2V/h5xt6uSl9oiZ6KBtwE3f+xkHWNizaU9i2rq2gxd77fSbPNTIps
 wLdsESYziA2UeAm13eh8xXo1uqRbfvx7bPr59cu0+3AqdOoaXFG+JoE/MqmljIO5
 HkyCKJVqP8HD+QieuEB+hw4zpSsl6A6iKSlaiHm8NMnRRocfM6qMKMQXQ28KhxA=
 =sYm/
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "A regression showed up that there's a large delay when enabling all
  events.  This was prevalent when FTRACE_SELFTEST was enabled which
  enables all events several times, and caused the system bootup to
  pause for over a minute.

  This was tracked down to an addition of a synchronize_sched()
  performed when system call tracepoints are unregistered.

  The synchronize_sched() is needed between the unregistering of the
  system call tracepoint and a deletion of a tracing instance buffer.
  But placing the synchronize_sched() in the unreg of *every* system
  call tracepoint is a bit overboard.  A single synchronize_sched()
  before the deletion of the instance is sufficient"

* tag 'trace-fixes-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Only run synchronize_sched() at instance deletion time
2013-12-06 08:34:16 -08:00
Steven Rostedt 3ccb012392 tracing: Only run synchronize_sched() at instance deletion time
It has been reported that boot up with FTRACE_SELFTEST enabled can take a
very long time. There can be stalls of over a minute.

This was tracked down to the synchronize_sched() called when a system call
event is disabled. As the self tests enable and disable thousands of events,
this makes the synchronize_sched() get called thousands of times.

The synchornize_sched() was added with d562aff93b "tracing: Add support
for SOFT_DISABLE to syscall events" which caused this regression (added
in 3.13-rc1).

The synchronize_sched() is to protect against the events being accessed
when a tracer instance is being deleted. When an instance is being deleted
all the events associated to it are unregistered. The synchronize_sched()
makes sure that no more users are running when it finishes.

Instead of calling synchronize_sched() for all syscall events, we only
need to call it once, after the events are unregistered and before the
instance is deleted. The event_mutex is held during this action to
prevent new users from enabling events.

Link: http://lkml.kernel.org/r/20131203124120.427b9661@gandalf.local.home

Reported-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Petr Mladek <pmladek@suse.cz>
Tested-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-05 14:22:30 -05:00
Linus Torvalds e321ae4c20 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc kernel and tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tools lib traceevent: Fix conversion of pointer to integer of different size
  perf/trace: Properly use u64 to hold event_id
  perf: Remove fragile swevent hlist optimization
  ftrace, perf: Avoid infinite event generation loop
  tools lib traceevent: Fix use of multiple options in processing field
  perf header: Fix possible memory leaks in process_group_desc()
  perf header: Fix bogus group name
  perf tools: Tag thread comm as overriden
2013-12-02 10:13:09 -08:00
Steven Rostedt (Red Hat) 8a56d7761d ftrace: Fix function graph with loading of modules
Commit 8c4f3c3fa9 "ftrace: Check module functions being traced on reload"
fixed module loading and unloading with respect to function tracing, but
it missed the function graph tracer. If you perform the following

 # cd /sys/kernel/debug/tracing
 # echo function_graph > current_tracer
 # modprobe nfsd
 # echo nop > current_tracer

You'll get the following oops message:

 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
 Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
 CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
  0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
  0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
  ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
 Call Trace:
  [<ffffffff814fe193>] dump_stack+0x4f/0x7c
  [<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b
  [<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
  [<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c
  [<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9
  [<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364
  [<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b
  [<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78
  [<ffffffff810c4b30>] graph_trace_reset+0xe/0x10
  [<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a
  [<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd
  [<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde
  [<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e
  [<ffffffff81122aed>] vfs_write+0xab/0xf6
  [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
  [<ffffffff81122dbd>] SyS_write+0x59/0x82
  [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
  [<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b
 ---[ end trace 940358030751eafb ]---

The above mentioned commit didn't go far enough. Well, it covered the
function tracer by adding checks in __register_ftrace_function(). The
problem is that the function graph tracer circumvents that (for a slight
efficiency gain when function graph trace is running with a function
tracer. The gain was not worth this).

The problem came with ftrace_startup() which should always be called after
__register_ftrace_function(), if you want this bug to be completely fixed.

Anyway, this solution moves __register_ftrace_function() inside of
ftrace_startup() and removes the need to call them both.

Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Fixes: ed926f9b35 ("ftrace: Use counters to enable functions to trace")
Cc: stable@vger.kernel.org # 3.0+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-26 10:36:50 -05:00
Kent Overstreet 4f024f3797 block: Abstract out bvec iterator
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>6
2013-11-23 22:33:47 -08:00
Vince Weaver 0022cedd4a perf/trace: Properly use u64 to hold event_id
The 64-bit attr.config value for perf trace events was being copied into
an "int" before doing a comparison, meaning the top 32 bits were
being truncated.

As far as I can tell this didn't cause any errors, but it did mean
it was possible to create valid aliases for all the tracepoint ids
which I don't think was intended.  (For example, 0xffffffff00000018
and 0x18 both enable the same tracepoint).

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1311151236100.11932@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 16:57:44 +01:00
Peter Zijlstra d5b5f391d4 ftrace, perf: Avoid infinite event generation loop
Vince's perf-trinity fuzzer found yet another 'interesting' problem.

When we sample the irq_work_exit tracepoint with period==1 (or
PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an
infinite event generation loop:

  ,-> <IPI>
  |     irq_work_exit() ->
  |       trace_irq_work_exit() ->
  |         ...
  |           __perf_event_overflow() -> (due to fasync)
  |             irq_work_queue() -> (irq_work_list must be empty)
  '---------      arch_irq_work_raise()

Similar things can happen due to regular poll() wakeups if we exceed
the ring-buffer wakeup watermark, or have an event_limit.

To avoid this, dis-allow sampling this particular tracepoint.

In order to achieve this, create a special perf_perm function pointer
for each event and call this (when set) on trying to create a
tracepoint perf event.

[ roasted: use expr... to allow for ',' in your expression ]

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 16:57:40 +01:00
Linus Torvalds b29c8306a3 This batch of changes is mostly clean ups and small bug fixes.
The only real feature that was added this release is from Namhyung Kim,
 who introduced "set_graph_notrace" filter that lets you run the function
 graph tracer and not trace particular functions and their call chain.
 
 Tom Zanussi added some updates to the ftrace multibuffer tracing that
 made it more consistent with the top level tracing.
 
 One of the fixes for perf function tracing required an API change in
 RCU; the addition of "rcu_is_watching()". As Paul McKenney is pushing
 that change in this release too, he gave me a branch that included
 all the changes to get that working, and I pulled that into my tree
 in order to complete the perf function tracing fix.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSgX5SAAoJEKQekfcNnQGulUAH/jORqJrKaNAulmZ314VsAqfa
 zMtF5UAAPf7kqc3AN/jtFrhJUNEfxWOo7A4r0FsM/rKdWJF+98GA6aqYVD+XoWFt
 +36fg1enxbXUjixQ96Uh+o1+BJUgYDqljuWzqSu/oiXWfWwl8+WL4kcbhb+V9WcF
 SpdzLCWVZRfhyDiN3+0zvyQ8RSG2Pd7CWn9zroI0e4sxGo0Ki6JUnIcXtZGOBDOQ
 IIZdjXvGSfpJ+3u3XvRPXJcltRCtOsVWxYzrmvRlmHDW5QMe1+WmmrlojTePrLaJ
 xn8+3WINqetAR+ZQnazbpt1XzJzKa8QtFgpiN0kT6qL7cg3N1Owc4vLGohl7wok=
 =Nesf
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing update from Steven Rostedt:
 "This batch of changes is mostly clean ups and small bug fixes.  The
  only real feature that was added this release is from Namhyung Kim,
  who introduced "set_graph_notrace" filter that lets you run the
  function graph tracer and not trace particular functions and their
  call chain.

  Tom Zanussi added some updates to the ftrace multibuffer tracing that
  made it more consistent with the top level tracing.

  One of the fixes for perf function tracing required an API change in
  RCU; the addition of "rcu_is_watching()".  As Paul McKenney is pushing
  that change in this release too, he gave me a branch that included all
  the changes to get that working, and I pulled that into my tree in
  order to complete the perf function tracing fix"

* tag 'trace-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Add rcu annotation for syscall trace descriptors
  tracing: Do not use signed enums with unsigned long long in fgragh output
  tracing: Remove unused function ftrace_off_permanent()
  tracing: Do not assign filp->private_data to freed memory
  tracing: Add helper function tracing_is_disabled()
  tracing: Open tracer when ftrace_dump_on_oops is used
  tracing: Add support for SOFT_DISABLE to syscall events
  tracing: Make register/unregister_ftrace_command __init
  tracing: Update event filters for multibuffer
  recordmcount.pl: Add support for __fentry__
  ftrace: Have control op function callback only trace when RCU is watching
  rcu: Do not trace rcu_is_watching() functions
  ftrace/x86: skip over the breakpoint for ftrace caller
  trace/trace_stat: use rbtree postorder iteration helper instead of opencoding
  ftrace: Add set_graph_notrace filter
  ftrace: Narrow down the protected area of graph_lock
  ftrace: Introduce struct ftrace_graph_data
  ftrace: Get rid of ftrace_graph_filter_enabled
  tracing: Fix potential out-of-bounds in trace_get_user()
  tracing: Show more exact help information about snapshot
2013-11-16 12:23:18 -08:00
Linus Torvalds 0910c0bdf7 Merge branch 'for-3.13/core' of git://git.kernel.dk/linux-block
Pull block IO core updates from Jens Axboe:
 "This is the pull request for the core changes in the block layer for
  3.13.  It contains:

   - The new blk-mq request interface.

     This is a new and more scalable queueing model that marries the
     best part of the request based interface we currently have (which
     is fully featured, but scales poorly) and the bio based "interface"
     which the new drivers for high IOPS devices end up using because
     it's much faster than the request based one.

     The bio interface has no block layer support, since it taps into
     the stack much earlier.  This means that drivers end up having to
     implement a lot of functionality on their own, like tagging,
     timeout handling, requeue, etc.  The blk-mq interface provides all
     these.  Some drivers even provide a switch to select bio or rq and
     has code to handle both, since things like merging only works in
     the rq model and hence is faster for some workloads.  This is a
     huge mess.  Conversion of these drivers nets us a substantial code
     reduction.  Initial results on converting SCSI to this model even
     shows an 8x improvement on single queue devices.  So while the
     model was intended to work on the newer multiqueue devices, it has
     substantial improvements for "classic" hardware as well.  This code
     has gone through extensive testing and development, it's now ready
     to go.  A pull request is coming to convert virtio-blk to this
     model will be will be coming as well, with more drivers scheduled
     for 3.14 conversion.

   - Two blktrace fixes from Jan and Chen Gang.

   - A plug merge fix from Alireza Haghdoost.

   - Conversion of __get_cpu_var() from Christoph Lameter.

   - Fix for sector_div() with 64-bit divider from Geert Uytterhoeven.

   - A fix for a race between request completion and the timeout
     handling from Jeff Moyer.  This is what caused the merge conflict
     with blk-mq/core, in case you are looking at that.

   - A dm stacking fix from Mike Snitzer.

   - A code consolidation fix and duplicated code removal from Kent
     Overstreet.

   - A handful of block bug fixes from Mikulas Patocka, fixing a loop
     crash and memory corruption on blk cg.

   - Elevator switch bug fix from Tomoki Sekiyama.

  A heads-up that I had to rebase this branch.  Initially the immutable
  bio_vecs had been queued up for inclusion, but a week later, it became
  clear that it wasn't fully cooked yet.  So the decision was made to
  pull this out and postpone it until 3.14.  It was a straight forward
  rebase, just pruning out the immutable series and the later fixes of
  problems with it.  The rest of the patches applied directly and no
  further changes were made"

* 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits)
  block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  block: Do not call sector_div() with a 64-bit divisor
  kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()
  block: Consolidate duplicated bio_trim() implementations
  block: Use rw_copy_check_uvector()
  block: Enable sysfs nomerge control for I/O requests in the plug list
  block: properly stack underlying max_segment_size to DM device
  elevator: acquire q->sysfs_lock in elevator_change()
  elevator: Fix a race in elevator switching and md device initialization
  block: Replace __get_cpu_var uses
  bdi: test bdi_init failure
  block: fix a probe argument to blk_register_region
  loop: fix crash if blk_alloc_queue fails
  blk-core: Fix memory corruption if blkcg_init_queue fails
  block: fix race between request completion and timeout handling
  blktrace: Send BLK_TN_PROCESS events to all running traces
  blk-mq: don't disallow request merges for req->special being set
  blk-mq: mq plug list breakage
  blk-mq: fix for flush deadlock
  ...
2013-11-14 12:08:14 +09:00
Linus Torvalds 39cf275a1a Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes in this cycle are:

   - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
     van Riel, Peter Zijlstra et al.  Yay!

   - optimize preemption counter handling: merge the NEED_RESCHED flag
     into the preempt_count variable, by Peter Zijlstra.

   - wait.h fixes and code reorganization from Peter Zijlstra

   - cfs_bandwidth fixes from Ben Segall

   - SMP load-balancer cleanups from Peter Zijstra

   - idle balancer improvements from Jason Low

   - other fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
  stop_machine: Fix race between stop_two_cpus() and stop_cpus()
  sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
  sched: Fix asymmetric scheduling for POWER7
  sched: Move completion code from core.c to completion.c
  sched: Move wait code from core.c to wait.c
  sched: Move wait.c into kernel/sched/
  sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
  sched: Avoid throttle_cfs_rq() racing with period_timer stopping
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq->lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  sched: Remove extra put_online_cpus() inside sched_setaffinity()
  sched/rt: Fix task_tick_rt() comment
  sched/wait: Fix build breakage
  sched/wait: Introduce prepare_to_wait_event()
  sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
  sched: Remove get_online_cpus() usage
  sched: Fix race in migrate_swap_stop()
  ...
2013-11-12 10:20:12 +09:00
Steven Rostedt (Red Hat) 3a81a5210b tracing: Add rcu annotation for syscall trace descriptors
sparse complains about the enter/exit_sysycall_files[] variables being
dereferenced with rcu_dereference_sched(). The fields need to be
annotated with __rcu.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-11 11:47:06 -05:00
Peter Zijlstra e5137b50a0 ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
Since the introduction of PREEMPT_NEED_RESCHED in:

  f27dde8dee ("sched: Add NEED_RESCHED to the preempt_count")

we need to be able to look at both TIF_NEED_RESCHED and
PREEMPT_NEED_RESCHED to understand the full preemption behaviour.

Add it to the trace output.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/20131004152826.GP3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-11 12:43:39 +01:00
Chen Gang f8c5e94486 kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()
do_blk_trace_setup() will fully initialize 'buts.name', so can remove
the related memcpy(). And also use BLKTRACE_BDEV_SIZE and ARRAY_SIZE
instead of hard code number '32'.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 09:04:30 -07:00
Jan Kara a404d5576b blktrace: Send BLK_TN_PROCESS events to all running traces
Currently each task sends BLK_TN_PROCESS event to the first traced
device it interacts with after a new trace is started. When there are
several traced devices and the task accesses more devices, this logic
can result in BLK_TN_PROCESS being sent several times to some devices
while it is never sent to other devices. Thus blkparse doesn't display
command name when parsing some blktrace files.

Fix the problem by sending BLK_TN_PROCESS event to all traced devices
when a task interacts with any of them.

Signed-off-by: Jan Kara <jack@suse.cz>
Review-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 08:59:00 -07:00
Steven Rostedt (Red Hat) 6fc84ea70e tracing: Do not use signed enums with unsigned long long in fgragh output
The duration field of print_graph_duration() can also be used
to do the space filling by passing an enum in it:

  DURATION_FILL_FULL
  DURATION_FILL_START
  DURATION_FILL_END

The problem is that these are enums and defined as negative,
but the duration field is unsigned long long. Most archs are
fine with this but blackfin fails to compile because of it:

kernel/built-in.o: In function `print_graph_duration':
kernel/trace/trace_functions_graph.c:782: undefined reference to `__ucmpdi2'

Overloading a unsigned long long with an signed enum is just
bad in principle. We can accomplish the same thing by using
part of the flags field instead.

Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06 15:26:56 -05:00
Steven Rostedt (Red Hat) 042b10d83d tracing: Remove unused function ftrace_off_permanent()
In the past, ftrace_off_permanent() was called if something
strange was detected. But the ftrace_bug() now handles all the
anomolies that can happen with ftrace (function tracing), and there
are no uses of ftrace_off_permanent(). Get rid of it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06 15:26:55 -05:00
Geyslan G. Bem d6d3523caa tracing: Do not assign filp->private_data to freed memory
In system_tr_open(), the filp->private_data can be assigned the 'dir'
variable even if it was freed. This is on the error path, and is
harmless because the error return code will prevent filp->private_data
from being used. But for correctness, we should not assign it to
a recently freed variable, as that can cause static tools to give
false warnings.

Also have both subsystem_open() and system_tr_open() return -ENODEV
if tracing has been disabled.

Link: http://lkml.kernel.org/r/1383764571-7318-1-git-send-email-geyslan@gmail.com

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06 15:26:54 -05:00
Steven Rostedt 12ae030d54 perf/ftrace: Fix paranoid level for enabling function tracer
The current default perf paranoid level is "1" which has
"perf_paranoid_kernel()" return false, and giving any operations that
use it, access to normal users. Unfortunately, this includes function
tracing and normal users should not be allowed to enable function
tracing by default.

The proper level is defined at "-1" (full perf access), which
"perf_paranoid_tracepoint_raw()" will only give access to. Use that
check instead for enabling function tracing.

Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: stable@vger.kernel.org # 3.4+
CVE: CVE-2013-2930
Fixes: ced39002f5 ("ftrace, perf: Add support to use function tracepoint in perf")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06 14:44:49 -05:00
Geyslan G. Bem 2e86421deb tracing: Add helper function tracing_is_disabled()
This patch creates the function 'tracing_is_disabled', which
can be used outside of trace.c.

Link: http://lkml.kernel.org/r/1382141754-12155-1-git-send-email-geyslan@gmail.com

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06 11:06:00 -05:00
Cody P Schafer b2f974d6af tracing: Open tracer when ftrace_dump_on_oops is used
With ftrace_dump_on_oops, we previously did not open the tracer in
question, sometimes causing the trace output to be useless.

For example, the function_graph tracer with tracing_thresh set dumped via
ftrace_dump_on_oops would show a series of '}' indented at different levels,
but no function names.

call trace->open() (and do a few other fixups copied from the normal dump
path) to make the output more intelligible.

Link: http://lkml.kernel.org/r/1382554197-16961-1-git-send-email-cody@linux.vnet.ibm.com

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06 10:03:11 -05:00
Tom Zanussi d562aff93b tracing: Add support for SOFT_DISABLE to syscall events
The original SOFT_DISABLE patches didn't add support for soft disable
of syscall events; this adds it.

Add an array of ftrace_event_file pointers indexed by syscall number
to the trace array and remove the existing enabled bitmaps, which as a
result are now redundant.  The ftrace_event_file structs in turn
contain the soft disable flags we need for per-syscall soft disable
accounting.

Adding ftrace_event_files also means we can remove the USE_CALL_FILTER
bit, thus enabling multibuffer filter support for syscall events.

Link: http://lkml.kernel.org/r/6e72b566e85d8df8042f133efbc6c30e21fb017e.1382620672.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05 17:48:49 -05:00
Tom Zanussi 38de93abec tracing: Make register/unregister_ftrace_command __init
register/unregister_ftrace_command() are only ever called from __init
functions, so can themselves be made __init.

Also make register_snapshot_cmd() __init for the same reason.

Link: http://lkml.kernel.org/r/d4042c8cadb7ae6f843ac9a89a24e1c6a3099727.1382620672.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05 17:43:40 -05:00
Tom Zanussi f306cc82a9 tracing: Update event filters for multibuffer
The trace event filters are still tied to event calls rather than
event files, which means you don't get what you'd expect when using
filters in the multibuffer case:

Before:

  # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # mkdir /sys/kernel/debug/tracing/instances/test1
  # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 2048
  # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  bytes_alloc > 2048

Setting the filter in tracing/instances/test1/events shouldn't affect
the same event in tracing/events as it does above.

After:

  # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # mkdir /sys/kernel/debug/tracing/instances/test1
  # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  bytes_alloc > 2048

We'd like to just move the filter directly from ftrace_event_call to
ftrace_event_file, but there are a couple cases that don't yet have
multibuffer support and therefore have to continue using the current
event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
is added to the event_call flags, whose main purpose is to keep the
old behavior for those cases until they can be updated with
multibuffer support; at that point, the USE_CALL_FILTER flag (and the
new associated call_filter_check_discard() function) can go away.

The multibuffer support also made filter_current_check_discard()
redundant, so this change removes that function as well and replaces
it with filter_check_discard() (or call_filter_check_discard() as
appropriate).

Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05 16:50:20 -05:00
Steven Rostedt (Red Hat) b5aa3a472b ftrace: Have control op function callback only trace when RCU is watching
Dave Jones reported that trinity would be able to trigger the following
back trace:

 ===============================
 [ INFO: suspicious RCU usage. ]
 3.10.0-rc2+ #38 Not tainted
 -------------------------------
 include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
 other info that might help us debug this:

 RCU used illegally from idle CPU!  rcu_scheduler_active = 1, debug_locks = 0
 RCU used illegally from extended quiescent state!
 1 lock held by trinity-child1/18786:
  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310
 stack backtrace:
 CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38
  0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8
  ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8
  0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48
 Call Trace:
  [<ffffffff816e2f6b>] dump_stack+0x19/0x1b
  [<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120
  [<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310
  [<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310
  [<ffffffff81309289>] ? __const_udelay+0x29/0x30
  [<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0
  [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
  [<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0
  [<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90
  [<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0
  [<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0
  [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
  [<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0
  [<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210
  [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
  [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
  [<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210
  [<ffffffff816f4000>] ftrace_call+0x5/0x2f
  [<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210
  [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
  [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
  [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
  [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
  [<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0
  [<ffffffff81103673>] rcu_user_enter+0x13/0x20
  [<ffffffff8114541a>] user_enter+0x6a/0xd0
  [<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140
  [<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d
 ------------[ cut here ]------------

Perf uses rcu_read_lock() but as the function tracer can trace functions
even when RCU is not currently active, this makes the rcu_read_lock()
used by perf ineffective.

As perf is currently the only user of the ftrace_ops_control_func() and
perf is also the only function callback that actively uses rcu_read_lock(),
the quick fix is to prevent the ftrace_ops_control_func() from calling
its callbacks if RCU is not active.

With Paul's new "rcu_is_watching()" we can tell if RCU is active or not.

Reported-by: Dave Jones <davej@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05 16:04:26 -05:00
Cody P Schafer 9cd804ac1f trace/trace_stat: use rbtree postorder iteration helper instead of opencoding
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree

Link: http://lkml.kernel.org/r/1383345566-25087-2-git-send-email-cody@linux.vnet.ibm.com

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05 16:01:47 -05:00
Namhyung Kim 29ad23b004 ftrace: Add set_graph_notrace filter
The set_graph_notrace filter is analogous to set_ftrace_notrace and
can be used for eliminating uninteresting part of function graph trace
output.  It also works with set_graph_function nicely.

  # cd /sys/kernel/debug/tracing/
  # echo do_page_fault > set_graph_function
  # perf ftrace live true
   2)               |  do_page_fault() {
   2)               |    __do_page_fault() {
   2)   0.381 us    |      down_read_trylock();
   2)   0.055 us    |      __might_sleep();
   2)   0.696 us    |      find_vma();
   2)               |      handle_mm_fault() {
   2)               |        handle_pte_fault() {
   2)               |          __do_fault() {
   2)               |            filemap_fault() {
   2)               |              find_get_page() {
   2)   0.033 us    |                __rcu_read_lock();
   2)   0.035 us    |                __rcu_read_unlock();
   2)   1.696 us    |              }
   2)   0.031 us    |              __might_sleep();
   2)   2.831 us    |            }
   2)               |            _raw_spin_lock() {
   2)   0.046 us    |              add_preempt_count();
   2)   0.841 us    |            }
   2)   0.033 us    |            page_add_file_rmap();
   2)               |            _raw_spin_unlock() {
   2)   0.057 us    |              sub_preempt_count();
   2)   0.568 us    |            }
   2)               |            unlock_page() {
   2)   0.084 us    |              page_waitqueue();
   2)   0.126 us    |              __wake_up_bit();
   2)   1.117 us    |            }
   2)   7.729 us    |          }
   2)   8.397 us    |        }
   2)   8.956 us    |      }
   2)   0.085 us    |      up_read();
   2) + 12.745 us   |    }
   2) + 13.401 us   |  }
  ...

  # echo handle_mm_fault > set_graph_notrace
  # perf ftrace live true
   1)               |  do_page_fault() {
   1)               |    __do_page_fault() {
   1)   0.205 us    |      down_read_trylock();
   1)   0.041 us    |      __might_sleep();
   1)   0.344 us    |      find_vma();
   1)   0.069 us    |      up_read();
   1)   4.692 us    |    }
   1)   5.311 us    |  }
  ...

Link: http://lkml.kernel.org/r/1381739066-7531-5-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-18 22:23:16 -04:00
Namhyung Kim 6a10108bdb ftrace: Narrow down the protected area of graph_lock
The parser set up is just a generic utility that uses local variables
allocated by the function. There's no need to hold the graph_lock for
this set up.

This also makes the code simpler.

Link: http://lkml.kernel.org/r/1381739066-7531-4-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-18 22:20:33 -04:00
Namhyung Kim faf982a60f ftrace: Introduce struct ftrace_graph_data
The struct ftrace_graph_data is for generalizing the access to
set_graph_function file.  This is a preparation for adding support to
set_graph_notrace.

Link: http://lkml.kernel.org/r/1381739066-7531-3-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-18 22:17:51 -04:00
Namhyung Kim 9aa72b4bf8 ftrace: Get rid of ftrace_graph_filter_enabled
The ftrace_graph_filter_enabled means that user sets function filter
and it always has same meaning of ftrace_graph_count > 0.

Link: http://lkml.kernel.org/r/1381739066-7531-2-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-18 22:15:25 -04:00
Steven Rostedt 057db8488b tracing: Fix potential out-of-bounds in trace_get_user()
Andrey reported the following report:

ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3
ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3)
Accessed by thread T13003:
  #0 ffffffff810dd2da (asan_report_error+0x32a/0x440)
  #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40)
  #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20)
  #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260)
  #4 ffffffff812a1065 (__fput+0x155/0x360)
  #5 ffffffff812a12de (____fput+0x1e/0x30)
  #6 ffffffff8111708d (task_work_run+0x10d/0x140)
  #7 ffffffff810ea043 (do_exit+0x433/0x11f0)
  #8 ffffffff810eaee4 (do_group_exit+0x84/0x130)
  #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30)
  #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Allocated by thread T5167:
  #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0)
  #1 ffffffff8128337c (__kmalloc+0xbc/0x500)
  #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90)
  #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0)
  #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40)
  #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430)
  #6 ffffffff8129b668 (finish_open+0x68/0xa0)
  #7 ffffffff812b66ac (do_last+0xb8c/0x1710)
  #8 ffffffff812b7350 (path_openat+0x120/0xb50)
  #9 ffffffff812b8884 (do_filp_open+0x54/0xb0)
  #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0)
  #11 ffffffff8129d4b7 (SyS_open+0x37/0x50)
  #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Shadow bytes around the buggy address:
  ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb
  ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap redzone:          fa
  Heap kmalloc redzone:  fb
  Freed heap region:     fd
  Shadow gap:            fe

The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'

Although the crash happened in ftrace_regex_open() the real bug
occurred in trace_get_user() where there's an incrementation to
parser->idx without a check against the size. The way it is triggered
is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop
that reads the last character stores it and then breaks out because
there is no more characters. Then the last character is read to determine
what to do next, and the index is incremented without checking size.

Then the caller of trace_get_user() usually nulls out the last character
with a zero, but since the index is equal to the size, it writes a nul
character after the allocated space, which can corrupt memory.

Luckily, only root user has write access to this file.

Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-18 21:02:56 -04:00
Wang YanQing b9be6d026d tracing: Show more exact help information about snapshot
The current "help" that comes out of the snapshot file when it is
not allocated looks like this:

 # * Snapshot is freed *
 #
 # Snapshot commands:
 # echo 0 > snapshot : Clears and frees snapshot buffer
 # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
 #                      Takes a snapshot of the main buffer.
 # echo 2 > snapshot : Clears snapshot buffer (but does not allocate)
 #                      (Doesn't have to be '2' works with any number that
 #                       is not a '0' or '1')

Echo 2 says that it does not allocate the buffer, which is correct,
but to be more consistent with "echo 0" it should also state
that it does not free.

Link: http://lkml.kernel.org/r/20130914045916.GA4243@udknight

Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-09 21:38:22 -04:00