Commit Graph

662342 Commits

Author SHA1 Message Date
Steven Rostedt (VMware) 73a757e631 ring-buffer: Return reader page back into existing ring buffer
When reading the ring buffer for consuming, it is optimized for splice,
where a page is taken out of the ring buffer (zero copy) and sent to the
reading consumer. When the read is finished with the page, it calls
ring_buffer_free_read_page(), which simply frees the page. The next time the
reader needs to get a page from the ring buffer, it must call
ring_buffer_alloc_read_page() which allocates and initializes a reader page
for the ring buffer to be swapped into the ring buffer for a new filled page
for the reader.

The problem is that there's no reason to actually free the page when it is
passed back to the ring buffer. It can hold it off and reuse it for the next
iteration. This completely removes the interaction with the page_alloc
mechanism.

Using the trace-cmd utility to record all events (causing trace-cmd to
require reading lots of pages from the ring buffer, and calling
ring_buffer_alloc/free_read_page() several times), and also assigning a
stack trace trigger to the mm_page_alloc event, we can see how many times
the ring_buffer_alloc_read_page() needed to allocate a page for the ring
buffer.

Before this change:

  # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
  # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
  9968

After this change:

  # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
  # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
  4

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-01 10:26:40 -04:00
Steven Rostedt (VMware) ca2958f14c selftests: ftrace: Allow some event trigger tests to run in an instance
Some of the event triggers can run fine in an instance. Have them tested in
one as well. The ones that still need work are the snapshot, stacktrace and
traceon/off triggers, as they don't currently pass a handle to the
trace_array they are attached to. But that can be for a future project.

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-26 08:54:49 -04:00
Steven Rostedt (VMware) 03c201759e selftests: ftrace: Have some basic tests run in a tracing instance too
Some of the basic ftrace selftests should also be run in an instance. These
test a quick case of running all tracers in the available_tracers file
within the instance. The other is testing the clock used for the instance.

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-26 08:54:40 -04:00
Steven Rostedt (VMware) 35df6a894e selftests: ftrace: Have event tests also run in an tracing instance
The ftrace selftests of events: event-enable, event-pid, and
subsystem-enable can all be run inside an instance. Change their tests to do
both a toplevel run an an instance run.

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-26 08:54:32 -04:00
Steven Rostedt (VMware) 6df0fee3e8 selftests: ftrace: Make func_event_triggers and func_traceonoff_triggers tests do instances
Both the func_event_triggers and func_traceonoff_triggers tests can be
performed in both the toplevel instance as well as for individual instances.
Have their tests run in both cases.

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-26 08:54:23 -04:00
Steven Rostedt (VMware) b5b77be812 selftests: ftrace: Allow some tests to be run in a tracing instance
An tracing instance has several of the same capabilities as the top level
instance, but may be implemented slightly different. Instead of just writing
tests that duplicat the same test cases of the top level instance, allow a
test to be written for both the top level as well as for an instance.

If a test case can be run in both the top level as well as in an tracing
instance directory, then it should add a tag "# flags: instance" in the
header of the test file. Then after all tests have run, any test that has an
instance flag set, will run again within a tracing instance.

Link: http://lkml.kernel.org/r/20170421233850.1d0e9e05@gandalf.local.home

Cc: Shuah Khan <shuah@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-26 08:54:01 -04:00
Steven Rostedt (VMware) dcc19d2809 tracing/ftrace: Allow for instances to trigger their own stacktrace probes
Have the stacktrace function trigger probe trigger stack traces within the
instance that they were added to in the set_ftrace_filter.

 ># cd /sys/kernel/debug/tracing
 ># mkdir instances/foo
 ># cd instances/foo
 ># echo schedule:stacktrace:1 > set_ftrace_filter
 ># cat trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 1/1   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
           <idle>-0     [001] .N.2   202.585010: <stack trace>
  =>
  => schedule
  => schedule_preempt_disabled
  => do_idle
  => cpu_startup_entry
  => start_secondary
  => verify_cpu

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:49 -04:00
Steven Rostedt (VMware) 2290f2c589 tracing/ftrace: Allow for the traceonoff probe be unique to instances
Have the traceon/off function probe triggers affect only the instance they
are set in. This required making the trace_on/off accessible for other files
in the tracing directory.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:48 -04:00
Steven Rostedt (VMware) cab5037950 tracing/ftrace: Enable snapshot function trigger to work with instances
Modify the snapshot probe trigger to work with instances. This way the
snapshot function trigger will only affect the instance that it is added to
in the set_ftrace_filter file.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:48 -04:00
Steven Rostedt (VMware) d2afd57a4b tracing/ftrace: Allow instances to have their own function probes
Pass around the local trace_array that is the descriptor for tracing
instances, when enabling and disabling probes. This by default sets the
enable/disable of event probe triggers to work with instances.

The other probes will need some more work to get them working with
instances.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:47 -04:00
Steven Rostedt (VMware) 6e4443199e tracing/ftrace: Add a better way to pass data via the probe functions
With the redesign of the registration and execution of the function probes
(triggers), data can now be passed from the setup of the probe to the probe
callers that are specific to the trace_array it is on. Although, all probes
still only affect the toplevel trace array, this change will allow for
instances to have their own probes separated from other instances and the
top array.

That is, something like the stacktrace probe can be set to trace only in an
instance and not the toplevel trace array. This isn't implement yet, but
this change sets the ground work for the change.

When a probe callback is triggered (someone writes the probe format into
set_ftrace_filter), it calls register_ftrace_function_probe() passing in
init_data that will be used to initialize the probe. Then for every matching
function, register_ftrace_function_probe() will call the probe_ops->init()
function with the init data that was passed to it, as well as an address to
a place holder that is associated with the probe and the instance. The first
occurrence will have a NULL in the pointer. The init() function will then
initialize it. If other probes are added, or more functions are part of the
probe, the place holder will be passed to the init() function with the place
holder data that it was initialized to the last time.

Then this place_holder is passed to each of the other probe_ops functions,
where it can be used in the function callback. When the probe_ops free()
function is called, it can be called either with the rip of the function
that is being removed from the probe, or zero, indicating that there are no
more functions attached to the probe, and the place holder is about to be
freed. This gives the probe_ops a way to free the data it assigned to the
place holder if it was allocade during the first init call.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:46 -04:00
Steven Rostedt (VMware) 7b60f3d876 ftrace: Dynamically create the probe ftrace_ops for the trace_array
In order to eventually have each trace_array instance have its own unique
set of function probes (triggers), the trace array needs to hold the ops and
the filters for the probes.

This is the first step to accomplish this. Instead of having the private
data of the probe ops point to the trace_array, create a separate list that
the trace_array holds. There's only one private_data for a probe, we need
one per trace_array. The probe ftrace_ops will be dynamically created for
each instance, instead of being static.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:46 -04:00
Steven Rostedt (VMware) b5f081b563 tracing: Pass the trace_array into ftrace_probe_ops functions
Pass the trace_array associated to a ftrace_probe_ops into the probe_ops
func(), init() and free() functions. The trace_array is the descriptor that
describes a tracing instance. This will help create the infrastructure that
will allow having function probes unique to tracing instances.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:45 -04:00
Steven Rostedt (VMware) 04ec7bb642 tracing: Have the trace_array hold the list of registered func probes
Add a link list to the trace_array to hold func probes that are registered.
Currently, all function probes are the same for all instances as it was
before, that is, only the top level trace_array holds the function probes.
But this lays the ground work to have function probes be attached to
individual instances, and having the event trigger only affect events in the
given instance. But that work is still to be done.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:45 -04:00
Steven Rostedt (VMware) 8d70725e45 ftrace: If the hash for a probe fails to update then free what was initialized
If the ftrace_hash_move_and_update_ops() fails, and an ops->free() function
exists, then it needs to be called on all the ops that were added by this
registration.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:44 -04:00
Steven Rostedt (VMware) eee8ded131 ftrace: Have the function probes call their own function
Now that the function probes have their own ftrace_ops, there's no reason to
continue using the ftrace_func_hash to find which probe to call in the
function callback. The ops that is passed in to the function callback is
part of the probe_ops to call.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:43 -04:00
Steven Rostedt (VMware) 1ec3a81a0c ftrace: Have each function probe use its own ftrace_ops
Have the function probes have their own ftrace_ops, and remove the
trace_probe_ops. This simplifies some of the ftrace infrastructure code.

Individual entries for each function is still allocated for the use of the
output for set_ftrace_filter, but they will be removed soon too.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:43 -04:00
Steven Rostedt (VMware) d3d532d798 ftrace: Have unregister_ftrace_function_probe_func() return a value
Currently unregister_ftrace_function_probe_func() is a void function. It
does not give any feedback if an error occurred or no item was found to
remove and nothing was done.

Change it to return status and success if it removed something. Also update
the callers to return that feedback to the user.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:42 -04:00
Steven Rostedt (VMware) e16b35ddb8 ftrace: Add helper function ftrace_hash_move_and_update_ops()
The processes of updating a ops filter_hash is a bit complex, and requires
setting up an old hash to perform the update. This is done exactly the same
in two locations for the same reasons. Create a helper function that does it
in one place.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:42 -04:00
Steven Rostedt (VMware) 1a48df0041 ftrace: Remove data field from ftrace_func_probe structure
No users of the function probes uses the data field anymore. Remove it, and
change the init function to take a void *data parameter instead of a
void **data, because the init will just get the data that the registering
function was received, and there's no state after it is called.

The other functions for ftrace_probe_ops still take the data parameter, but
it will currently only be passed NULL. It will stay as a parameter for
future data to be passed to these functions.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:41 -04:00
Steven Rostedt (VMware) 02b77e2afb ftrace: Remove printing of data in showing of a function probe
None of the probe users uses the data field anymore of the entry. They all
have their own print() function. Remove showing the data field in the
generic function as the data field will be going away.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:40 -04:00
Steven Rostedt (VMware) 78f78e07d5 ftrace: Remove unused unregister_ftrace_function_probe_all() function
There are no users of unregister_ftrace_function_probe_all(). The only probe
function that is used is unregister_ftrace_function_probe_func(). Rename the
internal static function __unregister_ftrace_function_probe() to
unregister_ftrace_function_probe_func() and make it global.

Also remove the PROBE_TEST_FUNC as it would be always set.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:40 -04:00
Steven Rostedt (VMware) 0fe7e7e3f8 ftrace: Remove unused unregister_ftrace_function_probe() function
Nothing calls unregister_ftrace_function_probe(). Remove it as well as the
flag PROBE_TEST_DATA, as this function was the only one to set it.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:39 -04:00
Steven Rostedt (VMware) fe014e24b6 ftrace: Convert the rest of the function trigger over to the mapping functions
As the data pointer for individual ips will soon be removed and no longer
passed to the callback function probe handlers, convert the rest of the function
trigger counters over to the new ftrace_func_mapper helper functions.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:39 -04:00
Steven Rostedt (VMware) 1a93f8bd19 tracing: Have the snapshot trigger use the mapping helper functions
As the data pointer for individual ips will soon be removed and no longer
passed to the callback function probe handlers, convert the snapshot
trigger counter over to the new ftrace_func_mapper helper functions.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:38 -04:00
Steven Rostedt (VMware) 41794f1907 ftrace: Added ftrace_func_mapper for function probe triggers
In order to move the ops to the function probes directly, they need a way to
map function ips to their own data without depending on the infrastructure
of the function probes, as the data field will be going away.

New helper functions are added that are based on the ftrace_hash code.
ftrace_func_mapper functions are there to let the probes map ips to their
data. These can be allocated by the probe ops, and referenced in the
function callbacks.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:37 -04:00
Steven Rostedt (VMware) bca6c8d048 ftrace: Pass probe ops to probe function
In preparation to cleaning up the probe function registration code, the
"data" parameter will eventually be removed from the probe->func() call.
Instead it will receive its own "ops" function, in which it can set up its
own data that it needs to map.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:37 -04:00
Steven Rostedt (VMware) e51a989679 ftrace: Remove unused "flags" field from struct ftrace_func_probe
Nothing uses "flags" in the ftrace_func_probe descriptor. Remove it.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:36 -04:00
Steven Rostedt (VMware) 92a68fa047 ftrace: Move the function commands into the tracing directory
As nothing outside the tracing directory uses the function command mechanism,
I'm moving the prototypes out of the include/linux/ftrace.h and into the
local kernel/trace/trace.h header. I plan on making them hook to the
trace_array structure which is local to kernel/trace, and I do not want to
expose it to the rest of the kernel. This requires that the command functions
must also be local to tracing. But luckily nothing else uses them.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20 22:06:33 -04:00
Steven Rostedt (VMware) ec19b85913 ftrace: Move the probe function into the tracing directory
As nothing outside the tracing directory uses the function probes mechanism,
I'm moving the prototypes out of the include/linux/ftrace.h and into the
local kernel/trace/trace.h header. I plan on making them hook to the
trace_array structure which is local to kernel/trace, and I do not want to
expose it to the rest of the kernel. This requires that the probe functions
must also be local to tracing. But luckily nothing else uses them.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18 13:49:59 -04:00
Steven Rostedt (VMware) a9064f676e selftests: ftrace: Add test to test reading of set_ftrace_file
The set_ftrace_file lists both functions that are filtered, as well as
function probes (triggers) that are attached to a function, like traceon or
stacktrace, etc. The reading of this file is not as trivial as most pseudo
files are, and there's been various bugs that have appeared in the past
when there's a mix of probes and functions listed. There's also a difference
when reading the file using dd with a block size of 1.

This test performs the following:

 o Resets set_ftrace_filter

 o Makes sure only "#### all functions enabled ####" is listed

    (All checks uses cat, and dd with bs=1 and bs=100)

 o Adds a traceon trigger to schedule

 o Checks if only "#### all function enabled ####" and the trigger is there.

 o Adds tracing of schedule

 o Checks if only schedule and the trigger is there

 o Adds tracing of do_IRQ as well

 o Checks if only schedule, do_IRQ and the trigger is there

 o Adds a traceon trigger to do_IRQ

 o Checks if only schedule, do_IRQ and both triggers are there

 o Removes tracing of do_IRQ

 o Checks if only schedule and both triggers are there

 o Removes tracing of schedule

 o Checks if only  "#### all functions enabled ####" and both triggers are there

 o Removes the triggers

 o Checks if only "#### all functions enabled ####" is there

 o Adds tracing of schedule

 o Checks if only schedule is there

 o Adds tracing of do_IRQ

 o Checks if only schedule and do_IRQ are there

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18 13:48:27 -04:00
Steven Rostedt (VMware) d8b39e1d98 selftests: ftrace: Add a test to test function triggers to start and stop tracing
This adds a test to test the function tiggers traceon and traceoff to make
sure that it starts and stops tracing when a function is hit.

The test performs the following:

 o Enables all events

 o Writes schedule:traceoff into set_ftrace_filter

 o Makes sure the tigger exists in the file

 o Makes sure the trace file no longer grows

 o Makes sure that tracing_on is now zero

 o Clears the trace file

 o Makes sure it's still empty

 o Removes the trigger

 o Makes sure tracing is still off (tracing_on is zero)

 o Writes schedule:traceon into set_ftrace_filter

 o Makes sure the trace file is no longer empty

 o Makes sure that tracing_on file is set to one

 o Removes the trigger

 o Makes sure the trigger is no longer there

 o Writes schedule:traceoff:3 into set_ftrace_filter

 o Makes sure that tracing_on turns off

   . Writes 1 into tracing_on

   . Makes sure that it turns off 2 more times

 o Writes 1 into tracing_on

 o Makes sure that tracing_on is still a one

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18 13:48:27 -04:00
Steven Rostedt (VMware) 43bb45da82 selftests: ftrace: Add a selftest to test event enable/disable func trigger
This adds a test to enable and disable trace events via the function
triggers. It tests enabling and disabling the sched:sched_switch event via
the the event_enable and event_disable function triggers attached to the
schedule() kernel function.

The test does the following:

 o disable all events

 o disables or enables the sched_switch event

 o writes schedule:event_enable/disable:sched:sched_switch into set_ftrace_filter

 o 5 times it checks to make sure:

    . Writes 0/1 into the sched_switch/enable

    . Checks that the sched_switch/enable goes back to 1/0

 o Resets the events

 o writes schedule:event_enable/disable:sched:sched_switch:3 into set_ftrace_filter

 o Does a loop of 3 to see that sched_switch/enable file gets updated

 o Makes sure the sched_switch/enable stops getting updated

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18 13:48:26 -04:00
Steven Rostedt (VMware) 8e5e19c1b9 selftests: ftrace: Add a way to reset triggers in the set_ftrace_filter file
Just writing into the set_ftrace_filter file does not reset triggers, even
though it can reset the function list. Triggers require writing the trigger
name with a "!" prepended. It's worse that it requires the number if the
trigger has a count associated to it.

Add a reset_ftrace_filter function to the ftrace self tests to allow for the
tests to have a generic way to clear them. It also resets any functions that
are listed in that file as well.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18 13:48:25 -04:00
Namhyung Kim 560642d9ab selftests: ftrace: Add -l/--logdir option
In my virtual machine setup, running ftracetest failed on creating
LOG_DIR on a read-only filesystem.  It'd be convenient to provide an
option to specify a different directory as log directory.

Link: http://lkml.kernel.org/r/20170417024430.21194-4-namhyung@kernel.org

Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17 17:26:50 -04:00
Namhyung Kim 1e10486ffe ftrace: Add 'function-fork' trace option
The function-fork option is same as event-fork that it tracks task
fork/exit and set the pid filter properly.  This can be useful if user
wants to trace selected tasks including their children only.

Link: http://lkml.kernel.org/r/20170417024430.21194-3-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17 17:13:00 -04:00
Steven Rostedt (VMware) b980b117c9 tracing: Have the trace_event benchmark thread call cond_resched_rcu_qs()
The trace_event benchmark thread runs in kernel space in an infinite loop
while also calling cond_resched() in case anything else wants to schedule
in. Unfortunately, on a PREEMPT kernel, that makes it a nop, in which case,
this will never voluntarily schedule. That will cause synchronize_rcu_tasks()
to forever block on this thread, while it is running.

This is exactly what cond_resched_rcu_qs() is for. Use that instead.

Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17 15:21:19 -04:00
Steven Rostedt (VMware) fcdc712579 ftrace: Fix indexing of t_hash_start() from t_next()
t_hash_start() does not increment *pos, where as t_next() must. But when
t_next() does increment *pos, it must still pass in the original *pos to
t_hash_start() otherwise it will skip the first instance:

 # cd /sys/kernel/debug/tracing
 # echo schedule:traceoff > set_ftrace_filter
 # echo do_IRQ:traceoff > set_ftrace_filter
 # echo call_rcu > set_ftrace_filter
 # cat set_ftrace_filter
call_rcu
schedule:traceoff:unlimited
do_IRQ:traceoff:unlimited

The above called t_hash_start() from t_start() as there was only one
function (call_rcu), but if we add another function:

 # echo xfrm_policy_destroy_rcu >> set_ftrace_filter
 # cat set_ftrace_filter
call_rcu
xfrm_policy_destroy_rcu
do_IRQ:traceoff:unlimited

The "schedule:traceoff" disappears.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17 10:22:29 -04:00
Steven Rostedt (VMware) acceb72e90 ftrace: Fix removing of second function probe
When two function probes are added to set_ftrace_filter, and then one of
them is removed, the update to the function locations is not performed, and
the record keeping of the function states are corrupted, and causes an
ftrace_bug() to occur.

This is easily reproducable by adding two probes, removing one, and then
adding it back again.

 # cd /sys/kernel/debug/tracing
 # echo schedule:traceoff > set_ftrace_filter
 # echo do_IRQ:traceoff > set_ftrace_filter
 # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter
 # echo do_IRQ:traceoff > set_ftrace_filter

Causes:
 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220
 Modules linked in: [...]
 CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 Call Trace:
  dump_stack+0x68/0x9f
  __warn+0x111/0x130
  ? trace_irq_work_interrupt+0xa0/0xa0
  warn_slowpath_null+0x1d/0x20
  ftrace_get_addr_curr+0x143/0x220
  ? __fentry__+0x10/0x10
  ftrace_replace_code+0xe3/0x4f0
  ? ftrace_int3_handler+0x90/0x90
  ? printk+0x99/0xb5
  ? 0xffffffff81000000
  ftrace_modify_all_code+0x97/0x110
  arch_ftrace_update_code+0x10/0x20
  ftrace_run_update_code+0x1c/0x60
  ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0
  register_ftrace_function_probe+0x4b6/0x590
  ? ftrace_startup+0x310/0x310
  ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
  ? update_stack_state+0x88/0x110
  ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
  ? preempt_count_sub+0x18/0xd0
  ? mutex_lock_nested+0x104/0x800
  ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
  ? __unwind_start+0x1c0/0x1c0
  ? _mutex_lock_nest_lock+0x800/0x800
  ftrace_trace_probe_callback.isra.3+0xc0/0x130
  ? func_set_flag+0xe0/0xe0
  ? __lock_acquire+0x642/0x1790
  ? __might_fault+0x1e/0x20
  ? trace_get_user+0x398/0x470
  ? strcmp+0x35/0x60
  ftrace_trace_onoff_callback+0x48/0x70
  ftrace_regex_write.isra.43.part.44+0x251/0x320
  ? match_records+0x420/0x420
  ftrace_filter_write+0x2b/0x30
  __vfs_write+0xd7/0x330
  ? do_loop_readv_writev+0x120/0x120
  ? locks_remove_posix+0x90/0x2f0
  ? do_lock_file_wait+0x160/0x160
  ? __lock_is_held+0x93/0x100
  ? rcu_read_lock_sched_held+0x5c/0xb0
  ? preempt_count_sub+0x18/0xd0
  ? __sb_start_write+0x10a/0x230
  ? vfs_write+0x222/0x240
  vfs_write+0xef/0x240
  SyS_write+0xab/0x130
  ? SyS_read+0x130/0x130
  ? trace_hardirqs_on_caller+0x182/0x280
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL_64_fastpath+0x18/0xad
 RIP: 0033:0x7fe61c157c30
 RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30
 RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001
 RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700
 R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400
 R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c
  ? trace_hardirqs_off_caller+0xc0/0x110
 ---[ end trace 99fa09b3d9869c2c ]---
 Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150)

Cc: stable@vger.kernel.org
Fixes: 59df055f19 ("ftrace: trace different functions with a different tracer")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-15 17:04:37 -04:00
Steven Rostedt (VMware) d54b6eeb55 tracing: Make sure rcu_irq_enter() can work for trace_*_rcuidle() trace events
Stack tracing discovered that there's a small location inside the RCU
infrastructure where calling rcu_irq_enter() does not work. As trace events
use rcu_irq_enter() it must make sure that it is functionable. A check
against rcu_irq_enter_disabled() is added with a WARN_ON_ONCE() as no trace
event should ever be used in that part of RCU. If the warning is triggered,
then the trace event is ignored.

Restructure the __DO_TRACE() a bit to get rid of the prercu and postrcu,
and just have an rcucheck that does the work from within the _DO_TRACE()
macro. gcc optimization will compile out the rcucheck=0 case.

Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 15:22:17 -04:00
Steven Rostedt (VMware) 03ecd3f48e rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not work
Tracing uses rcu_irq_enter() as a way to make sure that RCU is watching when
it needs to use rcu_read_lock() and friends. This is because tracing can
happen as RCU is about to enter user space, or about to go idle, and RCU
does not watch for RCU read side critical sections as it makes the
transition.

There is a small location within the RCU infrastructure that rcu_irq_enter()
itself will not work. If tracing were to occur in that section it will break
if it tries to use rcu_irq_enter().

Originally, this happens with the stack_tracer, because it will call
save_stack_trace when it encounters stack usage that is greater than any
stack usage it had encountered previously. There was a case where that
happened in the RCU section where rcu_irq_enter() did not work, and lockdep
complained loudly about it. To fix it, stack tracing added a call to be
disabled and RCU would disable stack tracing during the critical section
that rcu_irq_enter() was inoperable. This solution worked, but there are
other cases that use rcu_irq_enter() and it would be a good idea to let RCU
give a way to let others know that rcu_irq_enter() will not work. For
example, in trace events.

Another helpful aspect of this change is that it also moves the per cpu
variable called in the RCU critical section into a cache locale along with
other RCU per cpu variables used in that same location.

I'm keeping the stack_trace_disable() code, as that still could be used in
the future by places that really need to disable it. And since it's only a
static inline, it wont take up any kernel text if it is not used.

Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 15:22:03 -04:00
Paul E. McKenney a278d47189 rcu: Fix dyntick-idle tracing
The tracing subsystem started using rcu_irq_entry() and rcu_irq_exit()
(with my blessing) to allow the current _rcuidle alternative tracepoint
name to be dispensed with while still maintaining good performance.
Unfortunately, this causes RCU's dyntick-idle entry code's tracing to
appear to RCU like an interrupt that occurs where RCU is not designed
to handle interrupts.

This commit fixes this problem by moving the zeroing of ->dynticks_nesting
after the offending trace_rcu_dyntick() statement, which narrows the
window of vulnerability to a pair of adjacent statements that are now
marked with comments to that effect.

Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home
Link: http://lkml.kernel.org/r/20170405193928.GM1600@linux.vnet.ibm.com

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 15:21:57 -04:00
Steven Rostedt (VMware) 8aaf1ee70e tracing: Rename trace_active to disable_stack_tracer and inline its modification
In order to eliminate a function call, make "trace_active" into
"disable_stack_tracer" and convert stack_tracer_disable() and friends into
static inline functions.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 15:21:47 -04:00
Steven Rostedt (VMware) 5367278cb7 tracing: Add stack_tracer_disable/enable() functions
There are certain parts of the kernel that cannot let stack tracing
proceed (namely in RCU), because the stack tracer uses RCU, and parts of RCU
internals cannot handle having RCU read side locks taken.

Add stack_tracer_disable() and stack_tracer_enable() functions to let RCU
stop stack tracing on the current CPU when it is in those critical sections.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 14:34:10 -04:00
Steven Rostedt (VMware) 252babcd52 tracing: Replace the per_cpu() with __this_cpu*() in trace_stack.c
The updates to the trace_active per cpu variable can be updated with the
__this_cpu_*() functions as it only gets updated on the CPU that the variable
is on.

Thanks to Paul McKenney for suggesting __this_cpu_* instead of this_cpu_*.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 14:33:54 -04:00
Steven Rostedt (VMware) 0598e4f08e ftrace: Add use of synchronize_rcu_tasks() with dynamic trampolines
The function tracer needs to be more careful than other subsystems when it
comes to freeing data. Especially if that data is actually executable code.
When a single function is traced, a trampoline can be dynamically allocated
which is called to jump to the function trace callback. When the callback is
no longer needed, the dynamic allocated trampoline needs to be freed. This
is where the issues arise. The dynamically allocated trampoline must not be
used again. As function tracing can trace all subsystems, including
subsystems that are used to serialize aspects of freeing (namely RCU), it
must take extra care when doing the freeing.

Before synchronize_rcu_tasks() was around, there was no way for the function
tracer to know that nothing was using the dynamically allocated trampoline
when CONFIG_PREEMPT was enabled. That's because a task could be indefinitely
preempted while sitting on the trampoline. Now with synchronize_rcu_tasks(),
it will wait till all tasks have either voluntarily scheduled (not on the
trampoline) or goes into userspace (not on the trampoline). Then it is safe
to free the trampoline even with CONFIG_PREEMPT set.

Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-07 09:41:51 -04:00
Alban Crequy 696ced4fb1 tracing/kprobes: expose maxactive for kretprobe in kprobe_events
When a kretprobe is installed on a kernel function, there is a maximum
limit of how many calls in parallel it can catch (aka "maxactive"). A
kernel module could call register_kretprobe() and initialize maxactive
(see example in samples/kprobes/kretprobe_example.c).

But that is not exposed to userspace and it is currently not possible to
choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events

The default maxactive can be as low as 1 on single-core with a
non-preemptive kernel. This is too low and we need to increase it not
only for recursive functions, but for functions that sleep or resched.

This patch updates the format of the command that can be written to
kprobe_events so that maxactive can be optionally specified.

I need this for a bpf program attached to the kretprobe of
inet_csk_accept, which can sleep for a long time.

This patch includes a basic selftest:

> # ./ftracetest -v  test.d/kprobe/
> === Ftrace unit tests ===
> [1] Kprobe dynamic event - adding and removing	[PASS]
> [2] Kprobe dynamic event - busy event check	[PASS]
> [3] Kprobe dynamic event with arguments	[PASS]
> [4] Kprobes event arguments with types	[PASS]
> [5] Kprobe dynamic event with function tracer	[PASS]
> [6] Kretprobe dynamic event with arguments	[PASS]
> [7] Kretprobe dynamic event with maxactive	[PASS]
>
> # of passed:  7
> # of failed:  0
> # of unresolved:  0
> # of untested:  0
> # of unsupported:  0
> # of xfailed:  0
> # of undefined(test bug):  0

BugLink: https://github.com/iovisor/bcc/issues/1072
Link: http://lkml.kernel.org/r/1491215782-15490-1-git-send-email-alban@kinvolk.io

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-04 10:32:03 -04:00
Steven Rostedt (VMware) b80f0f6c9e ftrace: Have init/main.c call ftrace directly to free init memory
Relying on free_reserved_area() to call ftrace to free init memory proved to
not be sufficient. The issue is that on x86, when debug_pagealloc is
enabled, the init memory is not freed, but simply set as not present. Since
ftrace was uninformed of this, starting function tracing still tries to
update pages that are not present according to the page tables, causing
ftrace to bug, as well as killing the kernel itself.

Instead of relying on free_reserved_area(), have init/main.c call ftrace
directly just before it frees the init memory. Then it needs to use
__init_begin and __init_end to know where the init memory location is.
Looking at all archs (and testing what I can), it appears that this should
work for each of them.

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-03 14:04:00 -04:00
Steven Rostedt (VMware) 5bd84629a7 ftrace: Create separate t_func_next() to simplify the function / hash logic
I noticed that if I use dd to read the set_ftrace_filter file that the first
hash command is repeated.

 # cd /sys/kernel/debug/tracing
 # echo schedule > set_ftrace_filter
 # echo do_IRQ >> set_ftrace_filter
 # echo schedule:traceoff >> set_ftrace_filter
 # echo do_IRQ:traceoff >> set_ftrace_filter

 # cat set_ftrace_filter
 schedule
 do_IRQ
 schedule:traceoff:unlimited
 do_IRQ:traceoff:unlimited

 # dd if=set_ftrace_filter bs=1
 schedule
 do_IRQ
 schedule:traceoff:unlimited
 schedule:traceoff:unlimited
 do_IRQ:traceoff:unlimited
 98+0 records in
 98+0 records out
 98 bytes copied, 0.00265011 s, 37.0 kB/s

This is due to the way t_start() calls t_next() as well as the seq_file
calls t_next() and the state is slightly different between the two. Namely,
t_start() will call t_next() with a local "pos" variable.

By separating out the function listing from t_next() into its own function,
we can have better control of outputting the functions and the hash of
triggers. This simplifies the code.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-31 18:00:45 -04:00
Steven Rostedt (VMware) 43ff926a0c ftrace: Update func_pos in t_start() when all functions are enabled
If all functions are enabled, there's a comment displayed in the file to
denote that:

  # cd /sys/kernel/debug/tracing
  # cat set_ftrace_filter
 #### all functions enabled ####

If a function trigger is set, those are displayed as well:

  # echo schedule:traceoff >> /debug/tracing/set_ftrace_filter
  # cat set_ftrace_filter
 #### all functions enabled ####
 schedule:traceoff:unlimited

But if you read that file with dd, the output can change:

  # dd if=/debug/tracing/set_ftrace_filter bs=1
 #### all functions enabled ####
 32+0 records in
 32+0 records out
 32 bytes copied, 7.0237e-05 s, 456 kB/s

This is because the "pos" variable is updated for the comment, but func_pos
is not. "func_pos" is used by the triggers (or hashes) to know how many
functions were printed and it bases its index from the pos - func_pos.
func_pos should be 1 to count for the comment printed. But since it is not,
t_hash_start() thinks that one trigger was already printed.

The cat gets to t_hash_start() via t_next() and not t_start() which updates
both pos and func_pos.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-31 18:00:37 -04:00