Commit Graph

112 Commits

Author SHA1 Message Date
Tom Zanussi bac5fb97a1 tracing: Add and use generic set_trigger_filter() implementation
Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.

Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger.  For example, to add a filter to an
enable/disable_event command:

    echo 'enable_event:system:event if common_pid == 999' > \
              .../othersys/otherevent/trigger

The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.

As another example, to add a filter to a stacktrace command:

    echo 'stacktrace if common_pid == 999' > \
                   .../somesys/someevent/trigger

The above command will only trigger a stacktrace if the common_pid
field in the event is 999.

The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.

Because triggers can now use filters, the trigger-invoking logic needs
to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
trigger has a filter associated with it, the trigger invocation now
needs to happen after the { assign; } part of the call, in order for
the trigger condition to be tested.

There's still a SOFT_DISABLED-only check at the top of e.g. the
ftrace_raw_events function, so when an event is soft disabled but not
because of the presence of a trigger, the original SOFT_DISABLED
behavior remains unchanged.

There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers.  To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invoks the
trigger.  Once all commands have been either invoked or set their
return flag, event_triggers_call() returns.  The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().

To simplify the above and make it more efficient, the TRIGGER_COND bit
is introduced, which is set only if a soft-disabled trigger needs to
use the log record for filter testing or needs to wait until the
current log record is closed.

The syscall event invocation code is also changed in analogous ways.

Because event triggers need to be able to create and free filters,
this also adds a couple external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.

Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-21 22:02:17 -05:00
Tom Zanussi f306cc82a9 tracing: Update event filters for multibuffer
The trace event filters are still tied to event calls rather than
event files, which means you don't get what you'd expect when using
filters in the multibuffer case:

Before:

  # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # mkdir /sys/kernel/debug/tracing/instances/test1
  # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 2048
  # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  bytes_alloc > 2048

Setting the filter in tracing/instances/test1/events shouldn't affect
the same event in tracing/events as it does above.

After:

  # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # mkdir /sys/kernel/debug/tracing/instances/test1
  # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  bytes_alloc > 2048

We'd like to just move the filter directly from ftrace_event_call to
ftrace_event_file, but there are a couple cases that don't yet have
multibuffer support and therefore have to continue using the current
event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
is added to the event_call flags, whose main purpose is to keep the
old behavior for those cases until they can be updated with
multibuffer support; at that point, the USE_CALL_FILTER flag (and the
new associated call_filter_check_discard() function) can go away.

The multibuffer support also made filter_current_check_discard()
redundant, so this change removes that function as well and replaces
it with filter_check_discard() (or call_filter_check_discard() as
appropriate).

Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05 16:50:20 -05:00
Oleg Nesterov e2912b091c tracing: Change event_filter_read/write to verify i_private != NULL
event_filter_read/write() are racy, ftrace_event_call can be already
freed by trace_remove_event_call() callers.

1. Shift mutex_lock(event_mutex) from print/apply_event_filter to
   the callers.

2. Change the callers, event_filter_read() and event_filter_write()
   to read i_private under this mutex and abort if it is NULL.

This fixes nothing, but now we can change debugfs_remove("filter")
callers to nullify ->i_private and fix the the problem.

Link: http://lkml.kernel.org/r/20130726172540.GA3619@redhat.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-29 22:56:59 -04:00
zhangwei(Jovi) 146c3442f2 tracing: Use trace_seq_puts()/trace_seq_putc() where possible
For string without format specifiers, use trace_seq_puts()
or trace_seq_putc().

Link: http://lkml.kernel.org/r/51E3B3AC.1000605@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
[ fixed a trace_seq_putc(s, " ") to trace_seq_putc(s, ' ') ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18 21:30:36 -04:00
Steven Rostedt 1a891cf19c tracing: Add binary '&' filter for events
There are some cases when filtering on a set flag of a field of a tracepoint
is useful. But currently the only filtering commands for numbered fields
is ==, !=, <, <=, >, >=. This does not help when you just want to trace if
a specific flag is set. For example:

 > # sudo trace-cmd record -e brcmfmac:brcmf_dbg -f 'level & 0x40000'
 > disable all
 > enable brcmfmac:brcmf_dbg
 > path = /sys/kernel/debug/tracing/events/brcmfmac/brcmf_dbg/enable
 > (level & 0x40000)
 > ^
 > parse_error: Invalid operator
 >

When trying to trace brcmf_dbg when level has its 1 << 18 bit set, the
filter fails to perform.

By allowing a binary '&' operation, this gives the user the ability to
test a bit.

Note, a binary '|' is not added, as it doesn't make sense as fields must
be compared to constants (for now), and ORing a constant will always return
true.

Link: http://lkml.kernel.org/r/1371057385.9844.261.camel@gandalf.local.home

Suggested-by: Arend van Spriel <arend@broadcom.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-19 23:30:40 -04:00
Steven Rostedt (Red Hat) 60705c8946 tracing: Fix leaks of filter preds
Special preds are created when folding a series of preds that
can be done in serial. These are allocated in an ops field of
the pred structure. But they were never freed, causing memory
leaks.

This was discovered using the kmemleak checker:

unreferenced object 0xffff8800797fd5e0 (size 32):
  comm "swapper/0", pid 1, jiffies 4294690605 (age 104.608s)
  hex dump (first 32 bytes):
    00 00 01 00 03 00 05 00 07 00 09 00 0b 00 0d 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814b52af>] kmemleak_alloc+0x73/0x98
    [<ffffffff8111ff84>] kmemleak_alloc_recursive.constprop.42+0x16/0x18
    [<ffffffff81120e68>] __kmalloc+0xd7/0x125
    [<ffffffff810d47eb>] kcalloc.constprop.24+0x2d/0x2f
    [<ffffffff810d4896>] fold_pred_tree_cb+0xa9/0xf4
    [<ffffffff810d3781>] walk_pred_tree+0x47/0xcc
    [<ffffffff810d5030>] replace_preds.isra.20+0x6f8/0x72f
    [<ffffffff810d50b5>] create_filter+0x4e/0x8b
    [<ffffffff81b1c30d>] ftrace_test_event_filter+0x5a/0x155
    [<ffffffff8100028d>] do_one_initcall+0xa0/0x137
    [<ffffffff81afbedf>] kernel_init_freeable+0x14d/0x1dc
    [<ffffffff814b24b7>] kernel_init+0xe/0xdb
    [<ffffffff814d539c>] ret_from_fork+0x7c/0xb0
    [<ffffffffffffffff>] 0xffffffffffffffff

Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: stable@vger.kernel.org # 2.6.39+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-15 13:49:18 -04:00
zhangwei(Jovi) b3a8c6fd7b tracing: Move find_event_field() into trace_events.c
By moving find_event_field() and trace_find_field() into trace_events.c,
the ftrace_common_fields list and trace_get_fields() can become local to
the trace_events.c file.

find_event_field() is renamed to trace_find_event_field() to conform to
the tracing global function names.

Link: http://lkml.kernel.org/r/513D8426.9070109@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
[ rostedt: Modified trace_find_field() to trace_find_event_field() ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 13:22:07 -04:00
Steven Rostedt ae63b31e4d tracing: Separate out trace events from global variables
The trace events for ftrace are all defined via global variables.
The arrays of events and event systems are linked to a global list.
This prevents multiple users of the event system (what to enable and
what not to).

By adding descriptors to represent the event/file relation, as well
as to which trace_array descriptor they are associated with, allows
for more than one set of events to be defined. Once the trace events
files have a link between the trace event and the trace_array they
are associated with, we can create multiple trace_arrays that can
record separate events in separate buffers.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:40 -04:00
Daniel Walter bcd83ea6cb tracing: Replace strict_strto* with kstrto*
* remove old string conversions with kstrto*

Link: http://lkml.kernel.org/r/20120926200838.GC1244@0x90.at

Signed-off-by: Daniel Walter <sahne@0x90.at>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:23 -04:00
Jiri Olsa 92d8d4a8b0 tracing/filter: Add missing initialization
Add missing initialization for ret variable. Its initialization
is based on the re_cnt variable, which is being set deep down
in the ftrace_function_filter_re function.

I'm not sure compilers would be smart enough to see this in near
future, so killing the warning this way.

Link: http://lkml.kernel.org/r/1340120894-9465-2-git-send-email-jolsa@redhat.com

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-08-07 09:42:47 -04:00
Jiri Olsa 5500fa5119 ftrace, perf: Add filter support for function trace event
Adding support to filter function trace event via perf
interface. It is now possible to use filter interface
in the perf tool like:

  perf record -e ftrace:function --filter="(ip == mm_*)" ls

The filter syntax is restricted to the the 'ip' field only,
and following operators are accepted '==' '!=' '||', ending
up with the filter strings like:

  ip == f1[, ]f2 ... || ip != f3[, ]f4 ...

with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"', e.g.:

  perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
  perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
  perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls

The '==' operator adds trace filter with same effect as would
be added via set_ftrace_filter file.

The '!=' operator adds trace filter with same effect as would
be added via set_ftrace_notrace file.

The right side of the '!=', '==' operators is list of functions
or regexp. to be added to filter separated by space.

The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operators within one filter string.

Link: http://lkml.kernel.org/r/1329317514-8131-8-git-send-email-jolsa@redhat.com

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21 11:08:30 -05:00
Jiri Olsa 02aa3162ed ftrace: Allow to specify filter field type for ftrace events
Adding FILTER_TRACE_FN event field type for function tracepoint
event, so it can be properly recognized within filtering code.

Currently all fields of ftrace subsystem events share the common
field type FILTER_OTHER. Since the function trace fields need
special care within the filtering code we need to recognize it
properly, hence adding the FILTER_TRACE_FN event type.

Adding filter parameter to the FTRACE_ENTRY macro, to specify the
filter field type for the event.

Link: http://lkml.kernel.org/r/1329317514-8131-7-git-send-email-jolsa@redhat.com

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21 11:08:29 -05:00
Thomas Meyer 47b0edcb59 tracing/trivial: Use kcalloc instead of kzalloc to allocate array
The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107

Link: http://lkml.kernel.org/r/1322600880.1534.347.camel@localhost.localdomain

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-13 13:48:11 -05:00
Tejun Heo 38b78eb855 tracing: Factorize filter creation
There are four places where new filter for a given filter string is
created, which involves several different steps.  This patch factors
those steps into create_[system_]filter() functions which in turn make
use of create_filter_{start|finish}() for common parts.

The only functional change is that if replace_filter_string() is
requested and fails, creation fails without any side effect instead of
being ignored.

Note that system filter is now installed after the processing is
complete which makes freeing before and then restoring filter string
on error unncessary.

-v2: Rebased to resolve conflict with 49aa29513e and updated both
     create_filter() functions to always set *filterp instead of
     requiring the caller to clear it to %NULL on entry.

Link: http://lkml.kernel.org/r/1323988305-1469-2-git-send-email-tj@kernel.org

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-21 07:27:02 -05:00
Ingo Molnar d6c1c49de5 Merge branch 'perf/urgent' into perf/core
Merge reason: Add these cherry-picked commits so that future changes
              on perf/core don't conflict.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 06:43:49 +01:00
Li Zefan 27b14b56af tracing: Restore system filter behavior
Though not all events have field 'prev_pid', it was allowed to do this:

  # echo 'prev_pid == 100' > events/sched/filter

but commit 75b8e98263 (tracing/filter: Swap
entire filter of events) broke it without any reason.

Link: http://lkml.kernel.org/r/4EAF46CF.8040408@cn.fujitsu.com

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05 13:28:45 -05:00
Tejun Heo d3d9acf646 trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter
ftrace_event_call->filter is sched RCU protected but didn't use
rcu_assign_pointer().  Use it.

TODO: Add proper __rcu annotation to call->filter and all its users.

-v2: Use RCU_INIT_POINTER() for %NULL clearing as suggested by Eric.

Link: http://lkml.kernel.org/r/20111123164949.GA29639@google.com

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@kernel.org # (2.6.39+)
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-01 22:16:47 -05:00
Steven Rostedt 49aa29513e tracing: Add boiler plate for subsystem filter
The system filter can be used to set multiple event filters that
exist within the system. But currently it displays the last filter
written that does not necessarily correspond to the filters within
the system. The system filter itself is not used to filter any events.
The system filter is just a means to set filters of the events within
it.

Because this causes an ambiguous state when the system filter reads
a filter string but the events within the system have different strings
it is best to just show a boiler plate:

 ### global filter ###
 # Use this to set filters for multiple events.
 # Only events with the given fields will be affected.
 # If no events are modified, an error message will be displayed here.

If an error occurs while writing to the system filter, the system
filter will replace the boiler plate with the error message as it
currently does.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-04 21:25:36 -04:00
Li Zefan ed0449af53 tracing: Restore system filter behavior
Though not all events have field 'prev_pid', it was allowed to do this:

  # echo 'prev_pid == 100' > events/sched/filter

but commit 75b8e98263 (tracing/filter: Swap
entire filter of events) broke it without any reason.

Link: http://lkml.kernel.org/r/4EAF46CF.8040408@cn.fujitsu.com

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-02 13:56:25 -04:00
Steven Rostedt 86b6ef21b8 tracing: Add preempt disable for filter self test
The self testing for event filters does not really need preemption
disabled as there are no races at the time of testing, but the functions
it calls uses rcu_dereference_sched() which will complain if preemption
is not disabled.

Cc: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-30 12:27:07 -04:00
Jiri Olsa 1d0e78e380 tracing/filter: Add startup tests for events filter
Adding automated tests running as late_initcall. Tests are
compiled in with CONFIG_FTRACE_STARTUP_TEST option.

Adding test event "ftrace_test_filter" used to simulate
filter processing during event occurance.

String filters are compiled and tested against several
test events with different values.

Also testing that evaluation of explicit predicates is ommited
due to the lazy filter evaluation.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1313072754-4620-11-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-19 14:35:59 -04:00
Jiri Olsa f30120fce1 tracing/filter: Change filter_match_preds function to use walk_pred_tree
Changing filter_match_preds function to use unified predicates tree
processing.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1313072754-4620-10-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-19 14:35:58 -04:00
Jiri Olsa 96bc293a97 tracing/filter: Change fold_pred function to use walk_pred_tree
Changing fold_pred_tree function to use unified predicates tree
processing.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1313072754-4620-9-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-19 14:35:57 -04:00
Jiri Olsa 1b797fe5aa tracing/filter: Change fold_pred_tree function to use walk_pred_tree
Changing fold_pred_tree function to use unified predicates tree
processing.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1313072754-4620-8-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-19 14:35:56 -04:00
Jiri Olsa c00b060f36 tracing/filter: Change count_leafs function to use walk_pred_tree
Changing count_leafs function to use unified predicates tree
processing.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1313072754-4620-7-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-19 14:35:55 -04:00
Jiri Olsa f03f597994 tracing/filter: Unify predicate tree walking, change check_pred_tree function to use it
Adding walk_pred_tree function to be used for walking throught
the filter predicates.

For each predicate the callback function is called, allowing
users to add their own functionality or customize their way
through the filter predicates.

Changing check_pred_tree function to use walk_pred_tree.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1313072754-4620-6-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-19 14:35:54 -04:00
Jiri Olsa 3f78f935e7 tracing/filter: Simplify tracepoint event lookup
We dont need to perform lookup through the ftrace_events list,
instead we can use the 'tp_event' field.

Each perf_event contains tracepoint event field 'tp_event', which
got initialized during the tracepoint event initialization.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1313072754-4620-5-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-19 14:35:53 -04:00
Jiri Olsa 61aaef5530 tracing/filter: Remove field_name from filter_pred struct
The field_name was used just for finding event's fields. This way we
don't need to care about field_name allocation/free.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1313072754-4620-4-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-19 14:35:52 -04:00
Jiri Olsa 9d96cd1743 tracing/filter: Separate predicate init and filter addition
Making the code cleaner by having one function to fully prepare
the predicate (create_pred), and another to add the predicate to
the filter (filter_add_pred).

As a benefit, this way the dry_run flag stays only inside the
replace_preds function and is not passed deeper.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1313072754-4620-3-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-19 14:35:51 -04:00
Jiri Olsa 81570d9caa tracing/filter: Use static allocation for filter predicates
Don't dynamically allocate filter_pred struct, use static memory.
This way we can get rid of the code managing the dynamic filter_pred
struct object.

The create_pred function integrates create_logical_pred function.
This way the static predicate memory is returned only from
one place.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1313072754-4620-2-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-19 14:35:51 -04:00
Steven Rostedt e9dbfae53e tracing: Fix bug when reading system filters on module removal
The event system is freed when its nr_events is set to zero. This happens
when a module created an event system and then later the module is
removed. Modules may share systems, so the system is allocated when
it is created and freed when the modules are unloaded and all the
events under the system are removed (nr_events set to zero).

The problem arises when a task opened the "filter" file for the
system. If the module is unloaded and it removed the last event for
that system, the system structure is freed. If the task that opened
the filter file accesses the "filter" file after the system has
been freed, the system will access an invalid pointer.

By adding a ref_count, and using it to keep track of what
is using the event system, we can free it after all users
are finished with the event system.

Cc: <stable@kernel.org>
Reported-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-07 11:19:18 -04:00
Ingo Molnar 1ef1d1c235 trace, filters: Initialize the match variable in process_ops() properly
Make sure the 'match' variable always has a value.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-18 14:41:27 +01:00
Steven Rostedt 4defe682d8 tracing/filter: Remove synchronize_sched() from __alloc_preds()
Because the filters are processed first and then activated
(added to the call), we no longer need to worry about the preds
of the filter in __alloc_preds() being used. As the filter that
is allocating preds is not activated yet.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:20 -05:00
Steven Rostedt 75b8e98263 tracing/filter: Swap entire filter of events
When creating a new filter, instead of allocating the filter to the
event call first and then processing the filter, it is easier to
process a temporary filter and then just swap it with the call filter.
By doing this, it simplifies the code.

A filter is allocated and processed, when it is done, it is
swapped with the call filter, synchronize_sched() is called to make
sure all callers are done with the old filter (filters are called
with premption disabled), and then the old filter is freed.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:20 -05:00
Steven Rostedt 43cd414552 tracing/filter: Optimize filter by folding the tree
There are many cases that a filter will contain multiple ORs or
ANDs together near the leafs. Walking up and down the tree to get
to the next compare can be a waste.

If there are several ORs or ANDs together, fold them into a single
pred and allocate an array of the conditions that they check.
This will speed up the filter by linearly walking an array
and can still break out if a short circuit condition is met.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:19 -05:00
Steven Rostedt ec126cac23 tracing/filter: Check the created pred tree
Since the filter walks a tree to determine if a match is made or not,
if the tree was incorrectly created, it could cause an infinite loop.

Add a check to walk the entire tree before assigning it as a filter
to make sure the tree is correct.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:19 -05:00
Steven Rostedt 5571927418 tracing/filter: Optimize short ciruit check
The test if we should break out early for OR and AND operations
can be optimized by comparing the current result with
  (pred->op == OP_OR)

That is if the result is true and the op is an OP_OR, or
if the result is false and the op is not an OP_OR (thus an OP_AND)
we can break out early in either case. Otherwise we continue
processing.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:19 -05:00
Steven Rostedt 61e9dea20e tracing/filter: Use a tree instead of stack for filter_match_preds()
Currently the filter_match_preds() requires a stack to push
and pop the preds to determine if the filter matches the record or not.
This has two drawbacks:

1) It requires a stack to store state information. As this is done
   in fast paths we can't allocate the storage for this stack, and
   we can't use a global as it must be re-entrant. The stack is stored
   on the kernel stack and this greatly limits how many preds we
   may allow.

2) All conditions are calculated even when a short circuit exists.
   a || b  will always calculate a and b even though a was determined
   to be true.

Using a tree we can walk a constant structure that will save
the state as we go. The algorithm is simply:

  pred = root;
  do {
	switch (move) {
	case MOVE_DOWN:
		if (OR or AND) {
			pred = left;
			continue;
		}
		if (pred == root)
			break;
		match = pred->fn();
		pred = pred->parent;
		move = left child ? MOVE_UP_FROM_LEFT : MOVE_UP_FROM_RIGHT;
		continue;

	case MOVE_UP_FROM_LEFT:
		/* Only OR or AND can be a parent */
		if (match && OR || !match && AND) {
			/* short circuit */
			if (pred == root)
				break;
			pred = pred->parent;
			move = left child ?
				MOVE_UP_FROM_LEFT :
				MOVE_UP_FROM_RIGHT;
			continue;
		}
		pred = pred->right;
		move = MOVE_DOWN;
		continue;

	case MOVE_UP_FROM_RIGHT:
		if (pred == root)
			break;
		pred = pred->parent;
		move = left child ? MOVE_UP_FROM_LEFT : MOVE_UP_FROM_RIGHT;
		continue;
	}
	done = 1;
  } while (!done);

This way there's no strict limit to how many preds we allow
and it also will short circuit the logical operations when possible.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:19 -05:00
Steven Rostedt f76690afd0 tracing/filter: Free pred array on disabling of filter
When a filter is disabled, free the preds.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:18 -05:00
Steven Rostedt 74e9e58c35 tracing/filter: Allocate the preds in an array
Currently we allocate an array of pointers to filter_preds, and then
allocate a separate filter_pred for each item in the array.
This adds slight overhead in the filters as it needs to derefernce
twice to get to the op condition.

Allocating the preds themselves in a single array removes a dereference
as well as helps on the cache footprint.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:18 -05:00
Steven Rostedt 0fc3ca9a10 tracing/filter: Call synchronize_sched() just once for system filters
By separating out the reseting of the filter->n_preds to zero from
the reallocation of preds for the filter, we can reset groups of
filters first, call synchronize_sched() just once, and then reallocate
each of the filters in the system group.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:18 -05:00
Steven Rostedt c9c53ca03d tracing/filter: Dynamically allocate preds
For every filter that is made, we create predicates to hold every
operation within the filter. We have a max of 32 predicates that we
can hold. Currently, we allocate all 32 even if we only need to
use one.

Part of the reason we do this is that the filter can be used at
any moment by any event. Fortunately, the filter is only used
with preemption disabled. By reseting the count of preds used "n_preds"
to zero, then performing a synchronize_sched(), we can safely
free and reallocate a new array of preds.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:18 -05:00
Steven Rostedt 58d9a597c4 tracing/filter: Move OR and AND logic out of fn() method
The ops OR and AND act different from the other ops, as they
are the only ones to take other ops as their arguements.
These ops als change the logic of the filter_match_preds.

By removing the OR and AND fn's we can also remove the val1 and val2
that is passed to all other fn's and are unused.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:18 -05:00
Steven Rostedt 6d54057d76 tracing/filter: Have no filter return a match
The n_preds field of a file can change at anytime, and even can become
zero, just as the filter is about to be processed by an event.
In the case that is zero on entering the filter, return 1, telling
the caller the event matchs and should be trace.

Also use a variable and assign it with ACCESS_ONCE() such that the
count stays consistent within the function.

Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-07 20:56:17 -05:00
Li Zefan c9d932cf8a tracing: Remove test of NULL define_fields callback
Every event (or event class) has it's define_fields callback,
so the test is redundant.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BFA37BC.8080707@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-06-28 17:12:52 -04:00
Li Zefan 8728fe501e tracing: Don't allocate common fields for every trace events
Every event has the same common fields, so it's a big waste of
memory to have a copy of those fields for every event.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BFA3759.30105@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-06-28 17:12:46 -04:00
Steven Rostedt f0218b3e99 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into trace/tip/tracing/core-6
Conflicts:
	include/trace/ftrace.h
	kernel/trace/trace_kprobe.c

Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-18 00:35:23 -04:00
Steven Rostedt 553552ce17 tracing: Combine event filter_active and enable into single flags field
The filter_active and enable both use an int (4 bytes each) to
set a single flag. We can save 4 bytes per event by combining the
two into a single integer.

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4894944	1018052	 861512	6774508	 675eec	vmlinux.id
4894871	1012292	 861512	6768675	 674823	vmlinux.flags

This gives us another 5K in savings.

The modification of both the enable and filter fields are done
under the event_mutex, so it is still safe to combine the two.

Note: Although Mathieu gave his Acked-by, he would like it documented
 that the reads of flags are not protected by the mutex. The way the
 code works, these reads will not break anything, but will have a
 residual effect. Since this behavior is the same even before this
 patch, describing this situation is left to another patch, as this
 patch does not change the behavior, but just brought it to Mathieu's
 attention.

v2: Updated the event trace self test to for this change.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-14 14:33:22 -04:00
Steven Rostedt 32c0edaeaa tracing: Remove duplicate id information in event structure
Now that the trace_event structure is embedded in the ftrace_event_call
structure, there is no need for the ftrace_event_call id field.
The id field is the same as the trace_event type field.

Removing the id and re-arranging the structure brings down the tracepoint
footprint by another 5K.

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4895024	1023812	 861512	6780348	 6775bc	vmlinux.print
4894944	1018052	 861512	6774508	 675eec	vmlinux.id

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-14 14:33:15 -04:00
Steven Rostedt 2e33af0295 tracing: Move fields from event to class structure
Move the defined fields from the event to the class structure.
Since the fields of the event are defined by the class they belong
to, it makes sense to have the class hold the information instead
of the individual events. The events of the same class would just
hold duplicate information.

After this change the size of the kernel dropped another 3K:

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4900252	1057412	 861512	6819176	 680d68	vmlinux.regs
4900375	1053380	 861512	6815267	 67fe23	vmlinux.fields

Although the text increased, this was mainly due to the C files
having to adapt to the change. This is a constant increase, where
new tracepoints will not increase the Text. But the big drop is
in the data size (as well as needed allocations to hold the fields).
This will give even more savings as more tracepoints are created.

Note, if just TRACE_EVENT()s are used and not DECLARE_EVENT_CLASS()
with several DEFINE_EVENT()s, then the savings will be lost. But
we are pushing developers to consolidate events with DEFINE_EVENT()
so this should not be an issue.

The kprobes define a unique class to every new event, but are dynamic
so it should not be a issue.

The syscalls however have a single class but the fields for the individual
events are different. The syscalls use a metadata to define the
fields. I moved the fields list from the event to the metadata and
added a "get_fields()" function to the class. This function is used
to find the fields. For normal events and kprobes, get_fields() just
returns a pointer to the fields list_head in the class. For syscall
events, it returns the fields list_head in the metadata for the event.

v2:  Fixed the syscall fields. The syscall metadata needs a list
     of fields for both enter and exit.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-14 14:20:23 -04:00