linux/kernel/trace
Steven Rostedt (VMware) a98c81ab17 tracing: Have all levels of checks prevent recursion
commit ed65df63a39a3f6ed04f7258de8b6789e5021c18 upstream.

While writing an email explaining the "bit = 0" logic for a discussion on
making ftrace_test_recursion_trylock() disable preemption, I discovered a
path that makes the "not do the logic if bit is zero" unsafe.

The recursion logic is done in hot paths like the function tracer. Thus,
any code executed causes noticeable overhead. Thus, tricks are done to try
to limit the amount of code executed. This included the recursion testing
logic.

Having recursion testing is important, as there are many paths that can
end up in an infinite recursion cycle when tracing every function in the
kernel. Thus protection is needed to prevent that from happening.

Because it is OK to recurse due to different running context levels (e.g.
an interrupt preempts a trace, and then a trace occurs in the interrupt
handler), a set of bits are used to know which context one is in (normal,
softirq, irq and NMI). If a recursion occurs in the same level, it is
prevented*.

Then there are infrastructure levels of recursion as well. When more than
one callback is attached to the same function to trace, it calls a loop
function to iterate over all the callbacks. Both the callbacks and the
loop function have recursion protection. The callbacks use the
"ftrace_test_recursion_trylock()" which has a "function" set of context
bits to test, and the loop function calls the internal
trace_test_and_set_recursion() directly, with an "internal" set of bits.

If an architecture does not implement all the features supported by ftrace
then the callbacks are never called directly, and the loop function is
called instead, which will implement the features of ftrace.

Since both the loop function and the callbacks do recursion protection, it
was seemed unnecessary to do it in both locations. Thus, a trick was made
to have the internal set of recursion bits at a more significant bit
location than the function bits. Then, if any of the higher bits were set,
the logic of the function bits could be skipped, as any new recursion
would first have to go through the loop function.

This is true for architectures that do not support all the ftrace
features, because all functions being traced must first go through the
loop function before going to the callbacks. But this is not true for
architectures that support all the ftrace features. That's because the
loop function could be called due to two callbacks attached to the same
function, but then a recursion function inside the callback could be
called that does not share any other callback, and it will be called
directly.

i.e.

 traced_function_1: [ more than one callback tracing it ]
   call loop_func

 loop_func:
   trace_recursion set internal bit
   call callback

 callback:
   trace_recursion [ skipped because internal bit is set, return 0 ]
   call traced_function_2

 traced_function_2: [ only traced by above callback ]
   call callback

 callback:
   trace_recursion [ skipped because internal bit is set, return 0 ]
   call traced_function_2

 [ wash, rinse, repeat, BOOM! out of shampoo! ]

Thus, the "bit == 0 skip" trick is not safe, unless the loop function is
call for all functions.

Since we want to encourage architectures to implement all ftrace features,
having them slow down due to this extra logic may encourage the
maintainers to update to the latest ftrace features. And because this
logic is only safe for them, remove it completely.

 [*] There is on layer of recursion that is allowed, and that is to allow
     for the transition between interrupt context (normal -> softirq ->
     irq -> NMI), because a trace may occur before the context update is
     visible to the trace recursion logic.

Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/
Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
Cc: =?utf-8?b?546L6LSH?= <yun.wang@linux.alibaba.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: stable@vger.kernel.org
Fixes: edc15cafcb ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27 09:54:29 +02:00
..
Kconfig tracing/kprobes: Do the notrace functions check without kprobes on ftrace 2021-01-19 18:26:12 +01:00
Makefile tracing: Add unified dynamic event framework 2018-12-08 20:54:09 -05:00
blktrace.c blktrace: Fix uaf in blk_trace access after removing by sysfs 2021-09-30 10:09:24 +02:00
bpf_trace.c tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing 2021-07-14 16:53:08 +02:00
fgraph.c fgraph: Initialize tracing_graph_pause at task creation 2021-02-10 09:25:29 +01:00
ftrace.c tracing: Have all levels of checks prevent recursion 2021-10-27 09:54:29 +02:00
ftrace_internal.h treewide: Rename rcu_dereference_raw_notrace() to _check() 2019-08-01 14:16:21 -07:00
power-traces.c
preemptirq_delay_test.c tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c 2018-10-17 15:35:33 -04:00
ring_buffer.c tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop. 2021-07-28 13:31:00 +02:00
ring_buffer_benchmark.c tracing: Use CONFIG_PREEMPTION 2019-07-31 19:03:35 +02:00
rpm-traces.c
trace.c tracing/histogram: Rename "cpu" to "common_cpu" 2021-07-28 13:31:00 +02:00
trace.h tracing: Have all levels of checks prevent recursion 2021-10-27 09:54:29 +02:00
trace_benchmark.c
trace_benchmark.h tracing: Fix SPDX format headers to use C++ style comments 2018-08-16 19:08:06 -04:00
trace_branch.c x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP 2019-04-03 11:02:24 +02:00
trace_clock.c tracing: Do no increment trace_clock_global() by one 2021-06-23 14:41:28 +02:00
trace_dynevent.c tracing: Add tracing_check_open_get_tr() 2019-10-12 20:44:07 -04:00
trace_dynevent.h tracing/dynevent: Pass extra arguments to match operation 2019-08-31 12:19:38 -04:00
trace_entries.h tracing: Set kernel_stack's caller size properly 2020-10-01 13:17:29 +02:00
trace_event_perf.c tracing: Fix race in perf_trace_buf initialization 2019-10-21 19:38:28 -04:00
trace_events.c tracing: Do not count ftrace events in top level enable output 2021-02-17 10:35:14 +01:00
trace_events_filter.c tracing: Avoid memory leak in process_system_preds() 2020-01-09 10:20:00 +01:00
trace_events_filter_test.h tracing: Fix SPDX format headers to use C++ style comments 2018-08-16 19:08:06 -04:00
trace_events_hist.c tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name 2021-08-26 08:36:20 -04:00
trace_events_trigger.c tracing: Fix event trigger to accept redundant spaces 2020-06-30 15:37:10 -04:00
trace_export.c
trace_functions.c tracing: Have all levels of checks prevent recursion 2021-10-27 09:54:29 +02:00
trace_functions_graph.c fgraph: Remove redundant ftrace_graph_notrace_addr() test 2019-07-30 21:50:03 -04:00
trace_hwlat.c tracing: Remove WARN_ON in start_thread() 2020-12-08 10:40:28 +01:00
trace_irqsoff.c The biggest change for this release is in the histogram code. 2019-03-11 17:01:32 -07:00
trace_kdb.c tracing: Silence GCC 9 array bounds warning 2019-05-25 23:04:30 -04:00
trace_kprobe.c tracing/probes: Reject events which have the same name of existing one 2021-09-22 12:26:43 +02:00
trace_kprobe_selftest.c
trace_kprobe_selftest.h tracing: Fix SPDX format headers to use C++ style comments 2018-08-16 19:08:06 -04:00
trace_mmiotrace.c
trace_nop.c
trace_output.c tracing: Make the space reserved for the pid wider 2020-10-07 08:01:27 +02:00
trace_output.h tracing: Fix SPDX format headers to use C++ style comments 2018-08-16 19:08:06 -04:00
trace_preemptirq.c lockdep: fix order in trace_hardirqs_off_caller() 2020-10-01 13:18:14 +02:00
trace_printk.c tracing: Add locked_down checks to the open calls of files created for tracefs 2019-10-12 20:48:06 -04:00
trace_probe.c tracing/probes: Reject events which have the same name of existing one 2021-09-22 12:26:43 +02:00
trace_probe.h tracing/probes: Reject events which have the same name of existing one 2021-09-22 12:26:43 +02:00
trace_probe_tmpl.h tracing/probe: Support user-space dereference 2019-05-25 23:04:42 -04:00
trace_sched_switch.c tracing: Fix sched switch start/stop refcount racy updates 2020-02-11 04:35:07 -08:00
trace_sched_wakeup.c kernel/trace: Fix do not unregister tracepoints when register sched_migrate_task fail 2020-01-14 20:08:22 +01:00
trace_selftest.c ftrace: Handle tracing when switching between context 2020-11-10 12:37:28 +01:00
trace_selftest_dynamic.c
trace_seq.c tracing: Add SPDX License format tags to tracing files 2018-08-16 19:08:06 -04:00
trace_stack.c tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined 2020-01-14 20:08:22 +01:00
trace_stat.c tracing: Fix very unlikely race of registering two stat tracers 2020-02-24 08:36:30 +01:00
trace_stat.h tracing: Fix SPDX format headers to use C++ style comments 2018-08-16 19:08:06 -04:00
trace_syscalls.c syscalls: Remove start and number from syscall_get_arguments() args 2019-04-05 09:26:43 -04:00
trace_uprobe.c tracing/probes: Reject events which have the same name of existing one 2021-09-22 12:26:43 +02:00
tracing_map.c tracing: Have the histogram compare functions convert to u64 first 2020-01-09 10:20:00 +01:00
tracing_map.h tracing: Fix SPDX format headers to use C++ style comments 2018-08-16 19:08:06 -04:00