Commit Graph

768053 Commits

Author SHA1 Message Date
Joel Fernandes (Google) da5b3ebb45 tracing: irqsoff: Account for additional preempt_disable
Recently we tried to make the preemptirqsoff tracer to use irqsoff
tracepoint probes. However this causes issues as reported by Masami:

[2.271078] Testing tracer preemptirqsoff: .. no entries found ..FAILED!
[2.381015] WARNING: CPU: 0 PID: 1 at /home/mhiramat/ksrc/linux/kernel/
trace/trace.c:1512 run_tracer_selftest+0xf3/0x154

This is due to the tracepoint code increasing the preempt nesting count
by calling an additional preempt_disable before calling into the
preemptoff tracer which messes up the preempt_count() check in
tracer_hardirqs_off.

To fix this, make the irqsoff tracer probes balance the additional outer
preempt_disable with a preempt_enable_notrace.

The other way to fix this is to just use SRCU for all tracepoints.
However we can't do that because we can't use NMIs from RCU context.

Link: http://lkml.kernel.org/r/20180806034049.67949-1-joel@joelfernandes.org

Fixes: c3bc8fd637 ("tracing: Centralize preemptirq tracepoints and unify their usage")
Fixes: e6753f23d9 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-06 21:55:24 -04:00
Joel Fernandes (Google) da25a672cf trace: Use rcu_dereference_raw for hooks from trace-event subsystem
Since we switched to using SRCU for tracepoints used in the idle path,
we can no longer use rcu_dereference_sched for dereferencing points in
trace-event hooks.

Since tracepoints can now use either SRCU or sched-RCU, just use
rcu_dereference_raw for traceevents just like we're doing when
dereferencing the tracepoint table.

This prevents an RCU warning reported by Masami:

[  282.060593] WARNING: can't dereference registers at 00000000f3c7f62b
[  282.063200] =============================
[  282.064082] WARNING: suspicious RCU usage
[  282.064963] 4.18.0-rc6+ #15 Tainted: G        W
[  282.066048] -----------------------------
[  282.066923] /home/mhiramat/ksrc/linux/kernel/trace/trace_events.c:242
				suspicious rcu_dereference_check() usage!
[  282.068974]
[  282.068974] other info that might help us debug this:
[  282.068974]
[  282.070770]
[  282.070770] RCU used illegally from idle CPU!
[  282.070770] rcu_scheduler_active = 2, debug_locks = 1
[  282.072938] RCU used illegally from extended quiescent state!
[  282.074183] no locks held by swapper/0/0.
[  282.075071]
[  282.075071] stack backtrace:
[  282.076121] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W
[  282.077782] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
[  282.079604] Call Trace:
[  282.080212]  <IRQ>
[  282.080755]  dump_stack+0x85/0xcb
[  282.081523]  trace_event_ignore_this_pid+0x66/0x70
[  282.082541]  trace_event_raw_event_preemptirq_template+0xa2/0xb0
[  282.083774]  ? interrupt_entry+0xc4/0xe0
[  282.084665]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  282.085669]  trace_hardirqs_off_caller+0x90/0xd0
[  282.086597]  trace_hardirqs_off_thunk+0x1a/0x1c
[  282.087433]  ? call_function_interrupt+0xa/0x20
[  282.088201]  interrupt_entry+0xc4/0xe0
[  282.088848]  ? call_function_interrupt+0xa/0x20
[  282.089579]  </IRQ>
[  282.090029]  ? native_safe_halt+0x2/0x10
[  282.090695]  ? default_idle+0x1f/0x160
[  282.091330]  ? default_idle_call+0x24/0x40
[  282.091997]  ? do_idle+0x210/0x250
[  282.092658]  ? cpu_startup_entry+0x6f/0x80
[  282.093338]  ? start_kernel+0x49d/0x4bd
[  282.093987]  ? secondary_startup_64+0xa5/0xb0

Link: http://lkml.kernel.org/r/20180803023407.225852-1-joel@joelfernandes.org

Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: e6753f23d9 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-03 09:38:39 -04:00
Masami Hiramatsu 6bc6c77cfc tracing/kprobes: Fix within_notrace_func() to check only notrace functions
Fix within_notrace_func() to check only notrace functions and to ignore the
kprobe-event which can not solve symbol addresses.

within_notrace_func() returns true if the given kprobe events probe point
seems to be out-of-range. But that is not the correct place to check for it,
it should be done in kprobes afterwards.

kprobe-events allow users to define a probe point on "currently unloaded
module" so that it can trace the event during module load. In this case, the
user will put a probe on a symbol which is not in kallsyms yet and it hits
the within_notrace_func().  As a result, kprobe-events always refuses if
user defines a probe on a "currenly unloaded module".

Fixes: commit 45408c4f92 ("tracing: kprobes: Prohibit probing on notrace function")
Link: http://lkml.kernel.org/r/153319624799.29007.13604430345640129926.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-02 12:34:41 -04:00
Gustavo A. R. Silva 44ec3ec01f ftrace: Use true and false for boolean values in ops_references_rec()
Return statements in functions returning bool should use true or false
instead of an integer value.

This code was detected with the help of Coccinelle.

Link: http://lkml.kernel.org/r/20180802010056.GA31012@embeddedor.com

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-01 21:15:31 -04:00
Steven Rostedt (VMware) d7224c0e12 ring-buffer: Make ring_buffer_record_is_set_on() return bool
The value of ring_buffer_record_is_set_on() is either true or false, so have
its return value be bool.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-01 21:09:50 -04:00
Steven Rostedt (VMware) 3ebea280d7 ring-buffer: Make ring_buffer_record_is_on() return bool
The value of ring_buffer_record_is_on() is either true or false, so have its
return value be bool.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-01 21:08:30 -04:00
Steven Rostedt (VMware) ec57350883 tracing: Make tracer_tracing_is_on() return bool
There's code that expects tracer_tracing_is_on() to be either true or false,
not some random number. Currently, it should only return one or zero, but
just in case, change its return value to bool, to enforce it.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-01 16:08:57 -04:00
Steven Rostedt (VMware) 978defee11 tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists
The start function of the hwlat tracer should never be called when the hwlat
thread already exists. If it is called, do a WARN_ON().

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-01 16:06:02 -04:00
Erica Bugden 82fbc8c48a ftrace: Add missing check for existing hwlat thread
The hwlat tracer uses a kernel thread to measure latencies. The function
that creates this kernel thread, start_kthread(), can be called when the
tracer is initialized and when the tracer is explicitly enabled.
start_kthread() does not check if there is an existing hwlat kernel
thread and will create a new one each time it is called.

This causes the reference to the previous thread to be lost. Without the
thread reference, the old kernel thread becomes unstoppable and
continues to use CPU time even after the hwlat tracer has been disabled.
This problem can be observed when a system is booted with tracing
enabled and the hwlat tracer is configured like this:

	echo hwlat > current_tracer; echo 1 > tracing_on

Add the missing check for an existing kernel thread in start_kthread()
to prevent this problem. This function and the rest of the hwlat kernel
thread setup and teardown are already serialized because they are called
through the tracer core code with trace_type_lock held.

[
 Note, this only fixes the symptom. The real fix was not to call
 this function when tracing_on was already one. But this still makes
 the code more robust, so we'll add it.
]

Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de

Signed-off-by: Erica Bugden <erica.bugden@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-01 16:04:24 -04:00
Steven Rostedt (VMware) f143641bfe tracing: Do not call start/stop() functions when tracing_on does not change
Currently, when one echo's in 1 into tracing_on, the current tracer's
"start()" function is executed, even if tracing_on was already one. This can
lead to strange side effects. One being that if the hwlat tracer is enabled,
and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's
start() function is called again which will recreate another kernel thread,
and make it unable to remove the old one.

Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de

Cc: stable@vger.kernel.org
Fixes: 2df8f8a6a8 ("tracing: Fix regression with irqsoff tracer and tracing_on file")
Reported-by: Erica Bugden <erica.bugden@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-01 16:01:02 -04:00
Zubin Mithra 5248ee8560 tracefs: Annotate tracefs_ops with __ro_after_init
tracefs_ops is initialized inside tracefs_create_instance_dir and not
modified after. tracefs_create_instance_dir allows for initialization
only once, and is called from create_trace_instances(marked __init),
which is called from tracer_init_tracefs(marked __init). Also, mark
tracefs_create_instance_dir as __init.

Link: http://lkml.kernel.org/r/20180725171901.4468-1-zsm@chromium.org

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-31 11:32:44 -04:00
Joel Fernandes (Google) c3bc8fd637 tracing: Centralize preemptirq tracepoints and unify their usage
This patch detaches the preemptirq tracepoints from the tracers and
keeps it separate.

Advantages:
* Lockdep and irqsoff event can now run in parallel since they no longer
have their own calls.

* This unifies the usecase of adding hooks to an irqsoff and irqson
event, and a preemptoff and preempton event.
  3 users of the events exist:
  - Lockdep
  - irqsoff and preemptoff tracers
  - irqs and preempt trace events

The unification cleans up several ifdefs and makes the code in preempt
tracer and irqsoff tracers simpler. It gets rid of all the horrific
ifdeferry around PROVE_LOCKING and makes configuration of the different
users of the tracepoints more easy and understandable. It also gets rid
of the time_* function calls from the lockdep hooks used to call into
the preemptirq tracer which is not needed anymore. The negative delta in
lines of code in this patch is quite large too.

In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
as a single point for registering probes onto the tracepoints. With
this,
the web of config options for preempt/irq toggle tracepoints and its
users becomes:

 PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
       |                 |     \         |           |
       \    (selects)    /      \        \ (selects) /
      TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                      \                  /
                       \ (depends on)   /
                     PREEMPTIRQ_TRACEPOINTS

Other than the performance tests mentioned in the previous patch, I also
ran the locking API test suite. I verified that all tests cases are
passing.

I also injected issues by not registering lockdep probes onto the
tracepoints and I see failures to confirm that the probes are indeed
working.

This series + lockdep probes not registered (just to inject errors):
[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

With this series + lockdep probes registered, all locking tests pass:

[    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
[    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
[    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
[    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |

Link: http://lkml.kernel.org/r/20180730222423.196630-4-joel@joelfernandes.org

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-31 11:32:27 -04:00
Joel Fernandes (Google) e6753f23d9 tracepoint: Make rcuidle tracepoint callers use SRCU
In recent tests with IRQ on/off tracepoints, a large performance
overhead ~10% is noticed when running hackbench. This is root caused to
calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
tracepoint code. Following a long discussion on the list [1] about this,
we concluded that srcu is a better alternative for use during rcu idle.
Although it does involve extra barriers, its lighter than the sched-rcu
version which has to do additional RCU calls to notify RCU idle about
entry into RCU sections.

In this patch, we change the underlying implementation of the
trace_*_rcuidle API to use SRCU. This has shown to improve performance
alot for the high frequency irq enable/disable tracepoints.

Test: Tested idle and preempt/irq tracepoints.

Here are some performance numbers:

With a run of the following 30 times on a single core x86 Qemu instance
with 1GB memory:
hackbench -g 4 -f 2 -l 3000

Completion times in seconds. CONFIG_PROVE_LOCKING=y.

No patches (without this series)
Mean: 3.048
Median: 3.025
Std Dev: 0.064

With Lockdep using irq tracepoints with RCU implementation:
Mean: 3.451   (-11.66 %)
Median: 3.447 (-12.22%)
Std Dev: 0.049

With Lockdep using irq tracepoints with SRCU implementation (this series):
Mean: 3.020   (I would consider the improvement against the "without
	       this series" case as just noise).
Median: 3.013
Std Dev: 0.033

[1] https://patchwork.kernel.org/patch/10344297/

[remove rcu_read_lock_sched_notrace as its the equivalent of
preempt_disable_notrace and is unnecessary to call in tracepoint code]
Link: http://lkml.kernel.org/r/20180730222423.196630-3-joel@joelfernandes.org

Cleaned-up-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ Simplified WARN_ON_ONCE() ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-30 19:13:03 -04:00
Joel Fernandes (Google) 01f38497c6 lockdep: Use this_cpu_ptr instead of get_cpu_var stats
get_cpu_var disables preemption which has the potential to call into the
preemption disable trace points causing some complications. There's also
no need to disable preemption in uses of get_lock_stats anyway since
preempt is already disabled. So lets simplify the code.

Link: http://lkml.kernel.org/r/20180730222423.196630-2-joel@joelfernandes.org

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-30 19:06:54 -04:00
Masami Hiramatsu 6fc7c4110c selftests/ftrace: Fix kprobe string testcase to not probe notrace function
Fix kprobe string argument testcase to not probe notrace
function. Instead, it probes tracefs function which must
be available with ftrace.

Link: http://lkml.kernel.org/r/153294607107.32740.1664854684396589624.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-30 18:41:52 -04:00
Francis Deslauriers d899926f55 selftest/ftrace: Move kprobe selftest function to separate compile unit
Move selftest function to its own compile unit so it can be compiled
with the ftrace cflags (CC_FLAGS_FTRACE) allowing it to be probed
during the ftrace startup tests.

Link: http://lkml.kernel.org/r/153294604271.32740.16490677128630177030.stgit@devbox

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-30 18:41:04 -04:00
Masami Hiramatsu 45408c4f92 tracing: kprobes: Prohibit probing on notrace function
Prohibit kprobe-events probing on notrace functions.  Since probing on a
notrace function can cause a recursive event call. In most cases those are just
skipped, but in some cases it falls into an infinite recursive call.

This protection can be disabled by the kconfig
CONFIG_KPROBE_EVENTS_ON_NOTRACE=y, but it is highly recommended to keep it
"n" for normal kernel builds.  Note that this is only available if "kprobes on
ftrace" has been implemented on the target arch and CONFIG_KPROBES_ON_FTRACE=y.

Link: http://lkml.kernel.org/r/153294601436.32740.10557881188933661239.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Francis Deslauriers <francis.deslauriers@efficios.com>
[ Slight grammar and spelling fixes ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-30 18:28:52 -04:00
kbuild test robot 518eeca05c tracing: preemptirq_delay_run() can be static
Automatically found by kbuild test robot.

Fixes: ffdc73a3b2ad ("lib: Add module for testing preemptoff/irqsoff latency tracers")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-27 17:58:34 -04:00
Steven Rostedt (VMware) 87107a25a2 tracing/kprobes: Simplify the logic of enable_trace_kprobe()
The function enable_trace_kprobe() performs slightly differently if the file
parameter is passed in as NULL on non-NULL. Instead of checking file twice,
move the code between the two tests into a static inline helper function to
make the code easier to follow.

Link: http://lkml.kernel.org/r/20180725224728.7b1d5db2@vmware.local.home
Link: http://lkml.kernel.org/r/20180726121152.4dd54330@gandalf.local.home

Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-27 09:36:20 -04:00
Masami Hiramatsu 72809cbf67 tracing: Remove orphaned function ftrace_nr_registered_ops()
Remove ftrace_nr_registered_ops() because it is no longer used.

ftrace_nr_registered_ops() has been introduced by commit ea701f11da
("ftrace: Add selftest to test function trace recursion protection"), but
its caller has been removed by commit 05cbbf643b ("tracing: Fix selftest
function recursion accounting"). So it is not called anymore.

Link: http://lkml.kernel.org/r/153260907227.12474.5234899025934963683.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-26 10:58:43 -04:00
Masami Hiramatsu 7b144b6c79 tracing: Remove orphaned function using_ftrace_ops_list_func().
Remove using_ftrace_ops_list_func() since it is no longer used.

Using ftrace_ops_list_func() has been introduced by commit 7eea4fce02
("tracing/stack_trace: Skip 4 instead of 3 when using ftrace_ops_list_func")
as a helper function, but its caller has been removed by commit 72ac426a5b
("tracing: Clean up stack tracing and fix fentry updates").  So it is not
called anymore.

Link: http://lkml.kernel.org/r/153260904427.12474.9952096317439329851.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-26 10:53:05 -04:00
Steven Rostedt (VMware) f6b7425cfb tracing: Make unregister_trigger() static
Nothing uses unregister_trigger() outside of trace_events_trigger.c file,
thus it should be static. Not sure why this was ever converted, because
its counter part, register_trigger(), was always static.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-26 10:50:18 -04:00
Joel Fernandes (Google) 8bd1369b4c kselftests: Add tests for the preemptoff and irqsoff tracers
Here we add unit tests for the preemptoff and irqsoff tracer by using a
kernel module introduced previously to trigger long preempt or irq
disabled sections in the kernel.

Link: http://lkml.kernel.org/r/20180711063540.91101-3-joel@joelfernandes.org

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-26 10:50:17 -04:00
Joel Fernandes (Google) f96e8577da lib: Add module for testing preemptoff/irqsoff latency tracers
Here we introduce a test module for introducing a long preempt or irq
disable delay in the kernel which the preemptoff or irqsoff tracers can
detect. This module is to be used only for test purposes and is default
disabled.

Following is the expected output (only briefly shown) that can be parsed
to verify that the tracers are working correctly. We will use this from
the kselftests in future patches.

For the preemptoff tracer:

echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./preemptirq_delay_test.ko test_mode=preempt delay=500000
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -1066    2...2    0us@: preemptirq_delay_run <-preemptirq_delay_run
preempt -1066    2...2 500002us : preemptirq_delay_run <-preemptirq_delay_run
preempt -1066    2...2 500004us : tracer_preempt_on <-preemptirq_delay_run
preempt -1066    2...2 500012us : <stack trace>
 => kthread
 => ret_from_fork

For the irqsoff tracer:

echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./preemptirq_delay_test.ko test_mode=irq delay=500000
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -1069    1d..1    0us@: preemptirq_delay_run
irq dis -1069    1d..1 500001us : preemptirq_delay_run
irq dis -1069    1d..1 500002us : tracer_hardirqs_on <-preemptirq_delay_run
irq dis -1069    1d..1 500005us : <stack trace>
 => ret_from_fork

Link: http://lkml.kernel.org/r/20180712213611.GA8743@joelaf.mtv.corp.google.com

Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Julia Cartwright <julia@ni.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Glexiner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[ Erick is a co-developer of this commit ]
Signed-off-by: Erick Reyes <erickreyes@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-26 10:50:17 -04:00
Joel Fernandes (Google) 2b27ece6c5 tracing/irqsoff: Split reset into separate functions
Split reset functions into seperate functions in preparation
of future patches that need to do tracer specific reset.

Link: http://lkml.kernel.org/r/20180628182149.226164-4-joel@joelfernandes.org

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-26 10:50:17 -04:00
Joel Fernandes (Google) 0b764a6e4e srcu: Add notrace variant of srcu_dereference
In the last patch in this series, we are making lockdep register hooks
onto the irq_{disable,enable} tracepoints. These tracepoints use the
_rcuidle tracepoint variant. In this series we switch the _rcuidle
tracepoint callers to use SRCU instead of sched-RCU. Inorder to
dereference the pointer to the probe functions, we could call
srcu_dereference, however this API will call back into lockdep to check
if the lock is held *before* the lockdep probe hooks have a chance to
run and annotate the IRQ enabled/disabled state.

For this reason we need a notrace variant of srcu_dereference since
otherwise we get lockdep splats. This patch adds the needed
srcu_dereference_notrace variant.

Link: http://lkml.kernel.org/r/20180628182149.226164-3-joel@joelfernandes.org

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-26 10:50:16 -04:00
Paul McKenney 1f45a4db36 srcu: Add notrace variants of srcu_read_{lock,unlock}
This is needed for a future tracepoint patch that uses srcu, and to make
sure it doesn't call into lockdep.

tracepoint code already calls notrace variants for rcu_read_lock_sched
so this patch does the same for srcu which will be used in a later
patch. Keeps it consistent with rcu-sched.

[Joel: Added commit message]
Link: http://lkml.kernel.org/r/20180628182149.226164-2-joel@joelfernandes.org

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-26 10:50:16 -04:00
Snild Dolkow 3e536e222f kthread, tracing: Don't expose half-written comm when creating kthreads
There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.

	creator                     other
	vsnprintf:
	  fill (not terminated)
	  count the rest            trace_sched_waking(p):
	  ...                         memcpy(comm, p->comm, TASK_COMM_LEN)
	  write \0

The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):

	crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
	0xffffffd5b3818640:     "irq/497-pwr_evenkworker/u16:12"

...and a strcpy out of there would cause stack corruption:

	[224761.522292] Kernel panic - not syncing: stack-protector:
	    Kernel stack is corrupted in: ffffff9bf9783c78

	crash-arm64> kbt | grep 'comm\|trace_print_context'
	#6  0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
	      comm (char [16]) =  "irq/497-pwr_even"

	crash-arm64> rd 0xffffffd4d0e17d14 8
	ffffffd4d0e17d14:  2f71726900000000 5f7277702d373934   ....irq/497-pwr_
	ffffffd4d0e17d24:  726f776b6e657665 3a3631752f72656b   evenkworker/u16:
	ffffffd4d0e17d34:  f9780248ff003231 cede60e0ffffff9b   12..H.x......`..
	ffffffd4d0e17d44:  cede60c8ffffffd4 00000fffffffffd4   .....`..........

The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.

Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
This way, there won't be a window where comm is not terminated.

Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com

Cc: stable@vger.kernel.org
Fixes: bc0c38d139 ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-26 09:59:33 -04:00
Steven Rostedt (VMware) 2519c1bbe3 tracing: Quiet gcc warning about maybe unused link variable
Commit 57ea2a34ad ("tracing/kprobes: Fix trace_probe flags on
enable_trace_kprobe() failure") added an if statement that depends on another
if statement that gcc doesn't see will initialize the "link" variable and
gives the warning:

 "warning: 'link' may be used uninitialized in this function"

It is really a false positive, but to quiet the warning, and also to make
sure that it never actually is used uninitialized, initialize the "link"
variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler
thinks it could be used uninitialized.

Cc: stable@vger.kernel.org
Fixes: 57ea2a34ad ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 22:33:50 -04:00
Steven Rostedt (VMware) 15cc78644d tracing: Fix possible double free in event_enable_trigger_func()
There was a case that triggered a double free in event_trigger_callback()
due to the called reg() function freeing the trigger_data and then it
getting freed again by the error return by the caller. The solution there
was to up the trigger_data ref count.

Code inspection found that event_enable_trigger_func() has the same issue,
but is not as easy to trigger (requires harder to trigger failures). It
needs to be solved slightly different as it needs more to clean up when the
reg() function fails.

Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 7862ad1846 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands")
Reivewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 21:25:16 -04:00
Artem Savkov 57ea2a34ad tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe
it returns an error, but does not unset the tp flags it set previously.
This results in a probe being considered enabled and failures like being
unable to remove the probe through kprobe_events file since probes_open()
expects every probe to be disabled.

Link: http://lkml.kernel.org/r/20180725102826.8300-1-asavkov@redhat.com
Link: http://lkml.kernel.org/r/20180725142038.4765-1-asavkov@redhat.com

Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 41a7dd420c ("tracing/kprobes: Support ftrace_event_file base multibuffer")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 11:41:08 -04:00
Masami Hiramatsu 82f4f3e69c selftests/ftrace: Add snapshot and tracing_on test case
Add a testcase for checking snapshot and tracing_on
relationship. This ensures that the snapshotting doesn't
affect current tracing on/off settings.

Link: http://lkml.kernel.org/r/153149932412.11274.15289227592627901488.stgit@devbox

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 10:29:42 -04:00
Masami Hiramatsu 73c8d89455 ring_buffer: tracing: Inherit the tracing setting to next ring buffer
Maintain the tracing on/off setting of the ring_buffer when switching
to the trace buffer snapshot.

Taking a snapshot is done by swapping the backup ring buffer
(max_tr_buffer). But since the tracing on/off setting is defined
by the ring buffer, when swapping it, the tracing on/off setting
can also be changed. This causes a strange result like below:

  /sys/kernel/debug/tracing # cat tracing_on
  1
  /sys/kernel/debug/tracing # echo 0 > tracing_on
  /sys/kernel/debug/tracing # cat tracing_on
  0
  /sys/kernel/debug/tracing # echo 1 > snapshot
  /sys/kernel/debug/tracing # cat tracing_on
  1
  /sys/kernel/debug/tracing # echo 1 > snapshot
  /sys/kernel/debug/tracing # cat tracing_on
  0

We don't touch tracing_on, but snapshot changes tracing_on
setting each time. This is an anomaly, because user doesn't know
that each "ring_buffer" stores its own tracing-enable state and
the snapshot is done by swapping ring buffers.

Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
Cc: stable@vger.kernel.org
Fixes: debdd57f51 ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ Updated commit log and comment in the code ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 10:29:41 -04:00
Steven Rostedt (VMware) 1863c38725 tracing: Fix double free of event_trigger_data
Running the following:

 # cd /sys/kernel/debug/tracing
 # echo 500000 > buffer_size_kb
[ Or some other number that takes up most of memory ]
 # echo snapshot > events/sched/sched_switch/trigger

Triggers the following bug:

 ------------[ cut here ]------------
 kernel BUG at mm/slub.c:296!
 invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
 CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 RIP: 0010:kfree+0x16c/0x180
 Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f
 RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246
 RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80
 RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500
 RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be
 R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be
 R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00
 FS:  00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0
 Call Trace:
  event_trigger_callback+0xee/0x1d0
  event_trigger_write+0xfc/0x1a0
  __vfs_write+0x33/0x190
  ? handle_mm_fault+0x115/0x230
  ? _cond_resched+0x16/0x40
  vfs_write+0xb0/0x190
  ksys_write+0x52/0xc0
  do_syscall_64+0x5a/0x160
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7f363e16ab50
 Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24
 RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50
 RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001
 RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700
 R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009
 R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0
 Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper
86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e
 ---[ end trace d301afa879ddfa25 ]---

The cause is because the register_snapshot_trigger() call failed to
allocate the snapshot buffer, and then called unregister_trigger()
which freed the data that was passed to it. Then on return to the
function that called register_snapshot_trigger(), as it sees it
failed to register, it frees the trigger_data again and causes
a double free.

By calling event_trigger_init() on the trigger_data (which only ups
the reference counter for it), and then event_trigger_free() afterward,
the trigger_data would not get freed by the registering trigger function
as it would only up and lower the ref count for it. If the register
trigger function fails, then the event_trigger_free() called after it
will free the trigger data normally.

Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home

Cc: stable@vger.kerne.org
Fixes: 93e31ffbf4 ("tracing: Add 'snapshot' event trigger command")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 10:29:24 -04:00
Linus Torvalds d72e90f33a Linux 4.18-rc6 2018-07-22 14:12:20 -07:00
Linus Torvalds 7441308421 NVMe fixes for 4.18-rc6:
- fix a regression in 4.18 that causes a memory leak on probe failure
    (Keith Bush)
  - fix a deadlock in the passthrough ioctl code (Scott Bauer)
  - don't enable AENs if not supported (Weiping Zhang)
  - fix an old regression in metadata handling in the passthrough ioctl
    code (Roland Dreier)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAltUe8YLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYM6TQ//UDrKbhnW6x2Vl7wfSyPjG1lADDXjLrIPoy2+WJNN
 ylgRl0Ezv7bXvj9gdwkDcgeoN0ua6gf88vjrwgem27BySPNeDMWYaaaAbwUaHxJd
 rsW/ogaB3gHrgn0MWn7OPb/WT2bQtoq55ivBP9A1ExRAdZ6RjM8qQc/7dkPFCaLf
 XxUE1+udgFVp5a7nbFb6TRdaZmxzYgkDU1PTgERD8RTmBes7K5uOQtO5whFVHU7b
 tveIXLmybgpB0BDN8R9x1uHRtjRmIdgSrJ6H+ps5cc+LB/wHTWvRd/hdlC++Ug8u
 k3+ifvsOLDdTz0xFW+0256edCyStQvVQYog7EcjxHL2GViSyxUayJWKE3XVI7DFW
 tClP6IW39XqTbYs0LGJmv1POufiQUUD3I6xHgE3R3Yb5CyE4EKrNnBkAK4F2pX6n
 Y9rgSY3cjswi/qn9vKZr2DVkEl1oqGiFVBV6PxMZwIHnIoJfZQ4ZwsPgEaeridil
 +GjyF6j2mI5DtrJ9rN8UYENDVioqb1r+1TXt9k/t4bmaK4IWms2+/w9YHfH+4hUr
 M64CkvQa7/wHhE3oIEzgOWLDhvksNyyZQHR6BkMGlwGg7xvO2FuQZlong6MlTVyc
 bgVNPf71X705xuYfXOHCxkSvviWAJlJtsB7r+R6ez6ikngagt2VOK+yPeSesRnux
 kCo=
 =PvXH
 -----END PGP SIGNATURE-----

Merge tag 'nvme-for-4.18' of git://git.infradead.org/nvme

Pull NVMe fixes from Christoph Hellwig:

 - fix a regression in 4.18 that causes a memory leak on probe failure
   (Keith Bush)

 - fix a deadlock in the passthrough ioctl code (Scott Bauer)

 - don't enable AENs if not supported (Weiping Zhang)

 - fix an old regression in metadata handling in the passthrough ioctl
   code (Roland Dreier)

* tag 'nvme-for-4.18' of git://git.infradead.org/nvme:
  nvme: fix handling of metadata_len for NVME_IOCTL_IO_CMD
  nvme: don't enable AEN if not supported
  nvme: ensure forward progress during Admin passthru
  nvme-pci: fix memory leak on probe failure
2018-07-22 13:21:45 -07:00
Linus Torvalds 165ea0d1c2 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Fix several places that screw up cleanups after failures halfway
  through opening a file (one open-coding filp_clone_open() and getting
  it wrong, two misusing alloc_file()). That part is -stable fodder from
  the 'work.open' branch.

  And Christoph's regression fix for uapi breakage in aio series;
  include/uapi/linux/aio_abi.h shouldn't be pulling in the kernel
  definition of sigset_t, the reason for doing so in the first place had
  been bogus - there's no need to expose struct __aio_sigset in
  aio_abi.h at all"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  aio: don't expose __aio_sigset in uapi
  ocxlflash_getfile(): fix double-iput() on alloc_file() failures
  cxl_getfile(): fix double-iput() on alloc_file() failures
  drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()
2018-07-22 12:04:51 -07:00
Al Viro f88a333b44 alpha: fix osf_wait4() breakage
kernel_wait4() expects a userland address for status - it's only
rusage that goes as a kernel one (and needs a copyout afterwards)

[ Also, fix the prototype of kernel_wait4() to have that __user
  annotation   - Linus ]

Fixes: 92ebce5ac5 ("osf_wait4: switch to kernel_wait4()")
Cc: stable@kernel.org # v4.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-22 11:51:30 -07:00
Linus Torvalds 45ae4df922 ARM: SoC fixes for 4.18-rc
- Fix interrupt type on ethernet switch for i.MX-based RDU2
  - GPC on i.MX exposed too large a register window which resulted in
    userspace being able to crash the machine.
  - Fixup of bad merge resolution moving GPIO DT nodes under pinctrl
    on droid4.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAltToboPHG9sb2ZAbGl4
 b20ubmV0AAoJEIwa5zzehBx3gP4P/3Mntu351wZpaczDQZFQxSvxnmT2Ocr9YKFR
 u5UOoE1hTCxOHcrZO7C/EbIKaqD1QhpxMPevcqpOid0glAiEj6D0c0qewAML2vwH
 gGENHX5z5phwrK7RDJZhBiH2jKCg8ttOn0QSoHxGGZNSPAL2nimMwD0IbiqTI5dx
 SkqecCPBwmizpfltdOCRRhN9RCiIvzcqoyLz0HjZ/sff1Y+t3U+alq227rZkQOki
 bj9uD+XkKYZzgiECd6HfMtPHUSUusSXcpF/TyfdnHeyHpF1E3InPVC7dbTASFnxb
 C6zrX99c2Fu11TV7Kkkn1LTwA0rRuXQmSV7ZWZMOqQBrONqGpy0CPIY+LA1xYGCd
 8VtgP7qj0m7XKkPyEriwNDSKKE+c7cCYn9VpR6Kg5xmw0DUCTohMQmeZRo2sMylT
 UlYMjNKQ53IuPullwRaJVM63kA3CuFo3fyStg18SYcx2lRFO8lcGJYqjqd91KkDF
 ZW/tG9V6v7lz/3J3XUOFTJNWwi1CUKEIMM3ObtfDAZToyS1zbe5kX+kiTcUnvGty
 wv3aWCknQnru++vYhtIYLsqwu/NwoJLTWppEmX4YxoV8fW4Yw95e3zjwzWn7rczQ
 dNv7b4Hz/gpDZk7o3dpBpajTCzhh549bDfY9yxBpkd+otUKvgjKkKvsiqCkFV+j7
 dcs5FMT5
 =VFnX
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:

 - Fix interrupt type on ethernet switch for i.MX-based RDU2

 - GPC on i.MX exposed too large a register window which resulted in
   userspace being able to crash the machine.

 - Fixup of bad merge resolution moving GPIO DT nodes under pinctrl on
   droid4.

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: imx6: RDU2: fix irq type for mv88e6xxx switch
  soc: imx: gpc: restrict register range for regmap access
  ARM: dts: omap4-droid4: fix dts w.r.t. pwm
2018-07-21 17:27:42 -07:00
Linus Torvalds ef81e63e17 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "A single fix for a MCE-polling regression, which prevented the
  disabling of polling"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/MCE: Remove min interval polling limitation
2018-07-21 17:25:49 -07:00
Linus Torvalds 43227e098c Merge branch 'x86-pti-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti fixes from Ingo Molnar:
 "An APM fix, and a BTS hardware-tracing fix related to PTI changes"

* 'x86-pti-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apm: Don't access __preempt_count with zeroed fs
  x86/events/intel/ds: Fix bts_interrupt_threshold alignment
2018-07-21 17:23:58 -07:00
Linus Torvalds 48b1db7c7a Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Two fixes: a stop-machine preemption fix and a SCHED_DEADLINE fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix switched_from_dl() warning
  stop_machine: Disable preemption when waking two stopper threads
2018-07-21 17:21:34 -07:00
Linus Torvalds ea75a2c715 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core kernel fixes from Ingo Molnar:
 "This is mostly the copy_to_user_mcsafe() related fixes from Dan
  Williams, and an ORC fix for Clang"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling
  lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()
  lib/iov_iter: Document _copy_to_iter_flushcache()
  lib/iov_iter: Document _copy_to_iter_mcsafe()
  objtool: Use '.strtab' if '.shstrtab' doesn't exist, to support ORC tables on Clang
2018-07-21 16:52:08 -07:00
Linus Torvalds ffb48e7924 powerpc fixes for 4.18 #4
Two regression fixes, one for xmon disassembly formatting and the other to fix
 the E500 build.
 
 Two commits to fix a potential security issue in the VFIO code under obscure
 circumstances.
 
 And finally a fix to the Power9 idle code to restore SPRG3, which is user
 visible and used for sched_getcpu().
 
 Thanks to:
   Alexey Kardashevskiy, David Gibson. Gautham R. Shenoy, James Clarke.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJbUrFqExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYCW
 mQ//eYpaYIkEthXH0uHUN2wpWZlhbg0wUtjclsT5RUonDwMC2PM9BUuLk61RtIlD
 jbi39GrUISHf13U1Ydhb3e1I5B+TpDe6hgb/dGUMAXPe6rt+jFREogZR+vgIU0ep
 Q8ta1GIgbI6La0zXn5o3apUtqR7bAQ3cWD2T8vN4tQmAQZPw1fV13cZZ2kFs5JFO
 aYX8pD76wkAUz8Im+bweziRSRYdAIi8Oxt1Cdyg9Oti5y4fp4LuQKa/qbfkswkkk
 2ycG3TWr4Ln8RM/GaUPW1UPh1Zd8b6et7vUnhxO1g/JdEXlnm9A+DifJFrUk/+y8
 DsyofdXUj+u+LXX/H8uqqse7ysfvkC53R15Jo8irISsIgYZcv4yKsZKGJR27wHGV
 h9KBDA0ZK+czU2jK4gZAdMTs1IICgXGUVIL+nI8U1sRBep3CI3HguqBVoC/MGes3
 WR2+8i2diL1m2jAHR5Nd3+ArBpI876ZalhDmM1Mv3GRUAvjo9nsehk32/nfBDYUM
 nPQd1vi3F6w1MSlj7aqA3quTeGANwlnOcdC4QB++tz7oO1sY6ZYZtii3jsGSluXL
 vP/Mxvz2AG3v9KPCDQJqzx0n3vMAmGphetddxYl1X1dSwbwLNx0GggKNe0xw6sjp
 6lbin0w/Ot79VoSEncuRdm1hrtIktwubsH5CGB7Gxke5Kgk=
 =Sfmf
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Two regression fixes, one for xmon disassembly formatting and the
  other to fix the E500 build.

  Two commits to fix a potential security issue in the VFIO code under
  obscure circumstances.

  And finally a fix to the Power9 idle code to restore SPRG3, which is
  user visible and used for sched_getcpu().

  Thanks to: Alexey Kardashevskiy, David Gibson. Gautham R. Shenoy,
  James Clarke"

* tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)
  powerpc/Makefile: Assemble with -me500 when building for E500
  KVM: PPC: Check if IOMMU page is contained in the pinned physical page
  vfio/spapr: Use IOMMU pageshift rather than pagesize
  powerpc/xmon: Fix disassembly since printf changes
2018-07-21 16:46:53 -07:00
Linus Torvalds 55b636b419 for-4.18-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAltSwkIACgkQxWXV+ddt
 WDtMBQ//UXMjHaXvFmC0SM6NczuYQR51hLYtJFIKig93XK5goVpUBTNbxO7LX/Tn
 4zKoKyVhkW1884V6mRiC+G23QLbo0BQZA7DExfyJ3jylQdjZMBm+K+r19OtGQf5v
 CII7oUwni03KIiXIqiFAL5dLWebVQpG5EKJbh8GLZsmg6xNcyVaUqZ/fHXajbZiv
 ldEBtHBKIv7WWTJmylMBKMWnRz+jqU91fXPahoU6R5qivODrLt1o/PMuSjVNhaxe
 iDldHfdOaiQmLHB/1kOGyv492oW5mSSVNDE8LjEDZ61tDNlAcUyuKUWIRBxDEDtD
 6D7rlVQXJ/N7sJ6+UYmJKsRpHL+NOkyzSZ0QEU/sm1Xpm8gkhHuuofRPrVCtd3l1
 ZSbwvlrdyjigVEBfM3IbToQ/K6Rc1ZGId20OAs9PCQbb+mj9IxPIncZ7pI1c4hlh
 pPEjcYsp14JbCTjctFalcqTiFY5tHRQsx+GUFnDyOcdL7Mi+CoH+0Jy61Vgz9GQE
 7s934cfEC0ot/f66kAL/PZzxUfC7TePqaa+sDfS5BIkJ4M6lPMxS5De5R4Z0+Nzr
 DXgQAlgXmxfRjpOYMTH9D0EDdSeJaNmVHgk7hFbiYk/KX3oyd4NmgI9Cfao8rQJv
 2yd8wF2httfSJKD4b/Hv9r6Ho/Bw9PK59BvWOKYhSj6IGl32utw=
 =f7eB
 -----END PGP SIGNATURE-----

Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "A fix of a corruption regarding fsync and clone, under some very
  specific conditions explained in the patch.

  The fix is marked for stable 3.16+ so I'd like to get it merged now
  given the impact"

* tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix file data corruption after cloning a range and fsync
2018-07-21 16:42:03 -07:00
Linus Torvalds 490fc05386 mm: make vm_area_alloc() initialize core fields
Like vm_area_dup(), it initializes the anon_vma_chain head, and the
basic mm pointer.

The rest of the fields end up being different for different users,
although the plan is to also initialize the 'vm_ops' field to a dummy
entry.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 15:24:03 -07:00
Linus Torvalds 95faf6992d mm: make vm_area_dup() actually copy the old vma data
.. and re-initialize th eanon_vma_chain head.

This removes some boiler-plate from the users, and also makes it clear
why it didn't need use the 'zalloc()' version.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 14:48:45 -07:00
Linus Torvalds 3928d4f5ee mm: use helper functions for allocating and freeing vm_area structs
The vm_area_struct is one of the most fundamental memory management
objects, but the management of it is entirely open-coded evertwhere,
ranging from allocation and freeing (using kmem_cache_[z]alloc and
kmem_cache_free) to initializing all the fields.

We want to unify this in order to end up having some unified
initialization of the vmas, and the first step to this is to at least
have basic allocation functions.

Right now those functions are literally just wrappers around the
kmem_cache_*() calls.  This is a purely mechanical conversion:

    # new vma:
    kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()

    # copy old vma
    kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)

    # free vma
    kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)

to the point where the old vma passed in to the vm_area_dup() function
isn't even used yet (because I've left all the old manual initialization
alone).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 13:48:51 -07:00
Linus Torvalds 191a3afa98 Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "5 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: memcg: fix use after free in mem_cgroup_iter()
  mm/huge_memory.c: fix data loss when splitting a file pmd
  fat: fix memory allocation failure handling of match_strdup()
  MAINTAINERS: Peter has moved
  mm/memblock: add missing include <linux/bootmem.h>
2018-07-21 13:14:17 -07:00
Jing Xia 9f15bde671 mm: memcg: fix use after free in mem_cgroup_iter()
It was reported that a kernel crash happened in mem_cgroup_iter(), which
can be triggered if the legacy cgroup-v1 non-hierarchical mode is used.

Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b8f
......
Call trace:
  mem_cgroup_iter+0x2e0/0x6d4
  shrink_zone+0x8c/0x324
  balance_pgdat+0x450/0x640
  kswapd+0x130/0x4b8
  kthread+0xe8/0xfc
  ret_from_fork+0x10/0x20

  mem_cgroup_iter():
      ......
      if (css_tryget(css))    <-- crash here
	    break;
      ......

The crashing reason is that mem_cgroup_iter() uses the memcg object whose
pointer is stored in iter->position, which has been freed before and
filled with POISON_FREE(0x6b).

And the root cause of the use-after-free issue is that
invalidate_reclaim_iterators() fails to reset the value of iter->position
to NULL when the css of the memcg is released in non- hierarchical mode.

Link: http://lkml.kernel.org/r/1531994807-25639-1-git-send-email-jing.xia@unisoc.com
Fixes: 6df38689e0 ("mm: memcontrol: fix possible memcg leak due to interrupted reclaim")
Signed-off-by: Jing Xia <jing.xia.mail@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <chunyan.zhang@unisoc.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 12:50:46 -07:00