This patch adds a fallback to cat for the pager. This is useful
on environments, such as Android, where less does not exist.
It is better to default to cat than to abort.
Signed-off-by: Michael Lentine <mlentine@google.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1400579330-5043-2-git-send-email-eranian@google.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Now we moved to the perf_hpp_[_sort]_list so no need to keep the old
hist_entry__sort_list and sort__first_dimension. Also the
hist_entry__sort_snprintf() can be gone as hist_entry__snprintf()
provides the functionality.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-18-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The --fields option is to allow user setup output field in any order.
It can receive any sort keys and following (hpp) fields:
overhead, overhead_sys, overhead_us, sample and period
If guest profiling is enabled, overhead_guest_{sys,us} will be
available too.
More more information, please see previous patch "perf report:
Add -F option to specify output fields"
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-15-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The hists__filter_entries() function is called when down arrow key is
pressed for navigating through the entries in TUI. It has a check for
filtering out entries that have very small overhead (under min_pcnt).
However it just assumed the entries are sorted by the overhead so when
it saw such a small overheaded entry, it just stopped navigating as an
optimization. But it's not true anymore due to new --fields and
--sort optoin behavior and this case users cannot go down to a next
entry if ther's an entry with small overhead in-between.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-14-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Currently, what the sort_entry does is just identifying hist entries
so that they can be grouped properly. However, with -F option
support, it indeed needs to sort entries appropriately to be shown to
users. So add ->sort() member to do it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-13-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The -F/--fields option is to allow user setup output field in any
order. It can receive any sort keys and following (hpp) fields:
overhead, overhead_sys, overhead_us, sample and period
If guest profiling is enabled, overhead_guest_{sys,us} will be
available too.
The output fields also affect sort order unless you give -s/--sort
option. And any keys specified on -s option, will also be added to
the output field list automatically.
$ perf report -F sym,sample,overhead
...
# Symbol Samples Overhead
# .......................... ............ ........
#
[.] __cxa_atexit 2 2.50%
[.] __libc_csu_init 4 5.00%
[.] __new_exitfn 3 3.75%
[.] _dl_check_map_versions 1 1.25%
[.] _dl_name_match_p 4 5.00%
[.] _dl_setup_hash 1 1.25%
[.] _dl_sysdep_start 1 1.25%
[.] _init 5 6.25%
[.] _setjmp 6 7.50%
[.] a 8 10.00%
[.] b 8 10.00%
[.] brk 1 1.25%
[.] c 8 10.00%
Note that, the example output above is captured after applying next
patch which fixes sort/comparing behavior.
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-12-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
So that it can be set properly prior to set up output fields. That
makes easy to handle/warn errors during the setup since it doesn't
need to be bothered with the GUI.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-11-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The perf uses different default sort orders for different use-cases,
and this was scattered throughout the code. Add get_default_sort_
order() function to handle this and change initial value of sort_order
to NULL to distinguish it from user-given one.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1400480762-22852-10-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The callback was used by TUI for determining color of folded sign
using percent of first field/column. But it cannot be used anymore
since it now support dynamic reordering of output field.
So move the logic to the hist_browser__show_entry().
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-8-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Until now the hpp and sort functions do similar jobs different ways.
Since the sort functions converted/wrapped to hpp formats it can do
the job in a uniform way.
The perf_hpp__sort_list has a list of hpp formats to sort entries and
the perf_hpp__list has a list of hpp formats to print output result.
To have a backward compatibility, it automatically adds 'overhead'
field in front of sort list. And then all of fields in sort list
added to the output list (if it's not already there).
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-7g3h86woz2sckg3h1lj42ygj@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Move logic of hist_entry__sort_on_period to __hpp__sort() in order to
support event group report.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-5-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Those function pointers will be used to sort report output based on
the selected fields. This is a preparation of later change.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-2-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
This patch fixes a bug in precise_store_data_hsw() whereby
it would set the data source memory level to the wrong value.
As per the the SDM Vol 3b Table 18-41 (Layout of Data Linear
Address Information in PEBS Record), when status bit 0 is set
this is a L1 hit, otherwise this is a L1 miss.
This patch encodes the memory level according to the specification.
In V2, we added the filtering on the store events.
Only the following events produce L1 information:
* MEM_UOPS_RETIRED.STLB_MISS_STORES
* MEM_UOPS_RETIRED.LOCK_STORES
* MEM_UOPS_RETIRED.SPLIT_STORES
* MEM_UOPS_RETIRED.ALL_STORES
Cc: mingo@elte.hu
Cc: acme@ghostprotocols.net
Cc: jolsa@redhat.com
Cc: jmario@redhat.com
Cc: ak@linux.intel.com
Tested-and-Reviewed-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140515155644.GA3884@quad
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Vince noticed that we test the (unsigned long) flags field against an
(unsigned int) constant. This would allow setting the high bits on 64bit
platforms and not get an error.
There is nothing that uses the high bits, so it should be entirely
harmless, but we don't want userspace to accidentally set them anyway,
so fix the constants.
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140423102254.GL11096@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Adding libdw DWARF post unwind support, which is part
of elfutils-devel/libdw-dev package from version 0.158.
The new code is contained in unwin-libdw.c object, and
implements unwind__get_entries unwind interface function.
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1400229672-16104-4-git-send-email-jean.pihet@linaro.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Adding dwarf unwind test, that setups live machine data over
the perf test thread and does the remote unwind.
Need to use -fno-optimize-sibling-calls for test compilation,
otherwise 'krava_*' function calls are optimized into jumps
and omitted from the stack unwind.
So far it was enabled only for x86.
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1400229672-16104-3-git-send-email-jean.pihet@linaro.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Introducing perf_regs_load function, which is going
to be used for dwarf unwind test in following patches.
It takes single argument as a pointer to the regs dump
buffer and populates it with current registers values.
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1400229672-16104-2-git-send-email-jean.pihet@linaro.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
cppcheck detected following warning:
[tools/perf/util/session.c:1628] -> [tools/perf/util/session.c:1632]:
(warning) Possible null pointer dereference: session - otherwise it
is redundant to check it against null.
In order to avoide null pointer, check the pointer before use.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Link: http://lkml.kernel.org/r/1400087618-13628-1-git-send-email-standby24x7@gmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
As we do not use .success in sched_wakeup event any more, then
we can not guarantee that the task when wakeup event happen is
out of run queue. So the message of nr_state_machine_bugs is
not correct.
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/1399945101-21736-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
trace_sched_wakeup(.success) is a dead argument and has been for ages,
the only reason its still there is because of brain dead software, which
apparently includes perf tools
There's a few more instances in pearly snake shit, but that's not
supported as far as I care anyhow, so let that bitrot.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140512181946.GG13467@laptop.programming.kicks-ass.net
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
. Propagate exit status of a command line workload for
record command (Namhyung Kim)
. Use tid for finding thread (Namhyung Kim)
. Clarify the output of perf sched map plus small sched
command fixies (Dongsheng Yang)
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTcJFuAAoJEPZqUSBWB3s9hjkP/08VG633605baNo554GFCe3l
jUDg1gXOmCB82FIj8zUrdoF7VCD2Yz5KNaDSWLihJlUOCDkyOliufMufNYhtVz5G
XKjxzd8+wq+y4yu6fZBg4SDs4LrPCGrktIuwCgnX2Jns5Ps9CpRxUgMNIA1/Ee3J
y6ELc3zdYQzOVcuiqH0FemVckzGTSRCQ26JbMrdFFuBCaPu8iEAXQQis8InRTyyA
RDcQb54TfjnQQAFj7WX8QF2X9wXhv7xAA2547ICLBJ5wb+Tf26K8vEsxTLHGBKvM
0bGd1BkMp52WY2ZEVzdPz4kVnMUGwSrjhQJqCnnv+DErko3UaBSH+u6xVaNUymWY
UgJsywEVGYtanHOjrQ6u9ESi9tdsY4d6ykTtZHfydXRUJgwfW194cYwGZz+fGXbs
3tdAksax6TrytXzPhC2tIfUElvHeu7BpmefDIe8+fwyLsSJhuCyM+DE7yBe+65Jl
a4g8BFkVSCx8nLhKsjQZoieXO51GD9zQHkOGzsJGkYCFqSlnXPXULEuESNrAlZjK
CnTUfJJo+Oc2n5KyqzLDLwMq09dF8orVCZI0k9V56AZJwBYXlBA31dBdK220zFlM
SGQBGOZI5I+lV1cMNECvGnYgi+m0rCxpOsjsoJdHRKxwoAIGUzezWpf44Vl1QRZv
76aFWLxv9DWXhhwRylD0
=felX
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/core
Pull perf/core improvements and fixes from Jiri Olsa:
* Propagate exit status of a command line workload for
record command (Namhyung Kim)
* Use tid for finding thread (Namhyung Kim)
* Clarify the output of perf sched map plus small sched
command fixies (Dongsheng Yang)
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I believe that passing pid (instead of tid) as the 3rd arg of the
machine__find*_thread() was to find a main thread so that it can
search proper map group for symbols. However with the map sharing
patch applied, it now can do it in any thread.
It fixes a bug when each thread has different name, it only reports a
main thread for samples in other threads.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1399856202-26221-1-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The on_exit() function was only used in perf record but it's gone in
previous patch.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Bernhard Rosenkraenzer <Bernhard.Rosenkranzer@linaro.org>
Cc: Irina Tirdea <irina.tirdea@intel.com>
Link: http://lkml.kernel.org/r/1399855645-25815-2-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Currently perf record doesn't propagate the exit status of a workload
given by the command line. But sometimes it'd useful if it's
propagated so that a monitoring script can handle errors
appropriately.
To do that, it moves most of logic out of the exit handlers and run
them directly in the __cmd_record(). The only thing needs to be done
in the handler is propagating terminating signal so that the shell can
terminate its loop properly when Ctrl-C was pressed. Also it cleaned
up the resource management code in record__exit().
With this change, perf record returns the child exit status in case of
normal termination and send signal to itself when terminated by signal.
Example run of Stephane's case:
$ perf record true && echo yes || echo no
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data (~589 samples) ]
yes
$ perf record false && echo yes || echo no
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data (~589 samples) ]
no
Jiri's case (error in parent):
$ perf record -m 10G true && echo yes || echo no
rounding mmap pages size to 17179869184 bytes (4194304 pages)
failed to mmap with 12 (Cannot allocate memory)
no
$ ulimit -n 6
$ perf record sleep 1 && echo yes || echo no
failed to create 'go' pipe: Too many open files
Couldn't run the workload!
no
And Peter's case (interrupted by signal):
$ while :; do perf record sleep 1; done
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data (~593 samples) ]
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1399855645-25815-1-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
In output of perf sched map, any shortname of thread will be explained
at the first time when it appear.
Example:
*A0 228836.978985 secs A0 => perf:23032
*. A0 228836.979016 secs B0 => swapper:0
. *C0 228836.979099 secs C0 => migration/3:22
*A0 . C0 228836.979115 secs
A0 . *. 228836.979115 secs
But B0, which is explained as swapper:0 did not appear in the
left part of output. Instead, we use '.' as the shortname of
swapper:0. So the comment of "B0 => swapper:0" is not easy to
understand.
This patch clarify the output of perf sched map with not allocating
one letter-number shortname for swapper:0 and print ". => swapper:0"
as the explanation for swapper:0.
Example:
*A0 228836.978985 secs A0 => perf:23032
* . A0 228836.979016 secs . => swapper:0
. *B0 228836.979099 secs B0 => migration/3:22
*A0 . B0 228836.979115 secs
A0 . * . 228836.979115 secs
A0 *C0 . 228836.979225 secs C0 => ksoftirqd/2:18
A0 *D0 . 228836.979236 secs D0 => rcu_sched:7
Signed-off-by: Dongsheng <yangds.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1399354741-19522-1-git-send-email-yangds.fnst@cn.fujitsu.com
[ small style fixes to make checkpatch happy ]
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
We should record and process sched:sched_wakeup_new event in
perf sched tool, but currently, there is the process function
for it, without recording it in record subcommand.
This patch add -e sched:sched_wakeup_new to perf sched record.
Signed-off-by: Dongsheng <yangds.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/710c6edd2162b2cea1711443f54de47c0210d9fd.1399273302.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Instead of jumping through hoops to make sure to find (and exit) each
event, do it the simple straight fwd way.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-tij931199thfkys8vbnokdpf@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Primarily make perf_event_release_kernel() into put_event(), this will
allow kernel space to create per-task inherited events, and is safer
in general.
Also, document the free_event() assumptions.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-rk9pvr6e1d0559lxstltbztc@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 38b435b16c ("perf: Fix tear-down of inherited group events")
states that we need to destroy groups for inherited events, but it
doesn't make any sense to not also destroy groups for normal events.
And while it usually makes no difference (the normal events won't
leak, and its very likely all the group events will die in quick
succession) it does make the code more consistent and closes a
potential hole for trouble.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-426egt8zmsm12d2q8k2xz4tt@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make sure all events in a group have the same inherit state. It was
possible for group leaders to have inherit set while sibling events
would not have inherit set.
In this case we'd still inherit the siblings, leading to some
non-fatal weirdness.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-r32tt8yldvic3jlcghd3g35u@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Event 0x013c is not the same as fixed counter2, remove it from
Silvermont's event constraints.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1398755081-12471-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When removing a (sibling) event we do:
raw_spin_lock_irq(&ctx->lock);
perf_group_detach(event);
raw_spin_unlock_irq(&ctx->lock);
<hole>
perf_remove_from_context(event);
raw_spin_lock_irq(&ctx->lock);
...
raw_spin_unlock_irq(&ctx->lock);
Now, assuming the event is a sibling, it will be 'unreachable' for
things like ctx_sched_out() because that iterates the
groups->siblings, and we just unhooked the sibling.
So, if during <hole> we get ctx_sched_out(), it will miss the event
and not call event_sched_out() on it, leaving it programmed on the
PMU.
The subsequent perf_remove_from_context() call will find the ctx is
inactive and only call list_del_event() to remove the event from all
other lists.
Hereafter we can proceed to free the event; while still programmed!
Close this hole by moving perf_group_detach() inside the same
ctx->lock region(s) perf_remove_from_context() has.
The condition on inherited events only in __perf_event_exit_task() is
likely complete crap because non-inherited events are part of groups
too and we're tearing down just the same. But leave that for another
patch.
Most-likely-Fixes: e03a9a55b4 ("perf: Change close() semantics for group events")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140505093124.GN17778@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge misc fixes from Andrew Morton:
"13 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
agp: info leak in agpioc_info_wrap()
fs/affs/super.c: bugfix / double free
fanotify: fix -EOVERFLOW with large files on 64-bit
slub: use sysfs'es release mechanism for kmem_cache
revert "mm: vmscan: do not swap anon pages just because free+file is low"
autofs: fix lockref lookup
mm: filemap: update find_get_pages_tag() to deal with shadow entries
mm/compaction: make isolate_freepages start at pageblock boundary
MAINTAINERS: zswap/zbud: change maintainer email address
mm/page-writeback.c: fix divide by zero in pos_ratio_polynom
hugetlb: ensure hugepage access is denied if hugepages are not supported
slub: fix memcg_propagate_slab_attrs
drivers/rtc/rtc-pcf8523.c: fix month definition