Commit Graph

505704 Commits

Author SHA1 Message Date
Yunlong Song 5ef803ee02 perf list: Extend raw-dump to certain kind of events
Extend 'perf list --raw-dump' to 'perf list --raw-dump [hw|sw|cache
|tracepoint|pmu|event_glob]' in order to show the raw-dump of a certain
kind of events rather than all of the events.

Example:

Before this patch:

 $ perf list --raw-dump hw
 branch-instructions branch-misses bus-cycles cache-misses
 cache-references cpu-cycles instructions stalled-cycles-backend
 stalled-cycles-frontend
 alignment-faults context-switches cpu-clock cpu-migrations
 emulation-faults major-faults minor-faults page-faults task-clock
 ...
 ...
 writeback:writeback_thread_start writeback:writeback_thread_stop
 writeback:writeback_wait_iff_congested
 writeback:writeback_wake_background writeback:writeback_wake_thread

As shown above, all of the events are printed.

After this patch:

 $ perf list --raw-dump hw
 branch-instructions branch-misses bus-cycles cache-misses
 cache-references cpu-cycles instructions stalled-cycles-backend
 stalled-cycles-frontend

As shown above, only the hw events are printed.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1425032491-20224-5-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-27 15:52:24 -03:00
Yunlong Song 705750f2d6 perf list: Clean up the printing functions of hardware/software events
Do not need print_events_type or __print_events_type for listing hw/sw
events, let print_symbol_events do its job instead. Moreover,
print_symbol_events can also handle event_glob and name_only.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1425032491-20224-4-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-27 15:52:18 -03:00
Yunlong Song 3ef1e65c82 perf tools: Remove the '--(null)' long_name for --list-opts
If the long_name of a 'struct option' is defined as NULL, --list-opts
will incorrectly print '--(null)' in its output. As a result, '--(null)'
will finally appear in the case of bash completion, e.g. 'perf record
--'.

Example:

Before this patch:

 $ perf record --list-opts

 --event --filter --pid --tid --realtime --no-buffering --raw-samples
 --all-cpus --cpu --count --output --no-inherit --freq --mmap-pages
 --group --(null) --call-graph --verbose --quiet --stat --data
 --timestamp --period --no-samples --no-buildid-cache --no-buildid
 --cgroup --delay --uid --branch-any --branch-filter --weight
 --transaction --per-thread --intr-regs

After this patch:

 $ perf record --list-opts

 --event --filter --pid --tid --realtime --no-buffering --raw-samples
 --all-cpus --cpu --count --output --no-inherit --freq --mmap-pages
 --group --call-graph --verbose --quiet --stat --data --timestamp
 --period --no-samples --no-buildid-cache --no-buildid --cgroup --delay
 --uid --branch-any --branch-filter --weight --transaction --per-thread
 --intr-regs

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1425032491-20224-7-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-27 15:52:14 -03:00
Yunlong Song ed45752061 perf list: Avoid confusion of perf output and the next command prompt
Distinguish the output of 'perf list --list-opts' or 'perf --list-cmds'
with the next command prompt, which also happens in other cases (e.g.
record, report ...).

Example:

Before this patch:

 $perf list --list-opts
 --raw-dump $          <-- the output and the next command prompt are at
                           the same line

After this patch:

 $perf list --list-opts
 --raw-dump
 $                     <-- the new line

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1425032491-20224-6-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-27 15:52:09 -03:00
Yunlong Song 161149513b perf list: Allow listing events with 'tracepoint' prefix
If somebody happens to name an event with the beginning of 'tracepoint'
(e.g. tracepoint_foo), then it will never be showed with perf list
event_glob, thus we parse the argument 'tracepoint' more carefully for
accuracy.

Example:

Before this patch:

 $ perf list tracepoint_foo:*

   jbd2:jbd2_start_commit                             [Tracepoint event]
   jbd2:jbd2_commit_locking                           [Tracepoint event]
   jbd2:jbd2_run_stats                                [Tracepoint event]
   block:block_rq_issue                               [Tracepoint event]
   block:block_bio_complete                           [Tracepoint event]
   block:block_bio_backmerge                          [Tracepoint event]
   block:block_getrq                                  [Tracepoint event]
   ...                                                ...

As shown above, all of the tracepoint events are printed. In fact, the
command's real intention is to print the events of tracepoint_foo.

After this patch:

 $ perf list tracepoint_foo:*

   tracepoint_foo:tp_foo_enter                        [Tracepoint event]
   tracepoint_foo:tp_foo_exit                         [Tracepoint event]

As shown above, only the events of tracepoint_foo are printed.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1425032491-20224-3-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-27 15:51:51 -03:00
Yunlong Song ab0e48002d perf list: Sort the output of 'perf list' to view more clearly
Sort the output according to ASCII character list (using strcmp), which
supports both number sequence and alphabet sequence.

Example:

Before this patch:

 $ perf list

 List of pre-defined events (to be used in -e):
   cpu-cycles OR cycles                               [Hardware event]
   instructions                                       [Hardware event]
   cache-references                                   [Hardware event]
   cache-misses                                       [Hardware event]
   branch-instructions OR branches                    [Hardware event]
   branch-misses                                      [Hardware event]
   bus-cycles                                         [Hardware event]
   ...                                                ...

   jbd2:jbd2_start_commit                             [Tracepoint event]
   jbd2:jbd2_commit_locking                           [Tracepoint event]
   jbd2:jbd2_run_stats                                [Tracepoint event]
   block:block_rq_issue                               [Tracepoint event]
   block:block_bio_complete                           [Tracepoint event]
   block:block_bio_backmerge                          [Tracepoint event]
   block:block_getrq                                  [Tracepoint event]
   ...                                                ...

After this patch:

 $ perf list

 List of pre-defined events (to be used in -e):
   branch-instructions OR branches                    [Hardware event]
   branch-misses                                      [Hardware event]
   bus-cycles                                         [Hardware event]
   cache-misses                                       [Hardware event]
   cache-references                                   [Hardware event]
   cpu-cycles OR cycles                               [Hardware event]
   instructions                                       [Hardware event]
   ...                                                ...

   block:block_bio_backmerge                          [Tracepoint event]
   block:block_bio_complete                           [Tracepoint event]
   block:block_getrq                                  [Tracepoint event]
   block:block_rq_issue                               [Tracepoint event]
   jbd2:jbd2_commit_locking                           [Tracepoint event]
   jbd2:jbd2_run_stats                                [Tracepoint event]
   jbd2:jbd2_start_commit                             [Tracepoint event]
   ...                                                ...

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1425032491-20224-2-git-send-email-yunlong.song@huawei.com
[ Don't forget closedir({sys,evt}_dir) when handling errors ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-27 15:51:44 -03:00
Yunlong Song 1f924c29b5 perf data: Fix sentinel setting for data_cmds array
The recent new patch "perf tools: Add new 'perf data' command" (commit
2245bf14 in acme's git repo perf/core) has caused a building error when
compiling the source code of perf:

 cc1: warnings being treated as errors
 builtin-data.c:89: error: missing initializer
 builtin-data.c:89: error: (near initialization for ‘data_cmds[1].summary’)
 make[2]: *** [builtin-data.o] Error 1
 make[2]: *** Waiting for unfinished jobs....
   LD       bench/perf-in.o
   LD       tests/perf-in.o
 make[1]: *** [perf-in.o] Error 2
 make: *** [all] Error 2

This patch fixes the building error above.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1425038026-27604-1-git-send-email-yunlong.song@huawei.com
[ .name == NULL ends the loop, use it instead of seting all fields to NULL ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-27 10:43:18 -03:00
He Kuang f56847c2e9 perf probe: Fix a precedence bug
The minus operator has higher precedence than ?: Add parentheses around
?: fix this.

Before this patch:

  $ echo 'p:myprobe do_sys_open' > /sys/kernel/debug/tracing/kprobe_events
  $ perf probe -l -k ../vmlinux
    kprobes:myprobe      (on do_sys_open)

After this patch:

  $ echo 'p:myprobe do_sys_open' > /sys/kernel/debug/tracing/kprobe_events
  $ perf probe -l -k ../vmlinux
    kprobes:myprobe      (on do_sys_open@linux.git/fs/open.c)

Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1425034373-14511-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-27 10:31:09 -03:00
Kan Liang 94ba462d69 perf diff: Support for different binaries
Currently, the perf diff only works with same binaries. That's because
it compares the symbol start address. It doesn't work if the perf.data
comes from different binaries. This patch matches the symbol names.

Actually, perf diff once intended to compare the symbol names.  The
commit as below can look for a pair by name.

604c5c9297 (perf diff: Change the default sort order to "dso,symbol")
However, at that time, perf diff used a global list of dsos. That means
the binaries which has same name can only be loaded once. That's a
problem for comparing different binaries.

For example, we have an old binary and an updated binary. They very
likely have same name and most of the functions, so only dsos from old
binary will be loaded. When processing the data from updated binary,
perf still use the symbol information from old binary. That's wrong.

Then the commit as below used IP to replace symbol name.
9c443dfdd3 ("perf diff: Fix support for all --sort combinations")
>From that time, perf diff starts to compare the symbol address.

The global dsos is discarded from a patch in 2010.
a1645ce12a ("perf: 'perf kvm' tool for monitoring guest performance
from host")
However, at that time, perf diff already compared by address. So perf
diff cannot work for different binaries as well.

This patch actually rolls back the perf diff to original design. The
document is also changed, so everybody knows the original design is to
compare the symbol names.

Here are some examples:

The only difference between example_v1.c and example_v2.c is the
location of f2 and f3. There is no change in behavior, but the previous
perf diff display the wrong differential profile.

example_v1.c
noinline void f3(void)
{
        volatile int i;
        for (i = 0; i < 10000;) {

                if(i%2)
                        i++;
                else
                        i++;
        }
}

noinline void f2(void)
{
        volatile int a = 100, b, c;
        for (b = 0; b < 10000; b++)
                c = a * b;

}

noinline void f1(void)
{
                f2();
                f3();
}

int main()
{
        int i;
        for (i = 0; i < 100000; i++)
                f1();
}

example_v2.c
noinline void f2(void)
{
        volatile int a = 100, b, c;
        for (b = 0; b < 10000; b++)
                c = a * b;
}

noinline void f3(void)
{
        volatile int i;
        for (i = 0; i < 10000;) {
                if(i%2)
                        i++;
                else
                        i++;
        }
}

noinline void f1(void)
{
                f2();
                f3();
}

int main()
{
        int i;
        for (i = 0; i < 100000; i++)
                f1();
}

[lk@localhost perf_diff]$ gcc example_v1.c -o example
[lk@localhost perf_diff]$ perf record -o example_v1.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.813 MB example_v1.data (~35522 samples) ]

[lk@localhost perf_diff]$ gcc example_v2.c -o example
[lk@localhost perf_diff]$ perf record -o example_v2.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.824 MB example_v2.data (~36015 samples) ]

Old perf diff result:

[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
 Event 'cycles'
 Baseline    Delta  Shared Object     Symbol
 ........  .......  ................  ...............................

                     [kernel.vmlinux]  [k] __perf_event_task_sched_out
     0.00%           [kernel.vmlinux]  [k] apic_timer_interrupt
                     [kernel.vmlinux]  [k] idle_cpu
                     [kernel.vmlinux]  [k] intel_pstate_timer_func
                     [kernel.vmlinux]  [k] native_read_msr_safe
     0.00%           [kernel.vmlinux]  [k] native_read_tsc
     0.00%           [kernel.vmlinux]  [k] native_write_msr_safe
                     [kernel.vmlinux]  [k] ntp_tick_length
     0.00%           [kernel.vmlinux]  [k] rb_erase
     0.00%           [kernel.vmlinux]  [k] tick_sched_timer
     0.00%           [kernel.vmlinux]  [k] unmap_single_vma
     0.00%           [kernel.vmlinux]  [k] update_wall_time
     0.00%           example           [.] f1
    46.24%           example           [.] f2
    53.71%   -7.55%  example           [.] f3
            +53.81%  example           [.] f3
     0.02%           example           [.] main

New perf diff result:

[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
                     [kernel.vmlinux]  [k] __perf_event_task_sched_out
     0.00%           [kernel.vmlinux]  [k] apic_timer_interrupt
                     [kernel.vmlinux]  [k] idle_cpu
                     [kernel.vmlinux]  [k] intel_pstate_timer_func
                     [kernel.vmlinux]  [k] native_read_msr_safe
     0.00%           [kernel.vmlinux]  [k] native_read_tsc
     0.00%           [kernel.vmlinux]  [k] native_write_msr_safe
                     [kernel.vmlinux]  [k] ntp_tick_length
     0.00%           [kernel.vmlinux]  [k] rb_erase
     0.00%           [kernel.vmlinux]  [k] tick_sched_timer
     0.00%           [kernel.vmlinux]  [k] unmap_single_vma
     0.00%           [kernel.vmlinux]  [k] update_wall_time
     0.00%           example           [.] f1
    46.24%   -0.08%  example           [.] f2
    53.71%   +0.11%  example           [.] f3
     0.02%           example           [.] main

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1423460384-11645-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-27 10:08:38 -03:00
Masami Hiramatsu a50d11a10c perf buildid-cache: Add new buildid cache if update target is not cached
Add new buildid cache if the update target file is not cached.

This can happen when an old binary is replaced by new one after caching
the old one. In this case, user sees his operation just failed.

But it does not look straight, since user just pass the binary "path",
not "build-id".

  ----
  # ./perf buildid-cache --add ./perf
  (update ./perf to new binary)
  # ./perf buildid-cache --update ./perf
  ./perf wasn't in the cache
  #
  ----

This patch adds given new binary to cache if the new binary is
not cached. So we'll not see the above error.

  ----
  # ./perf buildid-cache --add ./perf
  (update ./perf to new binary)
  # ./perf buildid-cache --update ./perf
  #
  ----

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20150226065440.23912.1494.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-27 10:08:37 -03:00
Arnaldo Carvalho de Melo 38ae502b1d perf probe: Handle strdup() failure
We could end up returning 0 (Ok) with a NULL raw_path. Fix it.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naohiro Aota <naota@elisp.net>
Link: http://lkml.kernel.org/n/tip-l0kcbcg5f4nnzqt01cv42vec@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-27 10:08:29 -03:00
Masami Hiramatsu eb47cb2eb2 perf probe: Fix get_real_path to free allocated memory in error path
Fix get_real_path to free allocated memory when comp_dir is used for
complementing path and getting an error.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naohiro Aota <naota@elisp.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150226082504.28125.74506.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-26 11:59:05 -03:00
Masami Hiramatsu 9aaf5a5f47 perf probe: Check kprobes blacklist when adding new events
Recent linux kernel provides a blacklist of the functions which can not
be probed. perf probe can now check this blacklist before setting new
events and indicate better error message for users.

Without this patch,
  ----
  # perf probe --add vmalloc_fault
  Added new event:
  Failed to write event: Invalid argument
    Error: Failed to add events.
  ----
With this patch
  ----
  # perf probe --add vmalloc_fault
  Added new event:
  Warning: Skipped probing on blacklisted function: vmalloc_fault
  ----

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150219143113.14434.5387.stgit@localhost.localdomain
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-26 11:59:05 -03:00
David Ahern 55d43bcafe perf trace: Fix SIGBUS failures due to misaligned accesses
On Sparc64 perf-trace is failing in many spots due to extended load
instructions being used on misaligned accesses.

(gdb) run trace ls
Starting program: /tmp/perf/perf trace ls
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 169460.

<ls output removed>

Program received signal SIGBUS, Bus error.
0x000000000014f4dc in tp_field__u64 (field=0x4cc700, sample=0x7feffffa098) at builtin-trace.c:61
warning: Source file is more recent than executable.
61      TP_UINT_FIELD(64);

(gdb) bt
 0  0x000000000014f4dc in tp_field__u64 (field=0x4cc700, sample=0x7feffffa098) at builtin-trace.c:61
 1  0x0000000000156ad4 in trace__sys_exit (trace=0x7feffffc268, evsel=0x4cc580, event=0xfffffc0104912000,
    sample=0x7feffffa098) at builtin-trace.c:1701
 2  0x0000000000158c14 in trace__run (trace=0x7feffffc268, argc=1, argv=0x7fefffff360) at builtin-trace.c:2160
 3  0x000000000015b78c in cmd_trace (argc=1, argv=0x7fefffff360, prefix=0x0) at builtin-trace.c:2609
 4  0x0000000000107d94 in run_builtin (p=0x4549c8, argc=2, argv=0x7fefffff360) at perf.c:341
 5  0x0000000000108140 in handle_internal_command (argc=2, argv=0x7fefffff360) at perf.c:400
 6  0x0000000000108308 in run_argv (argcp=0x7feffffef2c, argv=0x7feffffef20) at perf.c:444
 7  0x0000000000108728 in main (argc=2, argv=0x7fefffff360) at perf.c:559

(gdb) p *sample
$1 = {ip = 4391276, pid = 169472, tid = 169472, time = 6303014583281250, addr = 0, id = 72082,
  stream_id = 18446744073709551615, period = 1, weight = 0, transaction = 0, cpu = 73, raw_size = 36,
  data_src = 84410401, flags = 0, insn_len = 0, raw_data = 0xfffffc010491203c, callchain = 0x0,
  branch_stack = 0x0, user_regs = {abi = 0, mask = 0, regs = 0x0, cache_regs = 0x7feffffa098, cache_mask = 0},
  intr_regs = {abi = 0, mask = 0, regs = 0x0, cache_regs = 0x7feffffa098, cache_mask = 0}, user_stack = {
    offset = 0, size = 0, data = 0x0}, read = {time_enabled = 0, time_running = 0, {group = {nr = 0,
        values = 0x0}, one = {value = 0, id = 0}}}}
(gdb) p *field
$2 = {offset = 16, {integer = 0x14f4a8 <tp_field__u64>, pointer = 0x14f4a8 <tp_field__u64>}}

sample->raw_data is guaranteed to not be 8-byte aligned because it is preceded
by the size as a u3. So accessing raw data with an extended load instruction causes
the SIGBUS. Resolve by using memcpy to a temporary variable of appropriate size.

Signed-off-by: David Ahern <david.ahern@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424376022-140608-1-git-send-email-david.ahern@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-26 11:59:04 -03:00
Ingo Molnar 0afb170401 perf/core improvements and fixes:
New user selectable features:
 
 - Support recording running/enabled time in 'perf record' (Andi Kleen)
 
 - New tool: 'perf data' for converting perf.data to other formats,
   initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior)
 
 User visible:
 
 - Only insert blank duration bracket when tracing syscalls in 'perf trace' (Arnaldo Carvalho de Melo)
 
 - Filter out the trace pid when no threads are specified in 'perf trace' (Arnaldo Carvalho de Melo)
 
 - Add 'perf trace' man page entry for --event (Arnaldo Carvalho de Melo)
 
 - Dump stack on segfaults in 'perf trace' (Arnaldo Carvalho de Melo)
 
 Infrastructure:
 
 - Introduce set_filter_pid and set_filter_pids methods in the evlist class (Arnaldo Carvalho de Melo)
 
 - Some perf_session untanglement patches, removing the need to pass a
   perf_session instance for things that are related to evlists, so that
   tools that don't deal with perf.data files like trace in live mode can
   make use of the ordered_events class (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU7j2lAAoJEBpxZoYYoA71cxcH/2dqKGTtqucEgKwpOdo8uf5v
 uWFtLzHNxJXYYm1kpwxOj3aMIoEfyfTngLgumgzMVdeQ8+rZGWgF7FMz2Gdn0HJi
 5CWrT/keZz8iHxA6BI8NBAa4uO8ct4MHjHSsYuO8Fr4zQmQy9vprfl8D44BQFQr9
 6C35dTjP8chIUFLxL2Y5Wr1SATeZf+zJJVoGUES1xgk7iMCfgivg6vSQcGmgBbe6
 iJBqDpO+10j55X578d5WRV0UMKrHk25RdGBRaB9eA0KowJfKYva744coF7AHOoQR
 snc5Ti93w+ssTL6PSp+FFr81kJVOqQk2GaweMcnNpOP5A8SSS+47yicPUZ25Nng=
 =wb9I
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New user selectable features:

  - Support recording running/enabled time in 'perf record' (Andi Kleen)

  - New tool: 'perf data' for converting perf.data to other formats,
    initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior)

User visible changes:

  - Only insert blank duration bracket when tracing syscalls in 'perf trace' (Arnaldo Carvalho de Melo)

  - Filter out the trace pid when no threads are specified in 'perf trace' (Arnaldo Carvalho de Melo)

  - Add 'perf trace' man page entry for --event (Arnaldo Carvalho de Melo)

  - Dump stack on segfaults in 'perf trace' (Arnaldo Carvalho de Melo)

Infrastructure changes:

  - Introduce set_filter_pid and set_filter_pids methods in the evlist class (Arnaldo Carvalho de Melo)

  - Some perf_session untanglement patches, removing the need to pass a
    perf_session instance for things that are related to evlists, so that
    tools that don't deal with perf.data files like trace in live mode can
    make use of the ordered_events class (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-26 12:25:20 +01:00
Ingo Molnar e9e4e44309 Linux 34.0-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU6pFJAAoJEHm+PkMAQRiG2OwH/24nDK+l9zkaRs0xJsVh+qiW
 8A2N1od0ickz43iMk48jfeWGkFOkd4izyvan/daJshJOE1Y5lCdSs7jq/OXVOv9L
 G0+KQUoC5NL0hqYKn1XJPFluNQ1yqMvrDwQt99grDGzruNGBbwHuBhAQmgzpj1nU
 do8KrGjr7ft1Rzm4mOAdET/ExWiF+mRSJSxxOv598HbsIRdM5wgn0hHjPlqDxmLN
 KH4r3YYEm0cHyjf4Krse0+YdhqdamRGJlmYxJgEsYNwCoMwkmHlLTc71diseUhrg
 r/VYIYQvpAA6Yvgw8rJ0N5gk/sJJig+WyyPhfQuc2bD5sbL9eO7mPnz2UP7z7ss=
 =vXB6
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc1' into perf/core, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-26 12:24:50 +01:00
Sebastian Andrzej Siewior 54cf776a9c perf data: Add a 'perf' prefix to the generic fields
Some of the tracers bring their own id or pid fields and we can end up
having two of them. This patch adds a "perf_" prefix to the 'generic'
fields so we avoid a clash of the member names.

The change is visible in the babeltrace output:

Before:
  $ babeltrace ./ctf-data/
  [03:19:13.962131936] (+0.000001935) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 8 }
  [03:19:13.962133732] (+0.000001796) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 114 }
  ...

Now:
  $ babeltrace ./ctf-data/
  [03:19:13.962131936] (+0.000001935) cycles: { }, { perf_ip = 0xFFFFFFFF8105443A, perf_tid = 20714, perf_pid = 20714, perf_period = 8 }
  [03:19:13.962133732] (+0.000001796) cycles: { }, { perf_ip = 0xFFFFFFFF8105443A, perf_tid = 20714, perf_pid = 20714, perf_period = 114 }
  ...

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1424470628-5969-5-git-send-email-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-25 16:14:33 -03:00
Jiri Olsa edbe9817ae perf data: Add perf data to CTF conversion support
Adding 'perf data convert' to convert perf data file into different
format. This patch adds support for CTF format conversion.

To convert perf.data into CTF run:
  $ perf data convert --to-ctf=./ctf-data/
  [ perf data convert: Converted 'perf.data' into CTF data './ctf-data/' ]
  [ perf data convert: Converted and wrote 11.268 MB (100230 samples) ]

The command will create CTF metadata out of perf.data file (or one
specified via -i option) and then convert all sample events into single
CTF stream.

Each sample_type bit is translated into separated CTF event field apart
from following exceptions:

  PERF_SAMPLE_RAW          - added in next patch
  PERF_SAMPLE_READ         - TODO
  PERF_SAMPLE_CALLCHAIN    - TODO
  PERF_SAMPLE_BRANCH_STACK - TODO
  PERF_SAMPLE_REGS_USER    - TODO
  PERF_SAMPLE_STACK_USER   - TODO

  $ perf --debug=data-convert=2 data convert ...

The converted CTF data could be analyzed by CTF tools, like babletrace
or tracecompass [1].

  $ babeltrace ./ctf-data/
  [03:19:13.962125533] (+?.?????????) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 1 }
  [03:19:13.962130001] (+0.000004468) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 1 }
  [03:19:13.962131936] (+0.000001935) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 8 }
  [03:19:13.962133732] (+0.000001796) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 114 }
  [03:19:13.962135557] (+0.000001825) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 2087 }
  [03:19:13.962137627] (+0.000002070) cycles: { }, { ip = 0xFFFFFFFF81361938, tid = 20714, pid = 20714, period = 37582 }
  [03:19:13.962161091] (+0.000023464) cycles: { }, { ip = 0xFFFFFFFF8124218F, tid = 20714, pid = 20714, period = 600246 }
  [03:19:13.962517569] (+0.000356478) cycles: { }, { ip = 0xFFFFFFFF811A75DB, tid = 20714, pid = 20714, period = 1325731 }
  [03:19:13.969518008] (+0.007000439) cycles: { }, { ip = 0x34080917B2, tid = 20714, pid = 20714, period = 1144298 }

The following members to the ctf-environment were decided to be added to
distinguish and specify perf CTF data:

  - domain

    It says "kernel" because it contains a kernel trace (not to be
    confused with a user space like lttng-ust does)

  - tracer_name

    It says perf. This can be used to distinguish between lttng and perf
    CTF based trace.

  - version

    The kernel version from stream. In addition to release, this is what
    it looks like on a Debian kernel:

      release = "3.14-1-amd64";
      version = "3.14.0";

[1] http://projects.eclipse.org/projects/tools.tracecompass

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1424470628-5969-4-git-send-email-jolsa@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-25 16:13:12 -03:00
Jiri Olsa 2245bf1410 perf tools: Add new 'perf data' command
Adding new 'perf data' command to provide operations over data files.

The 'perf data convert' sub command is coming in following patch, but
there's possibility for other useful commands like 'perf data ls' (to
display perf data file in directory in ls style).

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1424470628-5969-3-git-send-email-jolsa@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-25 12:42:25 -03:00
Jiri Olsa 53d0a57343 perf tools: Add feature check for libbabeltrace
Adding feature check for babeltrace library [1], which will be used for
perf data file CTF [2] conversion in following patches.

The babeltrace library is now automatically detected as standard
feature. It's possible to specify LIBBABELTRACE_DIR make variable to
specify location of installed libbabeltrace, like:

  $ make LIBBABELTRACE_DIR=/opt/libbabeltrace/
    BUILD:   Doing 'make -j4' parallel build

  Auto-detecting system features:
  ...                         dwarf: [ on  ]
  ...                         glibc: [ on  ]
  ...                          gtk2: [ on  ]
  ...                      libaudit: [ on  ]
  ...                        libbfd: [ on  ]
  ...                        libelf: [ on  ]
  ...                       libnuma: [ on  ]
  ...                       libperl: [ on  ]
  ...                     libpython: [ on  ]
  ...                      libslang: [ on  ]
  ...                     libunwind: [ on  ]
  ...                 libbabeltrace: [ on  ]
  ...            libdw-dwarf-unwind: [ on  ]
  ...                          zlib: [ on  ]
  ...     DWARF post unwind library: libunwind

NOTE The installation of the [1] to to used by above make:
     $ git clone git://git.efficios.com/babeltrace.git
     $ cd babeltrace
     $ vim README
     $ ./bootstrap
     $ ./configure --prefix=/opt/libbabeltrace
     $ make prefix=/opt/libbabeltrace
     $ sudo make install prefix=/opt/libbabeltrace

Please make sure that the /opt/libbabeltrace/lib directory is in your
LD_LIBRARY_PATH:

 $ export LD_LIBRARY_PATH=/opt/libbabeltrace/lib

[1] babeltrace - http://www.efficios.com/babeltrace
[2] Common Trace Format - http://www.efficios.com/ctf

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1424470628-5969-2-git-send-email-jolsa@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ Added missing babeltrace build instructions ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-25 12:42:24 -03:00
Andi Kleen 85c273d2b6 perf record: Support recording running/enabled time
Add an option to perf record to record running/enabled time for read
events, similar to what stat does.

This is useful to understand multiplexing problems.

Right now the report support is not great, but at least report -D
already supports it.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1424819620-16043-1-git-send-email-andi@firstfloor.org
[ Fixed the Documentation entry to match the OPT_BOOLEAN one ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-25 12:42:23 -03:00
Arnaldo Carvalho de Melo 506740654d perf tools: Print the thread's tid on PERF_RECORD_COMM events when -D is asked
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-fmto8ft6jrtwz09dxn5d4z8w@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-24 17:34:00 -03:00
Arnaldo Carvalho de Melo 4d08cb80ef perf trace: Dump stack on segfaults
[root@ssdandy ~]# perf trace --filter-pids 16348
     0.000 ( 0.000 ms): tuned/1027  ... [continued]: select()) = 0 Timeout
   793.770 ( 0.000 ms): lsmd/895  ... [continued]: select()) = 0 Timeout
   793.775 (793.724 ms): tuned/1027 select(tvp: 0x7f7655556e50) ...
  perf: Segmentation fault
  Obtained 15 stack frames.
  perf(dump_stack+0x2e) [0x4ed330]
  perf(sighandler_dump_stack+0x2e) [0x4ed40f]
  /lib64/libc.so.6(+0x35640) [0x7fa2d5b69640]
  perf() [0x4c2d35]
  perf(machine__findnew_thread+0x39) [0x4c2ed6]
  perf() [0x454a4d]
  perf() [0x455f87]
  perf() [0x456556]
  perf(cmd_trace+0xa7e) [0x4580af]
  perf() [0x4867bd]
  perf() [0x486a1c]
  perf() [0x486b68]
  perf(main+0x23b) [0x486ec9]
  /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fa2d5b55af5]
  perf() [0x41bd91]
[  root@ssdandy ~]#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-v38cbxcnm2yf5qn9u4y4n9ab@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-24 15:37:28 -03:00
Arnaldo Carvalho de Melo 07c1a0dadf perf tools: Introduce dump_stack signal helper
To use in stdio based tools, like 'trace'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-79kjmerlw6d88csyx1afzwvn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-24 15:34:23 -03:00
Arnaldo Carvalho de Melo 280836812f perf ordered_events: Stop using tool->ordered_events
To figure out if ordered_events are being used when doing a flush
operation, it is enough to check if there were in fact some events
queued, i.e. look at oe->nr_events.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-1c5r404vy766kt5nflv88uag@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-23 11:39:38 -03:00
Linus Torvalds c517d838eb Linux 4.0-rc1
.. after extensive statistical analysis of my G+ polling, I've come to
the inescapable conclusion that internet polls are bad.

Big surprise.

But "Hurr durr I'ma sheep" trounced "I like online polls" by a 62-to-38%
margin, in a poll that people weren't even supposed to participate in.
Who can argue with solid numbers like that? 5,796 votes from people who
can't even follow the most basic directions?

In contrast, "v4.0" beat out "v3.20" by a slimmer margin of 56-to-44%,
but with a total of 29,110 votes right now.

Now, arguably, that vote spread is only about 3,200 votes, which is less
than the almost six thousand votes that the "please ignore" poll got, so
it could be considered noise.

But hey, I asked, so I'll honor the votes.
2015-02-22 18:21:14 -08:00
Linus Torvalds feaf222925 Ext4 bug fixes for 3.20. We also reserved code points for encryption
and read-only images (for which the implementation is mostly just the
 reserved code point for a read-only feature :-)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJU6lssAAoJENNvdpvBGATwpsEQAOcpCqj0gp/istbsoFpl5v5K
 +BU2aPvR5CPLtUQz9MqrVF5/6zwDbHGN+GIB6CEmh/qHIVQAhhS4XR+opSc7qqUr
 fAQ1AhL5Oh8Dyn9DRy5Io8oRv+wo5lRdD7aG7SPiizCMRQ34JwJ2sWIAwbP2Ea7W
 Xg51v3LWEu+UpqpgY3YWBoJKHj4hXwFvTVOCHs94239Y2zlcg2c4WwbKPzkvPcV/
 TvvZOOctty+l3FOB2bqFj3VnvywQmNv8/OixKjSprxlR7nuQlhKaLTWCtRjFbND4
 J/rk2ls5Bl79dnMvyVfV5ghpmGYBf5kkXCP716YsQkRCZUfNVrTOPJrNHZtYilAb
 opRo2UjAyTWxZBvyssnCorHJZUdxlYeIuSTpaG0zUbR0Y6p/7qd31F5k41GbBCFf
 B0lV3IaiVnXk23S2jFVHGhrzoKdFqu30tY7LMaO4xyGVMigOZJyBu8TZ7Utj9HmW
 /4GfjlvYqlfB7p+6yBkDv/87hjdmfMWIw48A7xWCiIeguQhB79gwTV7uAHVtgfng
 h5RF2EH/fx5klbAZx9vlaAh3pGFBHbh9fkeBmW9qNm7glz7aMUuxQaSo6X8HrCAJ
 LrECgDGbuiOHnMYuzZRERZiqwLB7JT82C1xopGzefsE/i0kN1eMjITkfggjQ5whu
 caLPn49tAb9U8P6TsPeE
 =PF+t
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Ext4 bug fixes.

  We also reserved code points for encryption and read-only images (for
  which the implementation is mostly just the reserved code point for a
  read-only feature :-)"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix indirect punch hole corruption
  ext4: ignore journal checksum on remount; don't fail
  ext4: remove duplicate remount check for JOURNAL_CHECKSUM change
  ext4: fix mmap data corruption in nodelalloc mode when blocksize < pagesize
  ext4: support read-only images
  ext4: change to use setup_timer() instead of init_timer()
  ext4: reserve codepoints used by the ext4 encryption feature
  jbd2: complain about descriptor block checksum errors
2015-02-22 18:05:13 -08:00
Linus Torvalds be5e6616dd Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "Assorted stuff from this cycle.  The big ones here are multilayer
  overlayfs from Miklos and beginning of sorting ->d_inode accesses out
  from David"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits)
  autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation
  procfs: fix race between symlink removals and traversals
  debugfs: leave freeing a symlink body until inode eviction
  Documentation/filesystems/Locking: ->get_sb() is long gone
  trylock_super(): replacement for grab_super_passive()
  fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
  Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
  VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
  SELinux: Use d_is_positive() rather than testing dentry->d_inode
  Smack: Use d_is_positive() rather than testing dentry->d_inode
  TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR()
  Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode
  Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb
  VFS: Split DCACHE_FILE_TYPE into regular and special types
  VFS: Add a fallthrough flag for marking virtual dentries
  VFS: Add a whiteout dentry type
  VFS: Introduce inode-getting helpers for layered/unioned fs environments
  Infiniband: Fix potential NULL d_inode dereference
  posix_acl: fix reference leaks in posix_acl_create
  autofs4: Wrong format for printing dentry
  ...
2015-02-22 17:42:14 -08:00
Arnaldo Carvalho de Melo 9fa8727aa4 perf session: Remove perf_session from dump_event
All it wants is session->evlist.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-6w9663gka3jb1j1rfxxd5jcq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:23:46 -03:00
Arnaldo Carvalho de Melo 313e53b08e perf session: Remove perf_session from some deliver event routines
Further untangling perf_session from plain event delivery routines.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-cvz8e6pwyogs4w14582iis9w@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:23:40 -03:00
Arnaldo Carvalho de Melo ccda068f96 perf session: Remove perf_session from warn_errors signature
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-pxxm1liohog3d6i826x8sud8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:23:23 -03:00
Arnaldo Carvalho de Melo 75be989a7a perf evlist: Adopt events_stats from perf_session
For tools that don't deal with perf.data files, thus do not need to
use perf_session.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-kglq67gvauq9tak02a4se00r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:22:57 -03:00
Arnaldo Carvalho de Melo 54245fdc35 perf session: Remove wrappers to machines__find
Start to untangle session from delivering samples, as there are
tools that want to use ordered_events and don't use perf_session at all.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-rn4pk3pjxd78sgzrkn19tktp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:22:41 -03:00
Arnaldo Carvalho de Melo ddbb1b1310 perf trace: Separate routine that handles an event from the one that reads it
Because we need to use ordered_events in some cases, so we will need to
first have them in a queue, order that queue, and then process the
event.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-cmkw9zgoh0z4r218957ftp1a@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:22:26 -03:00
Arnaldo Carvalho de Melo 77c92582a5 perf trace: Add man page entry for --event
Forgot to do it when adding the feature.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-mx152b6x9cgknhw91vsyjlnd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:22:07 -03:00
Arnaldo Carvalho de Melo f078c3852c perf trace: Introduce --filter-pids
When tracing in X we get event loops due to the tracing activity, i.e.
updates to a gnome-terminal that generate syscalls for X.org, etc.

To get a more useful view of what is happening, syscall wise, system
wide, we need to filter those, like in:

 # ps ax|egrep '981|2296|1519' | grep -v egrep
   981 tty1 Ss+ 5:40 /usr/bin/Xorg :0 -background none ...
  1519 ?    Sl  2:22 /usr/bin/gnome-shell
  2296 ?    Sl  4:16 /usr/libexec/gnome-terminal-server
 #

 # trace -e write --filter-pids 981,2296,1519
    0.385 ( 0.021 ms): goa-daemon/2061 write(fd: 1</dev/null>, buf: 0x7fbeb017b000, count: 136) = 136
    0.922 ( 0.014 ms): goa-daemon/2061 write(fd: 1</dev/null>, buf: 0x7fbeb017b000, count: 140) = 140
 5006.525 ( 0.029 ms): goa-daemon/2061 write(fd: 1</dev/null>, buf: 0x7fbeb017b000, count: 136) = 136
 5007.235 ( 0.023 ms): goa-daemon/2061 write(fd: 1</dev/null>, buf: 0x7fbeb017b000, count: 140) = 140
 5177.646 ( 0.018 ms): rtkit-daemon/782 write(fd: 5<anon_inode:[eventfd]>, buf: 0x7f7eea70be88, count: 8) = 8
 8314.497 ( 0.004 ms): gsd-locate-poi/2084 write(fd: 5<anon_inode:[eventfd]>, buf: 0x7fffe96af7b0, count: 8) = 8
 8314.518 ( 0.002 ms): gsd-locate-poi/2084 write(fd: 5<anon_inode:[eventfd]>, buf: 0x7fffe96af0e0, count: 8) = 8
 ^C#

When this option is used the tracer pid is also filtered.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-f5qmiyy7c0uxdm21ncatpeek@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:21:52 -03:00
Arnaldo Carvalho de Melo be199ada4f perf evlist: Introduce set_filter_pids method
We need to filter multiple pids in trace, i.e. trace itself,
gnome-terminal, X.org, etc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-frtpkg7qapqwf7asa35wf8am@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:21:27 -03:00
Arnaldo Carvalho de Melo 241b057ce5 perf trace: Filter out the trace pid when no threads are specified
To avoid tracing the tracer.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-shmwd1khzpaobr3i0j1ygapg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:14:48 -03:00
Arnaldo Carvalho de Melo cfd70a26aa perf evlist: Introduce set_filter_pid method
To filter out events for a certain pid, for instance, when tracing
system wide, so that the tracer itself doesn't creates an event loop.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-byoia9dzu4gmkdv87etnd9zf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:14:25 -03:00
Arnaldo Carvalho de Melo 0808921a14 perf trace: Only insert blank duration bracket when tracing syscalls
When printing just events, i.e. '--no-sys --ev some:events' it makes no
sense to waste screen space.

Before:

 # trace --no-sys --ev probe:*
 84481.704 (         ): probe:vfs_getname:(ffffffff811ed023) pathname="/etc/services")
 84481.892 (         ): probe:vfs_getname:(ffffffff811ed023) pathname="/etc/services")
 84482.230 (         ): probe:vfs_getname:(ffffffff811ed023) pathname="/etc/resolv.conf")
 84482.481 (         ): probe:vfs_getname:(ffffffff811ed023) pathname="/etc/hosts")
 85097.725 (         ): probe:vfs_getname:(ffffffff811ed023) pathname="/root"
 #

After:

 # trace --no-sys --ev probe:*
 0.000 probe:vfs_getname:(ffffffff811ed023) pathname="/root")
 1.711 probe:vfs_getname:(ffffffff811ed023) pathname="/etc/localtime")
 2.103 probe:vfs_getname:(ffffffff811ed023) pathname="/etc/localtime")
^C#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-jhryxgnam8zecq0q0wsy6pyb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 22:13:54 -03:00
Linus Torvalds 90c453ca22 Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
 "Just one fix this time around.  __iommu_alloc_buffer() can cause a
  BUG() if dma_alloc_coherent() is called with either __GFP_DMA32 or
  __GFP_HIGHMEM set.  The patch from Alexandre addresses this"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8305/1: DMA: Fix kzalloc flags in __iommu_alloc_buffer()
2015-02-22 09:57:16 -08:00
Al Viro 0a280962dc autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation
X-Coverup: just ask spender
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:43:34 -05:00
Al Viro 7e0e953bb0 procfs: fix race between symlink removals and traversals
use_pde()/unuse_pde() in ->follow_link()/->put_link() resp.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:43:12 -05:00
Al Viro 0db59e5929 debugfs: leave freeing a symlink body until inode eviction
As it is, we have debugfs_remove() racing with symlink traversals.
Supply ->evict_inode() and do freeing there - inode will remain
pinned until we are done with the symlink body.

And rip the idiocy with checking if dentry is positive right after
we'd verified debugfs_positive(), which is a stronger check...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:43 -05:00
Al Viro dca111782c Documentation/filesystems/Locking: ->get_sb() is long gone
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:43 -05:00
Konstantin Khlebnikov eb6ef3df4f trylock_super(): replacement for grab_super_passive()
I've noticed significant locking contention in memory reclaimer around
sb_lock inside grab_super_passive(). Grab_super_passive() is called from
two places: in icache/dcache shrinkers (function super_cache_scan) and
from writeback (function __writeback_inodes_wb). Both are required for
progress in memory allocator.

Grab_super_passive() acquires sb_lock to increment sb->s_count and check
sb->s_instances. It seems sb->s_umount locked for read is enough here:
super-block deactivation always runs under sb->s_umount locked for write.
Protecting super-block itself isn't a problem: in super_cache_scan() sb
is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers
are still active. Inside writeback super-block comes from inode from bdi
writeback list under wb->list_lock.

This patch removes locking sb_lock and checks s_instances under s_umount:
generic_shutdown_super() unlinks it under sb->s_umount locked for write.
New variant is called trylock_super() and since it only locks semaphore,
callers must call up_read(&sb->s_umount) instead of drop_super(sb) when
they're done.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:42 -05:00
David Howells 54f2a2f427 fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup()
rather than d_is_dir() when checking a dir watch and give an error on fake
directories.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:42 -05:00
David Howells ce40fa78ef Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack
thereof) in cachefiles:

 (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as
     it doesn't want to deal with automounts in its cache.

 (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in
     cachefiles.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:41 -05:00
David Howells e36cb0b89c VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
Convert the following where appropriate:

 (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

 (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

 (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
     complicated than it appears as some calls should be converted to
     d_can_lookup() instead.  The difference is whether the directory in
     question is a real dir with a ->lookup op or whether it's a fake dir with
     a ->d_automount op.

In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).

Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer.  In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.

However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.

There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
intended for special directory entry types that don't have attached inodes.

The following perl+coccinelle script was used:

use strict;

my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
    print "No matches\n";
    exit(0);
}

my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);

foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
	die "spatch failed";
}

[AV: overlayfs parts skipped]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:41 -05:00
David Howells 2c616d4d88 SELinux: Use d_is_positive() rather than testing dentry->d_inode
Use d_is_positive() rather than testing dentry->d_inode in SELinux to get rid
of direct references to d_inode outside of the VFS.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:40 -05:00