Commit Graph

564040 Commits

Author SHA1 Message Date
Jiri Olsa 936d120d5f tools build feature: Use value assignment form for FEATURE-DUMP file
Changing the contents of the FEATURE-DUMP file, so it looks like:

  feature-backtrace=1
  feature-dwarf=0
  feature-fortify-source=1
  feature-sync-compare-and-swap=0

This way it could get included in sub projects, so they won't be forced
to redo features detection.

Also now storing the complete set of features.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama <pi3orama@163.com>
Link: http://lkml.kernel.org/r/1450893514-9158-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:14 -03:00
Jiri Olsa c6a5f88f33 tools build feature: Introduce feature_assign macro
The feature_assign macro generates feature value
assignment for name, like:

  $(call feature_assign,dwarf) == feature-dwarf=1

This will be used more in following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama <pi3orama@163.com>
Link: http://lkml.kernel.org/r/1450893514-9158-4-git-send-email-jolsa@kernel.org
[ Rename it to feature_assign, the original shorter name was misleading, to say the least ;-) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:13 -03:00
Jiri Olsa 76ee2ff342 tools build feature: Move dwarf post unwind choice output into perf
We decide what dwarf unwind to choose way after the Makefile.feature
makefile is included. The $(dwarf-post-unwind) is not even set at that
time. For the same reason it was never included in FEATURE-DUMP file.

Moving it into perf VF=1 verbose display.

  $ make VF=1
    BUILD:   Doing 'make -j4' parallel build
  Auto-detecting system features:
  ...                         dwarf: [ on  ]
  ...
  ...                 LIBUNWIND_DIR:
  ...                     LIBDW_DIR:
  ...     DWARF post unwind library: libunwind
  ...

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama <pi3orama@163.com>
Link: http://lkml.kernel.org/r/1450893514-9158-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:13 -03:00
Jiri Olsa d0018b495c tools build feature: Fix feature_check_display_code typo
This function is cursed.. ;-)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama <pi3orama@163.com>
Link: http://lkml.kernel.org/r/1450893514-9158-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:13 -03:00
Namhyung Kim d49dadea78 perf tools: Make 'trace' or 'trace_fields' sort key default for tracepoint events
When an evlist contains tracepoint events only, use 'trace' sort key as
default.  If --raw-trace option was given, use 'trace_fields' instead.
This will make users more convenient to see trace result.

Suggested-and-Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-14-git-send-email-namhyung@kernel.org
[ Check evlist in get_default_sort_order() fixing a segfault in 'perf test hists' reported by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:13 -03:00
Namhyung Kim 2e422fd1e4 perf tools: Add 'trace_fields' dynamic sort key
The 'trace_fields' sort key is similar as 'trace' sort key, but it shows
each fields separately.  Each event will get different columns as their
fields.

  $ perf report -s trace_fields --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 20K of event 'kmem:kmalloc'
  # Event count (approx.): 20533
  #
  # Overhead  Command           call_site                 ptr  bytes_req  bytes_alloc            gfp_flags
  # ........  .......  ..................  ..................  .........  ...........  ...................
  #
      99.89%  perf       ffffffffa01d4396  0xffff8803ffb79720         96           96    GFP_NOFS|GFP_ZERO
       0.06%  sleep      ffffffff8114e1cd  0xffff8803d228a000       4096         4096           GFP_KERNEL
       0.03%  perf       ffffffff811d6ae6  0xffff8803f7678f00        240          256  GFP_KERNEL|GFP_ZERO
       0.00%  perf       ffffffff812263c1  0xffff880406172380        128          128           GFP_KERNEL
       0.00%  perf       ffffffff812264b9  0xffff8803ffac1600        504          512           GFP_KERNEL
       0.00%  perf       ffffffff81226634  0xffff880401dc5280         28           32           GFP_KERNEL
       0.00%  sleep      ffffffff81226da9  0xffff8803ffac3a00        392          512           GFP_KERNEL

  # Samples: 20K of event 'kmem:kfree'
  # Event count (approx.): 20597
  #
  # Overhead           call_site                 ptr
  # ........  ..................  ..................
  #
      99.58%    ffffffffa01d85ad  0xffff8803ffb79720
       0.07%    ffffffff81443f5c  0xffff8803f7669400
       0.02%    ffffffff811d5753  0xffff8803f7678f00
       0.01%    ffffffff81443f5c  0xffff8803f766be00
       0.01%    ffffffff8114e359  0xffff8803d228a000
       0.01%    ffffffff81443f5c  0xffff8800d156dc00
       0.01%    ffffffff81443f5c  0xffff8803f7669400
       0.01%    ffffffff8114e359  0xffff8803d228a000
       0.01%    ffffffff8114e359  0xffff8803d228a000
       0.01%    ffffffff8114e359  0xffff8803d228a000

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-13-git-send-email-namhyung@kernel.org
[ Combined with "perf tools: Fix segfault when using -s trace_fields" ]
Link: http://lkml.kernel.org/r/1451991518-25673-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:13 -03:00
Namhyung Kim 361459f163 perf tools: Skip dynamic fields not defined for current event
When there are multiple events, each dynamic sort key is defined just
for one event.  In this case other events will always show "N/A" for
those fields.  But they are meaningless and consume precious screen
width.

Let's skip those undefined dynamic fields.

  $ perf record -e kmem:kmalloc,kmem:kfree -a sleep 1

  $ perf report -s 'comm,kmalloc.*' --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 20K of event 'kmem:kmalloc'
  # Event count (approx.): 20533
  #
  # Overhead  Command           call_site                 ptr  bytes_req  bytes_alloc            gfp_flags
  # ........  .......  ..................  ..................  .........  ...........  ...................
  #
      99.89%  perf       ffffffffa01d4396  0xffff8803ffb79720         96           96    GFP_NOFS|GFP_ZERO
       0.06%  sleep      ffffffff8114e1cd  0xffff8803d228a000       4096         4096           GFP_KERNEL
       0.03%  perf       ffffffff811d6ae6  0xffff8803f7678f00        240          256  GFP_KERNEL|GFP_ZERO
       0.00%  perf       ffffffff812263c1  0xffff880406172380        128          128           GFP_KERNEL
       0.00%  perf       ffffffff812264b9  0xffff8803ffac1600        504          512           GFP_KERNEL
       0.00%  perf       ffffffff81226634  0xffff880401dc5280         28           32           GFP_KERNEL
       0.00%  sleep      ffffffff81226da9  0xffff8803ffac3a00        392          512           GFP_KERNEL

  # Samples: 20K of event 'kmem:kfree'
  # Event count (approx.): 20597
  #
  # Overhead  Command
  # ........  ..............
  #
      99.63%  perf
       0.14%  sleep
       0.11%  irq/36-iwlwifi
       0.11%  kworker/u16:0
       0.01%  Xorg
       0.00%  firefox

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-12-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:12 -03:00
Namhyung Kim 3b099bf589 perf tools: Support '<event>.*' dynamic sort key
Support '*' character for field name to add all (non-common) fields as
sort keys easily.

  $ perf report -s 'switch.*' --stdio
  ...
  # Overhead    prev_comm  prev_pid   prev_prio  prev_state     next_comm  next_pid  next_prio
  # ........  ...........  .........  .........  ..........  ............  ........  .........
  #
       3.82%    swapper/0         0         120           0   netctl-auto     18711        120
       3.75%  netctl-auto     18711         120           1     swapper/0         0        120
       2.24%    swapper/1         0         120           0   netctl-auto     18709        120
       2.24%  netctl-auto     18709         120           1     swapper/1         0        120
       1.80%    swapper/2         0         120           0   rcu_preempt         7        120
       1.80%    swapper/2         0         120           0   netctl-auto     18711        120
       1.80%  rcu_preempt         7         120           1     swapper/2         0        120
       1.80%  netctl-auto     18711         120           1     swapper/2         0        120
  ...

Suggested-and-acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-11-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:12 -03:00
Namhyung Kim 5d0cff93bb perf tools: Support shortcuts for events in dynamic sort keys
The dynamic sort key requires event name but specifying full event name
is rather inconvenient.  This patch adds more ways to identify the event
in a more compact way.

  1. If session has just one event, event name can be omitted.
  2. Events can be accessed by index preceded by a percent sign.
  3. A part of the name can be used, if it's not ambiguous.  The partial
     name should not contain ':' in it.
  4. Full system + event name is still used, it should contain ':'.

So in the below example all does same thing:

  $ perf record -e sched:sched_switch -a sleep 1

  $ perf report -s next_pid,next_comm
  $ perf report -s %1.next_pid,%1.next_comm
  $ perf report -s switch.next_pid,switch.next_comm
  $ perf report -s sched:sched_switch.next_pid,sched:sched_switch.next_comm

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-10-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:12 -03:00
Namhyung Kim 053a3989e1 perf report/top: Add --raw-trace option
The --raw-trace option allows disabling pretty printing by the event's
print_fmt or plugin.  Besides that, each dynamic sort key now can
receive a 'raw' suffix separated by '/' to ask for the raw trace of a
specific field.

  $ perf report -s comm,kmem:kmalloc.gfp_flags
  ...
  # Overhead  Command            gfp_flags
  # ........  .......  ...................
  #
      99.89%  perf       GFP_NOFS|GFP_ZERO
       0.06%  sleep             GFP_KERNEL
       0.03%  perf     GFP_KERNEL|GFP_ZERO
       0.01%  perf              GFP_KERNEL

Now

  $ perf report -s comm,kmem:kmalloc.gfp_flags --raw-trace
or
  $ perf report -s comm,kmem:kmalloc.gfp_flags/raw
  ...
  # Overhead  Command   gfp_flags
  # ........  .......  ..........
  #
      99.89%  perf          32848
       0.06%  sleep           208
       0.03%  perf          32976
       0.01%  perf            208

Suggested-and-Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-9-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:12 -03:00
Namhyung Kim a34bb6a08d perf tools: Add 'trace' sort key
The 'trace' sort key is to show tracepoint event output using either
print fmt or plugin.  For example sched_switch event (using plugin) will
show output like below:

  # perf record -e sched:sched_switch -a usleep 10
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.197 MB perf.data (69 samples) ]
  #

  $ perf report -s trace --stdio
  ...
  # Overhead  Trace output
  # ........  ...................................................
  #
       9.48%  swapper/0:0 [120] R ==> transmission-gt:17773 [120]
       9.48%  transmission-gt:17773 [120] S ==> swapper/0:0 [120]
       9.04%  swapper/2:0 [120] R ==> transmission-gt:17773 [120]
       8.92%  transmission-gt:17773 [120] S ==> swapper/2:0 [120]
       5.25%  swapper/0:0 [120] R ==> kworker/0:1H:109 [100]
       5.21%  kworker/0:1H:109 [100] S ==> swapper/0:0 [120]
       1.78%  swapper/3:0 [120] R ==> transmission-gt:17773 [120]
       1.78%  transmission-gt:17773 [120] S ==> swapper/3:0 [120]
       1.53%  Xephyr:6524 [120] S ==> swapper/0:0 [120]
       1.53%  swapper/0:0 [120] R ==> Xephyr:6524 [120]
       1.17%  swapper/2:0 [120] R ==> irq/33-iwlwifi:233 [49]
       1.13%  irq/33-iwlwifi:233 [49] S ==> swapper/2:0 [120]

Note that the 'trace' sort key works only for tracepoint events.  If
it's used to other type of events, just "N/A" will be printed.

Suggested-and-acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:12 -03:00
Namhyung Kim 60517d28fb perf tools: Try to show pretty printed output for dynamic sort keys
Each tracepoint event has format string for print to improve
readability.  Try to parse the output and match the field name.  If it
finds one, use that for the result.  If not, fallbacks to the original
output.

For example, sort on kmem:kmalloc.gfp_flags looks like below:
(Note: libtraceevent plugins are not installed on my system.  They might
affect the output below)

Before:
  # Overhead  Command   gfp_flags
  # ........  .......  ..........
  #
      99.89%  perf          32848
       0.06%  sleep           208
       0.03%  perf          32976
       0.01%  perf            208

After:
  # Overhead  Command            gfp_flags
  # ........  .......  ...................
  #
      99.89%  perf       GFP_NOFS|GFP_ZERO
       0.06%  sleep             GFP_KERNEL
       0.03%  perf     GFP_KERNEL|GFP_ZERO
       0.01%  perf              GFP_KERNEL

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-7-git-send-email-namhyung@kernel.org
[ Fixed clash with earlier, updated patch in this patchkit ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:11 -03:00
Namhyung Kim c7c2a5e40f perf tools: Add dynamic sort key for tracepoint events
The existing sort keys are less useful for tracepoint events in that
they are always sampled at the same place, the function where the
tracepoint is located.

For example, a 'perf report' on sched:sched_switch event looks like the
following:

  # Overhead  Command          Shared Object     Symbol
  # ........  ...............  ................  ..............
  #
      47.22%  swapper          [kernel.vmlinux]  [k] __schedule
      21.67%  transmission-gt  [kernel.vmlinux]  [k] __schedule
       8.23%  netctl-auto      [kernel.vmlinux]  [k] __schedule
       5.53%  kworker/0:1H     [kernel.vmlinux]  [k] __schedule
       1.98%  Xephyr           [kernel.vmlinux]  [k] __schedule
       1.33%  irq/33-iwlwifi   [kernel.vmlinux]  [k] __schedule
       1.17%  wpa_cli          [kernel.vmlinux]  [k] __schedule
       1.13%  rcu_preempt      [kernel.vmlinux]  [k] __schedule
       0.85%  ksoftirqd/0      [kernel.vmlinux]  [k] __schedule
       0.77%  Timer            [kernel.vmlinux]  [k] __schedule

In fact, tracepoints have meaningful information in their fields but
there's no way to use in 'perf report' currently.  The dynamic sort keys
are introduced in this patc to overcome this limitation.

The sched:sched_switch events have following fields:

  # sudo cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
  name: sched_switch
  ID: 268
  format:
	field:unsigned short common_type;         offset:0; size:2; signed:0;
	field:unsigned char common_flags;         offset:2; size:1; signed:0;
	field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
	field:int common_pid;                     offset:4; size:4; signed:1;

	field:char prev_comm[16]; offset:8;  size:16; signed:1;
	field:pid_t prev_pid;     offset:24; size:4;  signed:1;
	field:int prev_prio;      offset:28; size:4;  signed:1;
	field:long prev_state;    offset:32; size:8;  signed:1;
	field:char next_comm[16]; offset:40; size:16; signed:1;
	field:pid_t next_pid;     offset:56; size:4;  signed:1;
	field:int next_prio;      offset:60; size:4;  signed:1;

  print fmt: "prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==>
              next_comm=%s next_pid=%d next_prio=%d",
    REC->prev_comm, REC->prev_pid, REC->prev_prio,
    REC->prev_state & (2048-1) ? __print_flags(REC->prev_state & (2048-1),
    "|", { 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" }, { 16, "Z" }, { 32, "X" },
    { 64, "x" }, { 128, "K"}, { 256, "W" }, { 512, "P" }, { 1024, "N" }) : "R",
    REC->prev_state & 2048 ? "+" : "", REC->next_comm, REC->next_pid, REC->next_prio

With dynamic sort keys, you can use <event.field> as a sort key.  Those
dynamic keys are checked and created on demand.  For instance, below is
to sort by next_pid field output on the same data file:

  $ perf report -s comm,sched:sched_switch.next_pid --stdio
  ...
  # Overhead  Command            next_pid
  # ........  ...............  ..........
  #
      21.23%  transmission-gt           0
      20.86%  swapper               17773
       6.62%  netctl-auto               0
       5.25%  swapper                 109
       5.21%  kworker/0:1H              0
       1.98%  Xephyr                    0
       1.98%  swapper                6524
       1.98%  swapper               27478
       1.37%  swapper               27476
       1.17%  swapper                 233

Multiple dynamic sort keys are also supported:

  $ perf report -s comm,sched:sched_switch.next_pid,sched:sched_switch.next_comm --stdio
  ...
  # Overhead  Command            next_pid         next_comm
  # ........  ...............  ..........  ................
  #
      20.86%  swapper               17773   transmission-gt
       9.64%  transmission-gt           0         swapper/0
       9.16%  transmission-gt           0         swapper/2
       5.25%  swapper                 109      kworker/0:1H
       5.21%  kworker/0:1H              0         swapper/0
       2.14%  netctl-auto               0         swapper/2
       1.98%  netctl-auto               0         swapper/0
       1.98%  swapper                6524            Xephyr
       1.98%  swapper               27478       netctl-auto
       1.78%  transmission-gt           0         swapper/3
       1.53%  Xephyr                    0         swapper/0
       1.29%  netctl-auto               0         swapper/1
       1.29%  swapper               27476       netctl-auto
       1.21%  netctl-auto               0         swapper/3
       1.17%  swapper                 233    irq/33-iwlwifi

Note that pid 0 exists for each cpu so have comm of 'swapper/N'.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:11 -03:00
Namhyung Kim 40184c46a3 perf tools: Pass evlist to setup_sorting()
This is a preparation to support dynamic sort keys for tracepoint
events.  Dynamic sort keys can be created for specific fields in trace
events so it needs the event information.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-5-git-send-email-namhyung@kernel.org
[ Moving the evlist creation earlier in top was split to a previous patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:11 -03:00
Namhyung Kim 54f8f40384 perf top: Create the evlist sooner
This is a preparation to support dynamic sort keys for tracepoint
events.  Dynamic sort keys can be created for specific fields in trace
events so it needs the event information, so we need to pass the evlist
to the sort routines, create it sooner so that the next patch can do
that.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-5-git-send-email-namhyung@kernel.org
[ Split from the patch passing the evlist to the sort routines ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:11 -03:00
Namhyung Kim be45d40efe tools lib traceevent: Factor out and export print_event_field[s]()
The print_event_field() and print_event_fields() functions print basic
information of a given field or event without the print format.  They'll
be used by dynamic sort keys later.

Committer note:

Rename it to pevent_print_field[s]() to get proper namespacing, as
discussed with Steven Rostedt.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450876121-22494-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:11 -03:00
Namhyung Kim 723928340c perf hist: Save raw_data/size for tracepoint events
The raw_data and raw_size fields are to provide tracepoint specific
information.  They will be used by dynamic sort keys later.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450923377-18641-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:10 -03:00
Namhyung Kim fd36f3dd79 perf hist: Pass struct sample to __hists__add_entry()
This is a preparation to add more info into the hist_entry.  Also it
already passes too many argument, so passing sample directly will reduce
the overhead of the function call.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1450804030-29193-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-06 20:11:10 -03:00
Yuchung Cheng 8b8a321ff7 tcp: fix zero cwnd in tcp_cwnd_reduction
Patch 3759824da8 ("tcp: PRR uses CRB mode by default and SS mode
conditionally") introduced a bug that cwnd may become 0 when both
inflight and sndcnt are 0 (cwnd = inflight + sndcnt). This may lead
to a div-by-zero if the connection starts another cwnd reduction
phase by setting tp->prior_cwnd to the current cwnd (0) in
tcp_init_cwnd_reduction().

To prevent this we skip PRR operation when nothing is acked or
sacked. Then cwnd must be positive in all cases as long as ssthresh
is positive:

1) The proportional reduction mode
   inflight > ssthresh > 0

2) The reduction bound mode
  a) inflight == ssthresh > 0

  b) inflight < ssthresh
     sndcnt > 0 since newly_acked_sacked > 0 and inflight < ssthresh

Therefore in all cases inflight and sndcnt can not both be 0.
We check invalid tp->prior_cwnd to avoid potential div0 bugs.

In reality this bug is triggered only with a sequence of less common
events.  For example, the connection is terminating an ECN-triggered
cwnd reduction with an inflight 0, then it receives reordered/old
ACKs or DSACKs from prior transmission (which acks nothing). Or the
connection is in fast recovery stage that marks everything lost,
but fails to retransmit due to local issues, then receives data
packets from other end which acks nothing.

Fixes: 3759824da8 ("tcp: PRR uses CRB mode by default and SS mode conditionally")
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 16:39:56 -05:00
Shrikrishna Khare 58caf63736 Driver: Vmxnet3: Fix regression caused by 5738a09
Reported-by: Bingkuo Liu <bingkuol@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 16:20:13 -05:00
Kristian Evensen e439bd4a4f net: qmi_wwan: Add WeTelecom-WPD600N
The WeTelecom-WPD600N is an LTE module that, in addition to supporting most
"normal" bands, also supports LTE over 450MHz. Manual testing showed that
only interface number three replies to QMI messages.

Cc: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 15:50:00 -05:00
Alan fde55c45d2 mkiss: fix scribble on freed memory
commit d79f16c046 fixed a user triggerable
scribble on free memory but added a new one which allows the user to
scribble even more and user controlled data into freed space.

As with 6pack we need to halt the queue before we free the buffers, because
the transmit logic is not protected by the semaphore.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 15:06:27 -05:00
Francesco Ruggeri 07a5d38453 net: possible use after free in dst_release
dst_release should not access dst->flags after decrementing
__refcnt to 0. The dst_entry may be in dst_busy_list and
dst_gc_task may dst_destroy it before dst_release gets a chance
to access dst->flags.

Fixes: d69bbf88c8 ("net: fix a race in dst_release()")
Fixes: 27b75c95f1 ("net: avoid RCU for NOCACHE dst")
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 15:00:27 -05:00
Takashi Iwai 3f37b26f8d ASoC: Last minute fixes for v4.4
A few final fixes for v4.4, the main one being the two patches to the
 new Sky Lake drivers which fix a previous incorrect fix that went in
 during an earlier -rc.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWjUsZAAoJECTWi3JdVIfQ5eYH/3od9mB5GiX8hVwcbdeozdoa
 jov83C8roMBB5/ebRhIHXf1VI64axp2/Zv2hPjlHdoEjcVPjmdFRn0mno7w9NZqC
 271VdCpjXyB/U9PrFi0GK0ByeO+Ru33bqfzL25HpFgD0TQDYFB8N/533Qp4bZV24
 D/a/D4e3tUUhtKwIKDf1KfVp2hOKBEiD0Tyai2YIXBCszC8xltCowTE2yZ38aYA0
 f6Q+xPkCkgvCw7cE+n+PSQy7EoVH62Wol3ysrxk6anlGoSIH8ut3ZfMlncfgUCFm
 izJuiWKogm0SXHJh78MmgBFY0Xg4Fot3mJN6OaVzo8/TrYD4ERVhG/IBXrS/K30=
 =SaxY
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v4.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Last minute fixes for v4.4

A few final fixes for v4.4, the main one being the two patches to the
new Sky Lake drivers which fix a previous incorrect fix that went in
during an earlier -rc.
2016-01-06 20:53:28 +01:00
Dmitry Monakhov a1c6f05733 fs: use block_device name vsprintf helper
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 13:03:18 -05:00
Dmitry Monakhov 1031bc5892 lib/vsprintf: add %*pg format specifier
This allow to directly print block_device name.
Currently one should use bdevname() with temporal char buffer.
This is very ineffective because bloat stack usage for deep IO call-traces

Example:
	%pg  ->    sda, sda1 or loop0p1

[AV: fixed a minor braino - position updates should not be dependent
upon having reached the of buffer]

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 12:55:29 -05:00
Dmitry Monakhov 424081f3c8 fs: use gendisk->disk_name where possible
gendisk with part==0 is obviously gendisk->disk_name.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 12:42:11 -05:00
Tony Lindgren e7b11dc7b7 ARM: OMAP2+: Fix onenand rate detection to avoid filesystem corruption
Commit 63aa945b10 ("memory: omap-gpmc: Add Kconfig option for debug")
unified the GPMC debug for the SoCs with GPMC. The commit also left out
the option for HWMOD_INIT_NO_RESET as we now require proper timings for
GPMC to be able to remap GPMC devices out of address 0.

Unfortunately on Nokia N900, onenand now only partially works with the
device tree provided timings. It works enough to get detected but the
clock rate supported by the onenand chip gets misdetected. This in turn
causes the GPMC timings to be miscalculated and this leads into file
system corruption on N900.

Looks like onenand needs CS_CONFIG1 bit 27 WRITETYPE set for for sync
write. This is needed also for async timings when we write to onenand
with omap2_onenand_set_async_mode(). Without sync write bit set, the
async read for the onenand ONENAND_REG_VERSION_ID will return 0xfff.

Let's exit with an error if onenand rate is not detected. And let's
remove the extra call to omap2_onenand_set_async_mode() as we only need
to do this once at the end of omap2_onenand_setup_async().

Fixes: 63aa945b10 ("memory: omap-gpmc: Add Kconfig option for debug")
Cc: stable@vger.kernel.org # v4.2+
Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2016-01-06 09:21:09 -08:00
Mark Rutland 2a803c4db6 arm64: head.S: use memset to clear BSS
Currently we use an open-coded memzero to clear the BSS. As it is a
trivial implementation, it is sub-optimal.

Our optimised memset doesn't use the stack, is position-independent, and
for the memzero case can use of DC ZVA to clear large blocks
efficiently. In __mmap_switched the MMU is on and there are no live
caller-saved registers, so we can safely call an uninstrumented memset.

This patch changes __mmap_switched to use memset when clearing the BSS.
We use the __pi_memset alias so as to avoid any instrumentation in all
kernel configurations.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-01-06 16:00:56 +00:00
Ard Biesheuvel b523e185bb efi: stub: define DISABLE_BRANCH_PROFILING for all architectures
This moves the DISABLE_BRANCH_PROFILING define from the x86 specific
to the general CFLAGS definition for the stub. This fixes build errors
when building for arm64 with CONFIG_PROFILE_ALL_BRANCHES_ENABLED.

Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-01-06 15:42:12 +00:00
Mark Rutland ee03353bc0 arm64: entry: remove pointless SPSR mode check
In work_pending, we may skip work if the stacked SPSR value represents
anything other than an EL0 context. We then immediately invoke the
kernel_exit 0 macro as part of ret_to_user, assuming a return to EL0.
This is somewhat confusing.

We use work_pending as part of the ret_to_user/ret_fast_syscall state
machine. We only use ret_fast_syscall in the return from an SVC issued
from EL0. We use ret_to_user for return from EL0 exception handlers and
also for return from ret_from_fork in the case the task was not a kernel
thread (i.e. it is a user task).

Thus in all cases the stacked SPSR value must represent an EL0 context,
and the check is redundant. This patch removes it, along with the now
unused no_work_pending label.

Cc: Chris Metcalf <cmetcalf@ezchip.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-01-06 15:40:38 +00:00
Mateusz Guzik ccec5ee302 poll: plug an unused argument to do_poll
Number of fds is already known based on passed list.

No functional changes.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 08:26:52 -05:00
Al Viro 8f1d57c172 amdkfd: don't open-code memdup_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 08:25:25 -05:00
Al Viro abb0f6a79f cdrom: don't open-code memdup_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 08:25:24 -05:00
Al Viro 820351f05b rsxx: don't open-code memdup_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 08:25:24 -05:00
Al Viro 8ed6010d50 mtip32xx: don't open-code memdup_user()
[folded a fix by Dan Carpenter]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 08:24:39 -05:00
Ingo Molnar 3104fb3dd4 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

 - Adding transitivity uniformly to rcu_node structure ->lock
   acquisitions.  (This is implemented by the first two commits
   on top of v4.4-rc2 due to the pervasive nature of this change.)

 - Documentation updates, including RCU requirements.

 - Expedited grace-period changes.

 - Miscellaneous fixes.

 - Linked-list fixes, courtesy of KTSAN.

 - Torture-test updates.

 - Late-breaking fix to sysrq-generated crash.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:41:48 +01:00
Sekhar Nori d3b421cd07 irqchip/omap-intc: Add support for spurious irq handling
Under some conditions, irq sorting procedure used by INTC can go wrong
resulting in a spurious irq getting reported.

If this condition is not handled, it results in endless stream of:

    unexpected IRQ trap at vector 00

messages from ack_bad_irq()

Handle the spurious interrupt condition in omap-intc driver to prevent this.

Measurements using kernel function profiler on AM335x EVM running at 720MHz
show that after this patch omap_intc_handle_irq() takes about 37.4us against
34us before this patch.

Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/9c78a6db02ac55f7af7371b417b6e414d2c3095b.1450188128.git.nsekhar@ti.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-06 11:35:13 +01:00
Vince Weaver 9cc2617de5 perf/x86/amd: Remove l1-dcache-stores event for AMD
This is a long standing bug with the l1-dcache-stores generic event on
AMD machines.  My perf_event testsuite has been complaining about this
for years and I'm finally getting around to trying to get it fixed.

The data_cache_refills:system event does not make sense for l1-dcache-stores.
Maybe this was a typo and it was meant to be for l1-dcache-store-misses?

In any case, the values returned are nowhere near correct for l1-dcache-stores
and in fact the umask values for the event have completely changed with
fam15h so it makes even less sense than ever.  So just remove it.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1512091134350.24311@vincent-weaver-1.umelst.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:39 +01:00
Harish Chegondi 77af0037de perf/x86/intel/uncore: Add Knights Landing uncore PMU support
Knights Landing uncore performance monitoring (perfmon) is derived from
Haswell-EP uncore perfmon with several differences. One notable difference
is in PCI device IDs. Knights Landing uses common PCI device ID for
multiple instances of an uncore PMU device type. In Haswell-EP, each
instance of a PMU device type has a unique device ID.

Knights Landing uncore components that have performance monitoring units
are UBOX, CHA, EDC, MC, M2PCIe, IRP and PCU. Perfmon registers in EDC, MC,
IRP, and M2PCIe reside in the PCIe configuration space. Perfmon registers
in UBOX, CHA and PCU are accessed via the MSR interface.

For more details, please refer to the public document:

  https://software.intel.com/sites/default/files/managed/15/8d/IntelXeonPhi%E2%84%A2x200ProcessorPerformanceMonitoringReferenceManual_Volume1_Registers_v0%206.pdf

Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Harish Chegondi <harish.chegondi@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/8ac513981264c3eb10343a3f523f19cc5a2d12fe.1449470704.git.harish.chegondi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:38 +01:00
Harish Chegondi dae25530a4 perf/x86/intel/uncore: Remove hard coding of PMON box control MSR offset
Call uncore_pci_box_ctl() function to get the PMON box control MSR offset
instead of hard coding the offset. This would allow us to use this
snbep_uncore_pci_init_box() function for other PCI PMON devices whose box
control MSR offset is different from SNBEP_PCI_PMON_BOX_CTL.

Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Harish Chegondi <harish.chegondi@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/872e8ef16cfc38e5ff3b45fac1094e6f1722e4ad.1449470704.git.harish.chegondi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:37 +01:00
Harish Chegondi 1e7b939062 perf/x86/intel: Add perf core PMU support for Intel Knights Landing
Knights Landing core is based on Silvermont core with several differences.
Like Silvermont, Knights Landing has 8 pairs of LBR MSRs. However, the
LBR MSRs addresses match those of the Xeon cores' first 8 pairs of LBR MSRs
Unlike Silvermont, Knights Landing supports hyperthreading. Knights Landing
offcore response events config register mask is different from that of the
Silvermont.

This patch was developed based on a patch from Andi Kleen.

For more details, please refer to the public document:

  https://software.intel.com/sites/default/files/managed/15/8d/IntelXeonPhi%E2%84%A2x200ProcessorPerformanceMonitoringReferenceManual_Volume1_Registers_v0%206.pdf

Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Harish Chegondi <harish.chegondi@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/d14593c7311f78c93c9cf6b006be843777c5ad5c.1449517401.git.harish.chegondi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:37 +01:00
Kan Liang d6980ef325 perf/x86/intel/uncore: Add Broadwell-EP uncore support
The uncore subsystem for Broadwell-EP is similar to Haswell-EP.
There are some differences in pci device IDs, box number and
constraints. This patch extends the Broadwell-DE codes to support
Broadwell-EP.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1449176411-9499-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:36 +01:00
Huang Rui d3bcd64bbc perf/x86/rapl: Use unified perf_event_sysfs_show instead of special interface
Actually, rapl_sysfs_show is a duplicate of perf_event_sysfs_show. We
prefer to use the unified interface.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dasaratharaman Chandramouli<dasaratharaman.chandramouli@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1449223661-2437-1-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:35 +01:00
Stephane Eranian 673d188ba5 perf/x86: Enable cycles:pp for Intel Atom
This patch updates the PEBS support for Intel Atom to provide
an alias for the cycles:pp event used by perf record/top by default
nowadays.

On Atom, only INST_RETIRED:ANY supports PEBS, so we use this event
instead with a large cmask to count cycles. Given that Core2 has
the same issue, we use the intel_pebs_aliases_core2() function for Atom
as well.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1449172990-30183-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:34 +01:00
Stephane Eranian 1424a09a9e perf/x86: fix PEBS issues on Intel Atom/Core2
This patch fixes broken PEBS support on Intel Atom and Core2
due to wrong pointer arithmetic in intel_pmu_drain_pebs_core().

The get_next_pebs_record_by_bit() was called on PEBS format fmt0
which does not use the pebs_record_nhm layout.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Fixes: 21509084f9 ("perf/x86/intel: Handle multiple records in the PEBS buffer")
Link: http://lkml.kernel.org/r/1449182000-31524-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:34 +01:00
Stephane Eranian 6fc2e83077 perf/x86: Fix LBR related crashes on Intel Atom
This patches fixes the LBR kernel crashes on Intel Atom.

The kernel was assuming that if the CPU supports 64-bit format
LBR, then it has an LBR_SELECT MSR. Atom uses 64-bit LBR format
but does not have LBR_SELECT. That was causing NULL pointer
dereferences in a couple of places.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Fixes: 96f3eda67f ("perf/x86/intel: Fix static checker warning in lbr enable")
Link: http://lkml.kernel.org/r/1449182000-31524-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:33 +01:00
Stephane Eranian 61b87cae63 perf/x86: Fix filter_events() bug with event mappings
This patch fixes a bug in the filter_events() function.

The patch fixes the bug whereby if some mappings did not
exist, e.g., STALLED_CYCLES_FRONTEND, then any event after it
in the attrs array would disappear from the published list of
events in /sys/devices/cpu/events. This could be verified
easily on any system post SNB (which do not publish
STALLED_CYCLES_FRONTEND):

	$ ./perf stat -e cycles,ref-cycles true
	Performance counter stats for 'true':
              1,217,348      cycles
	<not supported>      ref-cycles

The problem is that in filter_events() there is an assumption
that the argument (attrs) is organized in increasing continuous
event indexes related to the event_map(). But if we remove the
non-supported events by shifing the position in the array, then
the lookup x86_pmu.event_map() needs to compensate for it, otherwise
we are looking up the wrong index. This patch corrects this problem
by compensating for the deleted events and with that ref-cycles
reappears (here shown on Haswell):

	$ perf stat -e ref-cycles,cycles true
	Performance counter stats for 'true':
         4,525,910      ref-cycles
         1,064,920      cycles
       0.002943888 seconds time elapsed

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jolsa@kernel.org
Cc: kan.liang@intel.com
Fixes: 8300daa267 ("perf/x86: Filter out undefined events from sysfs events attribute")
Link: http://lkml.kernel.org/r/1449516805-6637-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:33 +01:00
Andi Kleen 724697648e perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp
Add a new 'three-p' precise level, that uses INST_RETIRED.PREC_DIST as
base. The basic mechanism of abusing the inverse cmask to get all
cycles works the same as before.

PREC_DIST is available on Sandy Bridge or later. It had some problems
on Sandy Bridge, so we only use it on IvyBridge and later. I tested it
on Broadwell and Skylake.

PREC_DIST has special support for avoiding shadow effects, which can
give better results compare to UOPS_RETIRED. The drawback is that
PREC_DIST can only schedule on counter 1, but that is ok for cycle
sampling, as there is normally no need to do multiple cycle sampling
runs in parallel. It is still possible to run perf top in parallel, as
that doesn't use precise mode. Also of course the multiplexing can
still allow parallel operation.

:pp stays with the previous event.

Example:

Sample a loop with 10 sqrt with old cycles:pp

	  0.14 │10:   sqrtps %xmm1,%xmm0     <--------------
	  9.13 │      sqrtps %xmm1,%xmm0
	 11.58 │      sqrtps %xmm1,%xmm0
	 11.51 │      sqrtps %xmm1,%xmm0
	  6.27 │      sqrtps %xmm1,%xmm0
	 10.38 │      sqrtps %xmm1,%xmm0
	 12.20 │      sqrtps %xmm1,%xmm0
	 12.74 │      sqrtps %xmm1,%xmm0
	  5.40 │      sqrtps %xmm1,%xmm0
	 10.14 │      sqrtps %xmm1,%xmm0
	 10.51 │    ↑ jmp    10

We expect all 10 sqrt to get roughly the sample number of samples.

But you can see that the instruction directly after the JMP is
systematically underestimated in the result, due to sampling shadow
effects.

With the new PREC_DIST based sampling this problem is gone and all
instructions show up roughly evenly:

	  9.51 │10:   sqrtps %xmm1,%xmm0
	 11.74 │      sqrtps %xmm1,%xmm0
	 11.84 │      sqrtps %xmm1,%xmm0
	  6.05 │      sqrtps %xmm1,%xmm0
	 10.46 │      sqrtps %xmm1,%xmm0
	 12.25 │      sqrtps %xmm1,%xmm0
	 12.18 │      sqrtps %xmm1,%xmm0
	  5.26 │      sqrtps %xmm1,%xmm0
	 10.13 │      sqrtps %xmm1,%xmm0
	 10.43 │      sqrtps %xmm1,%xmm0
	  0.16 │    ↑ jmp    10

Even with PREC_DIST there is still sampling skid and the result is not
completely even, but systematic shadow effects are significantly
reduced.

The improvements are mainly expected to make a difference in high IPC
code. With low IPC it should be similar.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1448929689-13771-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:32 +01:00
Andi Kleen 442f5c74cb perf/x86: Use INST_RETIRED.TOTAL_CYCLES_PS for cycles:pp for Skylake
I added UOPS_RETIRED.ALL by mistake to the Skylake PEBS event list for
cycles:pp. But the event is not documented for Skylake, and has some
issues.

The recommended replacement for cycles:pp is to use
INST_RETIRED.ANY+pebs as a base, similar to what CPUs before Sandy
Bridge did. This new event is called INST_RETIRED.TOTAL_CYCLES_PS. The
event is not really new, but has been already used by perf before
Sandy Bridge for the original cycles:p

Note the SDM doesn't document that event either, but it's being
documented in the latest version of the event list on:

  https://download.01.org/perfmon/SKL

This patch does:

 - Remove UOPS_RETIRED.ALL from the Skylake PEBS event list

 - Add INST_RETIRED.ANY to the Skylake PEBS event list, and an table entry to
   allow cmask=16,inv=1 for cycles:pp

 - We don't need an extra entry for the base INST_RETIRED event,
   because it is already covered by the catch-all PEBS table entry.

 - Switch Skylake to use the Core2 PEBS alias (which is
   INST_RETIRED.TOTAL_CYCLES_PS)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1448929689-13771-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:32 +01:00