linux/arch
Kim Phillips 0e3b74e262 perf/x86/amd: Update generic hardware cache events for Family 17h
Add a new amd_hw_cache_event_ids_f17h assignment structure set
for AMD families 17h and above, since a lot has changed.  Specifically:

L1 Data Cache

The data cache access counter remains the same on Family 17h.

For DC misses, PMCx041's definition changes with Family 17h,
so instead we use the L2 cache accesses from L1 data cache
misses counter (PMCx060,umask=0xc8).

For DC hardware prefetch events, Family 17h breaks compatibility
for PMCx067 "Data Prefetcher", so instead, we use PMCx05a "Hardware
Prefetch DC Fills."

L1 Instruction Cache

PMCs 0x80 and 0x81 (32-byte IC fetches and misses) are backward
compatible on Family 17h.

For prefetches, we remove the erroneous PMCx04B assignment which
counts how many software data cache prefetch load instructions were
dispatched.

LL - Last Level Cache

Removing PMCs 7D, 7E, and 7F assignments, as they do not exist
on Family 17h, where the last level cache is L3.  L3 counters
can be accessed using the existing AMD Uncore driver.

Data TLB

On Intel machines, data TLB accesses ("dTLB-loads") are assigned
to counters that count load/store instructions retired.  This
is inconsistent with instruction TLB accesses, where Intel
implementations report iTLB misses that hit in the STLB.

Ideally, dTLB-loads would count higher level dTLB misses that hit
in lower level TLBs, and dTLB-load-misses would report those
that also missed in those lower-level TLBs, therefore causing
a page table walk.  That would be consistent with instruction
TLB operation, remove the redundancy between dTLB-loads and
L1-dcache-loads, and prevent perf from producing artificially
low percentage ratios, i.e. the "0.01%" below:

        42,550,869      L1-dcache-loads
        41,591,860      dTLB-loads
             4,802      dTLB-load-misses          #    0.01% of all dTLB cache hits
         7,283,682      L1-dcache-stores
         7,912,392      dTLB-stores
               310      dTLB-store-misses

On AMD Families prior to 17h, the "Data Cache Accesses" counter is
used, which is slightly better than load/store instructions retired,
but still counts in terms of individual load/store operations
instead of TLB operations.

So, for AMD Families 17h and higher, this patch assigns "dTLB-loads"
to a counter for L1 dTLB misses that hit in the L2 dTLB, and
"dTLB-load-misses" to a counter for L1 DTLB misses that caused
L2 DTLB misses and therefore also caused page table walks.  This
results in a much more accurate view of data TLB performance:

        60,961,781      L1-dcache-loads
             4,601      dTLB-loads
               963      dTLB-load-misses          #   20.93% of all dTLB cache hits

Note that for all AMD families, data loads and stores are combined
in a single accesses counter, so no 'L1-dcache-stores' are reported
separately, and stores are counted with loads in 'L1-dcache-loads'.

Also note that the "% of all dTLB cache hits" string is misleading
because (a) "dTLB cache": although TLBs can be considered caches for
page tables, in this context, it can be misinterpreted as data cache
hits because the figures are similar (at least on Intel), and (b) not
all those loads (technically accesses) technically "hit" at that
hardware level.  "% of all dTLB accesses" would be more clear/accurate.

Instruction TLB

On Intel machines, 'iTLB-loads' measure iTLB misses that hit in the
STLB, and 'iTLB-load-misses' measure iTLB misses that also missed in
the STLB and completed a page table walk.

For AMD Family 17h and above, for 'iTLB-loads' we replace the
erroneous instruction cache fetches counter with PMCx084
"L1 ITLB Miss, L2 ITLB Hit".

For 'iTLB-load-misses' we still use PMCx085 "L1 ITLB Miss,
L2 ITLB Miss", but set a 0xff umask because without it the event
does not get counted.

Branch Predictor (BPU)

PMCs 0xc2 and 0xc3 continue to be valid across all AMD Families.

Node Level Events

Family 17h does not have a PMCx0e9 counter, and corresponding counters
have not been made available publicly, so for now, we mark them as
unsupported for Families 17h and above.

Reference:

  "Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh"
  Released 7/17/2018, Publication #56255, Revision 3.03:
  https://www.amd.com/system/files/TechDocs/56255_OSRR.pdf

[ mingo: tidied up the line breaks. ]
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Fixes: e40ed1542d ("perf/x86: Add perf support for AMD family-17h processors")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-02 18:28:12 +02:00
..
alpha arch: add pidfd and io_uring syscalls everywhere 2019-04-23 13:34:17 -07:00
arc ARC updates for 5.1 final 2019-05-01 13:40:30 -07:00
arm A small number of ARM fixes 2019-04-28 10:50:57 -07:00
arm64 arm64 fixes: 2019-04-26 11:26:53 -07:00
c6x syscalls: Remove start and number from syscall_set_arguments() args 2019-04-05 09:27:23 -04:00
csky syscalls: Remove start and number from syscall_set_arguments() args 2019-04-05 09:27:23 -04:00
h8300 syscalls: Remove start and number from syscall_get_arguments() args 2019-04-05 09:26:43 -04:00
hexagon syscalls: Remove start and number from syscall_get_arguments() args 2019-04-05 09:26:43 -04:00
ia64 arch: add pidfd and io_uring syscalls everywhere 2019-04-23 13:34:17 -07:00
m68k arch: add pidfd and io_uring syscalls everywhere 2019-04-23 13:34:17 -07:00
microblaze arch: add pidfd and io_uring syscalls everywhere 2019-04-23 13:34:17 -07:00
mips arch: add pidfd and io_uring syscalls everywhere 2019-04-23 13:34:17 -07:00
nds32 syscalls: Remove start and number from syscall_set_arguments() args 2019-04-05 09:27:23 -04:00
nios2 syscalls: Remove start and number from syscall_set_arguments() args 2019-04-05 09:27:23 -04:00
openrisc syscalls: Remove start and number from syscall_set_arguments() args 2019-04-05 09:27:23 -04:00
parisc arch: add pidfd and io_uring syscalls everywhere 2019-04-23 13:34:17 -07:00
powerpc powerpc fixes for 5.1 #6 2019-04-28 10:43:15 -07:00
riscv RISC-V: Fix Maximum Physical Memory 2GiB option for 64bit systems 2019-04-10 09:41:40 -07:00
s390 arch: add pidfd and io_uring syscalls everywhere 2019-04-23 13:34:17 -07:00
sh arch: add pidfd and io_uring syscalls everywhere 2019-04-23 13:34:17 -07:00
sparc arch: add pidfd and io_uring syscalls everywhere 2019-04-23 13:34:17 -07:00
um syscalls: Remove start and number from syscall_set_arguments() args 2019-04-05 09:27:23 -04:00
unicore32 KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported 2019-03-28 17:27:42 +01:00
x86 perf/x86/amd: Update generic hardware cache events for Family 17h 2019-05-02 18:28:12 +02:00
xtensa arch: add pidfd and io_uring syscalls everywhere 2019-04-23 13:34:17 -07:00
.gitignore
Kconfig