If a PMC is about to overflow on a counter that's on an active perf event
(ie. less than 256 from the end) and a _different_ PMC overflows just at this
time (a PMC that's not on an active perf event), we currently mark the event as
found, but in reality it's not as it's likely the other PMC that caused the
IRQ. Since we mark it as found the second catch all for overflows doesn't run,
and we don't reset the overflowing PMC ever. Hence we keep hitting that same
PMC IRQ over and over and don't reset the actual overflowing counter.
This is a rewrite of the perf interrupt handler for book3s to get around this.
We now check to see if any of the PMCs have actually overflowed (ie >=
0x80000000). If yes, record it for active counters and just reset it for
inactive counters. If it's not overflowed, then we check to see if it's one of
the buggy power7 counters and if it is, record it and continue. If none of the
PMCs match this, then we make note that we couldn't find the PMC that caused
the IRQ.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
cc: Paul Mackerras <paulus@samba.org>
cc: Anton Blanchard <anton@samba.org>
cc: Linux PPC dev <linuxppc-dev@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Support for stalled-cycles-frontend and stalled-cycles-backend is
added for e500-based processors.
The following mappings are used:
stalled-cycles-frontend or idle-cycles-frontend:
Com:18 Cycles decode stalled
stalled-cycles-backend or idle-cycles-backend
Com:19 cycles issue stalled
Signed-off-by: Chris Freehill <chrisf@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we have two cache events that require different settings of the L2SEL
bits in MMCR1 then we can not schedule those events simultaneously. Add
logic to the constraint handling to express that.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/perf: Sample only if SIAR-Valid bit is set in P7+
On POWER7+ two new bits (mmcra[35] and mmcra[36]) indicate whether the
contents of SIAR and SDAR are valid.
For marked instructions on P7+, we must save the contents of SIAR and
SDAR registers only if these new bits are set.
This code/check for the SIAR-Valid bit is specific to P7+, so rather than
waste a CPU-feature bit use the PVR flag.
Note that Carl Love proposed a similar change for oprofile:
https://lkml.org/lkml/2012/6/22/309
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have an old FIXME in reg.h which points out that we should standardise
on PVR_foo for our PVR #defines. Currently we use PVR_ on 32-bit and PV_
on 64-bit.
So do that rename and remove the FIXME.
Seeing as we're touching all but one usage of __is_processor(), rename it
to something less ugly and more indicative of what it does, which is
simply to check the PVR version.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For certain speculative events on Power7, 'perf stat' reports far higher
event count than 'perf record' for the same event.
As described in following commit, a performance monitor exception is raised
even when the the performance events are rolled back.
commit 0837e3242c
Author: Anton Blanchard <anton@samba.org>
Date: Wed Mar 9 14:38:42 2011 +1100
perf_event_interrupt() records an event only when an overflow occurs. But
this check for overflow is a simple 'if (val < 0)'.
Because the events are rolled back, this check for overflow fails and the
event is not recorded. perf_event_interrupt() later uses pmc_overflow() to
detect the overflow and resets the counters and the events are lost completely.
To properly detect the overflow of rolled back events, use pmc_overflow()
even when recording events.
To reproduce:
$ cat strcpy.c
#include <stdio.h>
#include <string.h>
main()
{
char buf[256];
alarm(5);
while(1)
strcpy(buf, "string1");
}
$ perf record -e r20014 ./strcpy
$ perf report -n > report.1
$ perf stat -e r20014 > report.2
# Compare report.1 and report.2
Reported-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We use SIAR or regs->nip for the instruction pointer depending on
the PMU configuration, but we always use regs->nip in the callchain.
Use perf_instruction_pointer so the backtrace is consistent.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At the moment we always use the SIAR if the PMU supports continuous
sampling. Unfortunately the SIAR and the PMU exception are not
synchronised for non marked events so we can end up with callchains
that dont make sense.
The following patch checks the HV and PR bits for samples coming from
userspace and always uses pt_regs for them. Userspace will never have
interrupts off so there is no real advantage to using the SIAR for
non marked events in userspace.
I had experimented with a patch that did a similar thing for kernel
samples but we lost a significant amount of information. I was
unable to profile any of our early exception code for example.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The logic to choose whether to use the SIAR or get the information
out of pt_regs is going to get more complicated, so do it once in
perf_read_regs.
We overload regs->result which is gross but we are already doing it
with regs->dsisr.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We want to access the MMCRA_SIHV and MMCRA_SIPR bits elsewhere so
create mmcra_sihv and mmcra_sipr which hide the differences between
the old and new layout of the bits.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We always need to pass the last sample period to
perf_sample_data_init(), otherwise the event distribution will be
wrong. Thus, modifiyng the function interface with the required period
as argument. So basically a pattern like this:
perf_sample_data_init(&data, ~0ULL);
data.period = event->hw.last_period;
will now be like that:
perf_sample_data_init(&data, ~0ULL, event->hw.last_period);
Avoids unininitialized data.period and simplifies code.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
970 and Power4 don't support "continuous sampling" which means that
when we aren't in marked instruction sampling mode (marked events),
SIAR isn't updated with the last instruction sampled before the
perf interrupt. On those processors, we must thus use the exception
SRR0 value as the sampled instruction pointer.
Those processors also don't support the SIPR and SIHV bits in MMCRA
which means we need some kind of heuristic to decide if SIAR values
represent kernel or user addresses.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull powerpc merge from Benjamin Herrenschmidt:
"Here's the powerpc batch for this merge window. It is going to be a
bit more nasty than usual as in touching things outside of
arch/powerpc mostly due to the big iSeriesectomy :-) We finally got
rid of the bugger (legacy iSeries support) which was a PITA to
maintain and that nobody really used anymore.
Here are some of the highlights:
- Legacy iSeries is gone. Thanks Stephen ! There's still some bits
and pieces remaining if you do a grep -ir series arch/powerpc but
they are harmless and will be removed in the next few weeks
hopefully.
- The 'fadump' functionality (Firmware Assisted Dump) replaces the
previous (equivalent) "pHyp assisted dump"... it's a rewrite of a
mechanism to get the hypervisor to do crash dumps on pSeries, the
new implementation hopefully being much more reliable. Thanks
Mahesh Salgaonkar.
- The "EEH" code (pSeries PCI error handling & recovery) got a big
spring cleaning, motivated by the need to be able to implement a
new backend for it on top of some new different type of firwmare.
The work isn't complete yet, but a good chunk of the cleanups is
there. Note that this adds a field to struct device_node which is
not very nice and which Grant objects to. I will have a patch soon
that moves that to a powerpc private data structure (hopefully
before rc1) and we'll improve things further later on (hopefully
getting rid of the need for that pointer completely). Thanks Gavin
Shan.
- I dug into our exception & interrupt handling code to improve the
way we do lazy interrupt handling (and make it work properly with
"edge" triggered interrupt sources), and while at it found & fixed
a wagon of issues in those areas, including adding support for page
fault retry & fatal signals on page faults.
- Your usual random batch of small fixes & updates, including a bunch
of new embedded boards, both Freescale and APM based ones, etc..."
I fixed up some conflicts with the generalized irq-domain changes from
Grant Likely, hopefully correctly.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (141 commits)
powerpc/ps3: Do not adjust the wrapper load address
powerpc: Remove the rest of the legacy iSeries include files
powerpc: Remove the remaining CONFIG_PPC_ISERIES pieces
init: Remove CONFIG_PPC_ISERIES
powerpc: Remove FW_FEATURE ISERIES from arch code
tty/hvc_vio: FW_FEATURE_ISERIES is no longer selectable
powerpc/spufs: Fix double unlocks
powerpc/5200: convert mpc5200 to use of_platform_populate()
powerpc/mpc5200: add options to mpc5200_defconfig
powerpc/mpc52xx: add a4m072 board support
powerpc/mpc5200: update mpc5200_defconfig to fit for charon board
Documentation/powerpc/mpc52xx.txt: Checkpatch cleanup
powerpc/44x: Add additional device support for APM821xx SoC and Bluestone board
powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board
MAINTAINERS: Update PowerPC 4xx tree
powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board
powerpc: document the FSL MPIC message register binding
powerpc: add support for MPIC message register API
powerpc/fsl: Added aliased MSIIR register address to MSI node in dts
powerpc/85xx: mpc8548cds - add 36-bit dts
...
The perf code has grown a lot since it started, and is big enough to
warrant its own subdirectory. For reference it's ~60% bigger than the
oprofile code. It declutters the kernel directory, makes it simpler to
grep for "just perf stuff", and allows us to shorten some filenames.
While we're at it, make it more obvious that we have two implementations
of the core perf logic. One for (roughly) Book3S CPUs, which was the
original implementation, and the other for Freescale embedded CPUs.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>