linux/arch
Peter Zijlstra 9536c8d2da perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:

  [ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
  [ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs

Don Zickus analyzed the source of the overhead and reported:

 > While there are a few places that are causing latencies, for now I focused on
 > the longest one first.  It seems to be 'copy_user_from_nmi'
 >
 > intel_pmu_handle_irq ->
 >	intel_pmu_drain_pebs_nhm ->
 >		__intel_pmu_drain_pebs_nhm ->
 >			__intel_pmu_pebs_event ->
 >				intel_pmu_pebs_fixup_ip ->
 >					copy_from_user_nmi
 >
 > In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
 > all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
 > (there are some cases where only 10 iterations are needed to go that high
 > too, but in generall over 50 or so).  At this point copy_user_from_nmi
 > seems to account for over 90% of the nmi latency.

The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.

Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.

Don reported this test result:

 > Your patch made a huge difference in improvement.  The
 > copy_from_user_nmi() no longer hits the million of cycles.  I still
 > have a batch of 100,000-300,000 cycles.  My longest NMI paths used
 > to be dominated by copy_from_user_nmi, now it is not (I have to dig
 > up the new hot path).

Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-16 15:44:00 +02:00
..
alpha
arc ARC: Ignore ptrace SETREGSET request for synthetic register "stop_pc" 2013-10-12 12:00:36 +05:30
arm ARM: SoC fixes for 3.12-rc 2013-10-13 09:59:10 -07:00
arm64 arm64: Remove duplicate DEBUG_STACK_USAGE config 2013-10-02 18:03:26 +01:00
avr32
blackfin
c6x
cris
frv
h8300
hexagon
ia64
m32r
m68k
metag
microblaze
mips Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-10-12 11:06:18 -07:00
mn10300
openrisc
parisc parisc: let probe_kernel_read() capture access to page zero 2013-10-13 17:46:31 +02:00
powerpc compiler/gcc4: Add quirk for 'asm goto' miscompilation bug 2013-10-11 07:39:14 +02:00
s390 compiler/gcc4: Add quirk for 'asm goto' miscompilation bug 2013-10-11 07:39:14 +02:00
score
sh
sparc compiler/gcc4: Add quirk for 'asm goto' miscompilation bug 2013-10-11 07:39:14 +02:00
tile
um
unicore32
x86 perf/x86: Optimize intel_pmu_pebs_fixup_ip() 2013-10-16 15:44:00 +02:00
xtensa
.gitignore
Kconfig