linux/include
Stephane Eranian a8d757ef07 perf events: Fix slow and broken cgroup context switch code
The current cgroup context switch code was incorrect leading
to bogus counts. Furthermore, as soon as there was an active
cgroup event on a CPU, the context switch cost on that CPU
would increase by a significant amount as demonstrated by a
simple ping/pong example:

 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10684.51 ctxsw/s

Now start a cgroup perf stat:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

$ ./pong
 Both processes pinned to CPU1, running for 10s
 6674.61 ctxsw/s

That's a 37% penalty.

Note that pong is not even in the monitored cgroup.

The results shown by perf stat are bogus:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

 Performance counter stats for 'sleep 100':

 CPU1 <not counted> cycles   test
 CPU1 16,984,189,138 cycles  #    0.000 GHz

The second 'cycles' event should report a count @ CPU clock
(here 2.4GHz) as it is counting across all cgroups.

The patch below fixes the bogus accounting and bypasses any
cgroup switches in case the outgoing and incoming tasks are
in the same cgroup.

With this patch the same test now yields:
 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10775.30 ctxsw/s

Start perf stat with cgroup:

 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

Run pong outside the cgroup:
 $ /pong
 Both processes pinned to CPU1, running for 10s
 10687.80 ctxsw/s

The penalty is now less than 2%.

And the results for perf stat are correct:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 <not counted> cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

Now perf stat reports the correct counts for
for the non cgroup event.

If we run pong inside the cgroup, then we also get the
correct counts:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 22,297,726,205 cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

      10.001457237 seconds time elapsed

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-29 12:28:33 +02:00
..
acpi Merge branch 'apei-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 2011-08-03 21:53:27 -10:00
asm-generic All Arch: remove linkage for sys_nfsservctl system call 2011-08-26 15:09:58 -07:00
crypto
drm drm: Separate EDID Header Check from EDID Block Check 2011-08-04 14:39:35 +01:00
keys
linux perf events: Fix slow and broken cgroup context switch code 2011-08-29 12:28:33 +02:00
math-emu
media [media] V4L: initial driver for ov5642 CMOS sensor 2011-07-27 17:56:09 -03:00
mtd
net ipv4: route non-local sources for raw socket 2011-08-07 22:52:32 -07:00
pcmcia Merge git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6 2011-07-31 06:23:08 -10:00
rdma atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
rxrpc atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
scsi Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd 2011-08-06 22:56:03 -07:00
sound ASoC: omap: Update e-mail address of Jarkko Nikula 2011-08-12 11:45:10 +09:00
target target: Make standard INQUIRY return 'not connected' for tpg_virt_lun0 2011-08-22 19:25:35 +00:00
trace Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2011-08-19 10:47:07 -07:00
video OMAP: DSS2: Remove unused opt_clock_available 2011-07-25 10:22:05 +03:00
xen xen/balloon: memory hotplug support for Xen balloon driver 2011-07-25 20:57:08 -07:00
Kbuild