linux/mm
Li Zefan 55782138e4 tracing/events: convert block trace points to TRACE_EVENT()
TRACE_EVENT is a more generic way to define tracepoints. Converting the
block tracepoints to it adds these new capabilities to them:

  - zero-copy and per-cpu splice() tracing
  - binary tracing without printf overhead
  - structured logging records exposed under /debug/tracing/events
  - trace events embedded in function tracer output and other plugins
  - user-defined, per tracepoint filter expressions
  ...

Cons:

  - no dev_t info for the output of plug, unplug_timer and unplug_io events.
    no dev_t info for getrq and sleeprq events if bio == NULL.
    no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the device from a request queue.
    But this may change in the future.

  - A packet command is converted to a string in TP_assign, not TP_print,
    whereas blktrace does the conversion just before output.

    Since pc requests should be rather rare, this is not a big issue.

  - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
    has a unique format, which means we have some unused data in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().

I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

      dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s

So the overhead of tracing is very small, and there is no regression when
using these trace events instead of blktrace.

And the binary output of TRACE_EVENT is much smaller than blktrace:

 # ls -l -h
 -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

Following are some comparisons between TRACE_EVENT and blktrace:

plug:
  kjournald-480   [000]   303.084981: block_plug: [kjournald]
  kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]

unplug_io:
  kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
  kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1

remap:
  kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
  kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384

bio_backmerge:
  kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
  kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]

getrq:
  kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
  kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]

  bash-2066  [001]  1072.953770:   8,0    G   N [bash]
  bash-2066  [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]

rq_complete:
  konsole-2065  [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
  konsole-2065  [001]   300.053191:   8,0    C   W 103669040 + 16 [0]

  ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
  ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]

rq_insert:
  kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
  kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]

Changelog from v2 -> v3:

- use the newly introduced __dynamic_array().

Changelog from v1 -> v2:

- use __string() instead of __array() to minimize the memory required
  to store the hex dump of rq->cmd.

- support large pc requests.

- add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

- some cleanups.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-09 12:34:23 -04:00
..
allocpercpu.c percpu: __percpu_depopulate_mask can take a const mask 2009-04-06 13:44:15 -07:00
backing-dev.c block: change the request allocation/congestion logic to be sync/async based 2009-04-06 08:04:53 -07:00
bootmem.c
bounce.c tracing/events: convert block trace points to TRACE_EVENT() 2009-06-09 12:34:23 -04:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap_xip.c
filemap.c Export filemap_write_and_wait_range 2009-04-16 07:47:49 -07:00
fremap.c
highmem.c
hugetlb.c
internal.h
Kconfig nommu: make the initial mmap allocation excess behaviour Kconfig configurable 2009-05-06 16:36:10 -07:00
Kconfig.debug
maccess.c
madvise.c Revert "Ignore madvise(MADV_WILLNEED) for hugetlbfs-backed regions" 2009-05-13 08:29:12 -07:00
Makefile
memcontrol.c memcg: fix mem_cgroup_shrink_usage() 2009-05-02 15:36:09 -07:00
memory_hotplug.c
memory.c mm: close page_mkwrite races 2009-05-02 15:36:09 -07:00
mempolicy.c
mempool.c
migrate.c
mincore.c
mlock.c x86, bts, mm: clean up buffer allocation 2009-04-24 10:18:52 +02:00
mm_init.c
mmap.c mm: fix Committed_AS underflow on large NR_CPUS environment 2009-05-02 15:36:10 -07:00
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c NOMMU: Don't check vm_region::vm_start is page aligned in add_nommu_region() 2009-05-07 12:03:41 -07:00
oom_kill.c oom: prevent livelock when oom_kill_allocating_task is set 2009-05-06 16:36:09 -07:00
page_alloc.c nommu: clamp zone_batchsize() to 0 under NOMMU conditions 2009-05-06 16:36:10 -07:00
page_cgroup.c
page_io.c
page_isolation.c
page-writeback.c
pagewalk.c
pdflush.c Revert "mm: add /proc controls for pdflush threads" 2009-05-15 11:32:24 +02:00
percpu.c
prio_tree.c
quicklist.c
readahead.c
rmap.c
shmem_acl.c
shmem.c memcg: fix mem_cgroup_shrink_usage() 2009-05-02 15:36:09 -07:00
slab.c tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part 2009-04-12 15:22:55 +02:00
slob.c tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part 2009-04-12 15:22:55 +02:00
slub.c tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part 2009-04-12 15:22:55 +02:00
sparse-vmemmap.c
sparse.c
swap_state.c
swap.c mm: fix Committed_AS underflow on large NR_CPUS environment 2009-05-02 15:36:10 -07:00
swapfile.c
thrash.c
truncate.c
util.c Merge branch 'linus' into tracing/core 2009-05-07 11:17:34 +02:00
vmalloc.c alloc_vmap_area: fix memory leak 2009-05-06 16:36:10 -07:00
vmscan.c vmscan: avoid multiplication overflow in shrink_zone() 2009-05-02 15:36:10 -07:00
vmstat.c