Author SHA1 Message Date
Kui-Feng Lee 1bc98ed290 btf_encoder: Collect info of per-cpu variables from threads
Collect the information of per-cpu variables from encoders of worker
threads to the primary encoder.

btf_encoder stores per-cpu info separately, not in BTF.  Previously, only
the BTF types generated by worker threads were merged, so some of the
per-cpu info was missing.
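
A minimal sketch of that merge step follows. It assumes each encoder keeps
its per-cpu variables in a plain growable array. The names struct
percpu_var, struct encoder and encoder__merge_percpu() are hypothetical
and are not pahole's actual data structures:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one per-cpu variable. */
struct percpu_var {
	unsigned long addr;
	unsigned int  sz;
	const char   *name;
};

/* Hypothetical encoder state: per-cpu info is kept outside the BTF. */
struct encoder {
	struct percpu_var *vars;
	size_t             nr_vars;
};

/*
 * Append the per-cpu variables collected by a worker's encoder to the
 * primary encoder. Returns 0 on success, -1 on allocation failure.
 */
int encoder__merge_percpu(struct encoder *primary, const struct encoder *worker)
{
	size_t total = primary->nr_vars + worker->nr_vars;
	struct percpu_var *vars;

	if (worker->nr_vars == 0)
		return 0;

	/* realloc(NULL, ...) behaves like malloc() for an empty primary. */
	vars = realloc(primary->vars, total * sizeof(*vars));
	if (vars == NULL)
		return -1;

	memcpy(vars + primary->nr_vars, worker->vars,
	       worker->nr_vars * sizeof(*vars));
	primary->vars = vars;
	primary->nr_vars = total;
	return 0;
}
```

In this sketch, the primary encoder calls it once per worker, after that
worker's thread has finished.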

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Tested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: dwarves@vger.kernel.org
Link: https://lore.kernel.org/r/20220321230837.855572-1-kuifeng@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-26 12:04:40 -03:00
Arnaldo Carvalho de Melo 65d7273668 pahole: Introduce --compile to produce a compilable output
This is the same strategy adopted by bpftool when generating a
vmlinux.h header using:

  $ bpftool btf dump file vmlinux format c > vmlinux.h

Testing it:

  $ cat a.c
  #include "vmlinux.h"

  int main(void)
  {
        struct saved_context bla;

        bla.ds = 1;
        return bla.ds + 1;
  }
  $

  $ rm -f a ; make a
  cc     a.c   -o a
  $

Now using pahole:

  $ pahole --compile > vmlinux.h
  $ rm -f a ; make a
  $

This still has some issues, like:

  $ pahole --compile ../build/allyesconfig/drivers/spi/spi-bitbang.o > spi.h
  $ make spi
  cc     spi.c   -o spi
  In file included from spi.c:1:
  spi.h:7127:38: error: ‘txrx_word’ declared as function returning an array
   7127 |         u32 (*txrx_word)(struct spi_device *, unsigned int, u32, u8, unsigned int)[4]; /*   200    32 */
        |              ^~~~~~~~~
  make: *** [<builtin>: spi] Error 1
  $

The original source code:

  /* txrx_word[SPI_MODE_*]() just looks like a shift register */
  u32 (*txrx_word[4])(struct spi_device *, unsigned int, u32, u8, unsigned int);

But this misgeneration predates this patch, so it shouldn't prevent it
from being merged now, in the interest of making progress.
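
For reference, the two declarations differ only in where [4] binds. The
sketch below shows the valid array-of-function-pointers form from the
original source. It also shows an equivalent typedef-based spelling that
a generator could emit instead. The struct spi_device, u32 and u8
definitions are assumed here so the example compiles on its own. The
names txrx_word_fn, struct bitbang_ok, struct bitbang_typedef and
txrx_echo() are hypothetical:

```c
#include <stdint.h>

struct spi_device;	/* opaque: only pointers to it are used */

typedef uint32_t u32;
typedef uint8_t  u8;

/* Valid C: txrx_word is an array of 4 pointers to functions returning u32. */
struct bitbang_ok {
	u32 (*txrx_word[4])(struct spi_device *, unsigned int, u32, u8, unsigned int);
};

/* Equivalent and harder to get wrong: name the function type first. */
typedef u32 txrx_word_fn(struct spi_device *, unsigned int, u32, u8, unsigned int);

struct bitbang_typedef {
	txrx_word_fn *txrx_word[4];
};

/* Sample implementation so the arrays can be exercised. */
u32 txrx_echo(struct spi_device *spi, unsigned int nsecs, u32 word, u8 bits,
	      unsigned int flags)
{
	(void)spi;
	(void)nsecs;
	(void)bits;
	(void)flags;
	return word;
}
```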

Also, there are issues when generating from DWARF, so for now only accept
--compile when generating from BTF.

Trying to run it with DWARF produces, for now:

  $ pahole -F dwarf --compile vmlinux
  pahole: --compile currently only works with BTF.
  $

Cc: Tony Jones <tonyj@suse.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-09 15:54:24 -03:00
Kui-Feng Lee 2135275318 pahole: Use per-thread btf instances to avoid mutex locking
Create an instance of btf for each worker thread, and add type info to
the local btf instance in pahole's steal function without acquiring a
mutex.  Once all worker threads have finished, merge all per-thread btf
instances into the primary btf instance.
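
The lock-free pattern can be sketched with plain pthreads. In this
sketch, struct type_table is a hypothetical stand-in for a btf instance,
and worker_fn()/encode_parallel() are illustrative names only. Each
thread fills its own table without taking a mutex. The calling thread
merges each table only after pthread_join() on that worker. Build with
-pthread:

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_TYPES 64

/* Hypothetical stand-in for a btf instance: just a list of type ids. */
struct type_table {
	int    ids[MAX_TYPES];
	size_t nr;
};

/* Hypothetical per-thread state. */
struct worker {
	pthread_t         tid;
	int               first_id;	/* slice of work for this thread */
	int               count;
	struct type_table local;	/* only this thread writes to it */
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	/* No mutex: w->local is private to this thread until it is joined. */
	for (int i = 0; i < w->count && w->local.nr < MAX_TYPES; i++)
		w->local.ids[w->local.nr++] = w->first_id + i;
	return NULL;
}

/*
 * Start one thread per worker, then join them in order and append each
 * per-thread table to *primary. Returns 0, or -1 if a thread could not
 * be created (the threads that did start are still joined and merged).
 */
int encode_parallel(struct worker *workers, int nr_workers,
		    struct type_table *primary)
{
	int started = 0, err = 0;

	for (; started < nr_workers; started++) {
		if (pthread_create(&workers[started].tid, NULL, worker_fn,
				   &workers[started]) != 0) {
			err = -1;
			break;
		}
	}

	for (int i = 0; i < started; i++) {
		struct type_table *local = &workers[i].local;

		pthread_join(workers[i].tid, NULL);
		for (size_t j = 0; j < local->nr && primary->nr < MAX_TYPES; j++)
			primary->ids[primary->nr++] = local->ids[j];
	}
	return err;
}
```

Merging in join order keeps the result independent of which thread
finishes first.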

Committer testing:

Results with no multithreading, and without further DWARF loading
improvements (not loading things that won't be converted to BTF, etc),
i.e. using pahole 1.21:

  # perf stat -r5 pahole --btf_encode /tmp/vmlinux ; btfdiff /tmp/vmlinux

   Performance counter stats for 'pahole --btf_encode /tmp/vmlinux' (5 runs):

            6,317.41 msec task-clock                #    0.985 CPUs utilized            ( +-  1.07% )
                  80      context-switches          #   12.478 /sec                     ( +- 15.25% )
                   1      cpu-migrations            #    0.156 /sec                     ( +-111.36% )
             535,890      page-faults               #   83.585 K/sec                    ( +-  0.00% )
      29,789,308,790      cycles                    #    4.646 GHz                      ( +-  0.46% )  (83.33%)
          97,696,165      stalled-cycles-frontend   #    0.33% frontend cycles idle     ( +-  4.05% )  (83.34%)
         145,554,652      stalled-cycles-backend    #    0.49% backend cycles idle      ( +- 21.53% )  (83.33%)
      78,215,192,264      instructions              #    2.61  insn per cycle
                                                    #    0.00  stalled cycles per insn  ( +-  0.05% )  (83.33%)
      18,141,376,637      branches                  #    2.830 G/sec                    ( +-  0.06% )  (83.33%)
         148,826,657      branch-misses             #    0.82% of all branches          ( +-  0.65% )  (83.34%)

              6.4129 +- 0.0682 seconds time elapsed  ( +-  1.06% )

  #

Now with pahole 1.23, with just parallel DWARF loading + trimmed DWARF
loading (skipping DWARF tags that won't be converted to BTF, etc):

  $ perf stat -r5 pahole -j --btf_encode /tmp/vmlinux

   Performance counter stats for 'pahole -j --btf_encode /tmp/vmlinux' (5 runs):

           10,828.98 msec task-clock:u              #    3.539 CPUs utilized            ( +-  0.94% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             105,407      page-faults:u             #    9.895 K/sec                    ( +-  0.15% )
      24,774,029,571      cycles:u                  #    2.326 GHz                      ( +-  0.50% )  (83.49%)
          76,895,232      stalled-cycles-frontend:u #    0.31% frontend cycles idle     ( +-  4.84% )  (83.50%)
          24,821,768      stalled-cycles-backend:u  #    0.10% backend cycles idle      ( +-  3.66% )  (83.11%)
      69,891,360,588      instructions:u            #    2.83  insn per cycle
                                                    #    0.00  stalled cycles per insn  ( +-  0.10% )  (83.20%)
      16,966,456,889      branches:u                #    1.593 G/sec                    ( +-  0.21% )  (83.41%)
         131,923,443      branch-misses:u           #    0.78% of all branches          ( +-  0.82% )  (83.42%)

              3.0600 +- 0.0140 seconds time elapsed  ( +-  0.46% )

  $

It is a bit better not to use a bare -j, which uses all the CPU threads in
the machine, but just the number of non-hyperthreaded cores; on this
machine, a Ryzen 5950X, that is 16:

  $ perf stat -r5 pahole -j16 --btf_encode /tmp/vmlinux

   Performance counter stats for 'pahole -j16 --btf_encode /tmp/vmlinux' (5 runs):

           10,075.46 msec task-clock:u              #    3.431 CPUs utilized            ( +-  0.49% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              90,777      page-faults:u             #    8.983 K/sec                    ( +-  0.16% )
      22,611,016,624      cycles:u                  #    2.237 GHz                      ( +-  0.93% )  (83.34%)
          55,760,536      stalled-cycles-frontend:u #    0.24% frontend cycles idle     ( +-  2.35% )  (83.25%)
          15,985,651      stalled-cycles-backend:u  #    0.07% backend cycles idle      ( +-  8.79% )  (83.33%)
      68,976,319,497      instructions:u            #    2.96  insn per cycle
                                                    #    0.00  stalled cycles per insn  ( +-  0.34% )  (83.39%)
      16,770,540,533      branches:u                #    1.659 G/sec                    ( +-  0.31% )  (83.35%)
         128,220,385      branch-misses:u           #    0.76% of all branches          ( +-  0.77% )  (83.37%)

              2.9365 +- 0.0284 seconds time elapsed  ( +-  0.97% )

  $

Then with parallel DWARF loading + parallel BTF encoding (this patch):

  $ perf stat -r5 pahole -j --btf_encode /tmp/vmlinux

   Performance counter stats for 'pahole -j --btf_encode /tmp/vmlinux' (5 runs):

           11,063.29 msec task-clock:u              #    6.389 CPUs utilized            ( +-  0.79% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             163,263      page-faults:u             #   14.840 K/sec                    ( +-  0.48% )
      41,892,887,608      cycles:u                  #    3.808 GHz                      ( +-  0.96% )  (83.41%)
         197,163,158      stalled-cycles-frontend:u #    0.47% frontend cycles idle     ( +-  3.23% )  (83.46%)
         114,187,423      stalled-cycles-backend:u  #    0.27% backend cycles idle      ( +- 16.57% )  (83.43%)
      74,053,722,204      instructions:u            #    1.78  insn per cycle
                                                    #    0.00  stalled cycles per insn  ( +-  0.18% )  (83.37%)
      17,848,238,467      branches:u                #    1.622 G/sec                    ( +-  0.10% )  (83.27%)
         180,232,427      branch-misses:u           #    1.01% of all branches          ( +-  0.86% )  (83.16%)

              1.7316 +- 0.0301 seconds time elapsed  ( +-  1.74% )

  $

Again, it is better not to use a bare -j, which uses all the CPU threads:

  $ perf stat -r5 pahole -j16 --btf_encode /tmp/vmlinux

   Performance counter stats for 'pahole -j16 --btf_encode /tmp/vmlinux' (5 runs):

            6,626.33 msec task-clock:u              #    4.421 CPUs utilized            ( +-  0.82% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             140,919      page-faults:u             #   21.240 K/sec                    ( +-  1.03% )
      26,085,701,848      cycles:u                  #    3.932 GHz                      ( +-  1.20% )  (83.38%)
          98,962,246      stalled-cycles-frontend:u #    0.37% frontend cycles idle     ( +-  3.47% )  (83.41%)
         102,762,088      stalled-cycles-backend:u  #    0.39% backend cycles idle      ( +- 17.95% )  (83.38%)
      71,193,141,569      instructions:u            #    2.69  insn per cycle
                                                    #    0.00  stalled cycles per insn  ( +-  0.14% )  (83.33%)
      17,166,459,728      branches:u                #    2.587 G/sec                    ( +-  0.15% )  (83.27%)
         150,984,525      branch-misses:u           #    0.87% of all branches          ( +-  0.61% )  (83.34%)

              1.4989 +- 0.0113 seconds time elapsed  ( +-  0.76% )

  $

Minor tweaks to reduce the patch size, things like avoiding moving the
pthread_mutex_lock(&btf_lock) to after a comment, etc.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Link: https://lore.kernel.org/r/20220126192039.2840752-4-kuifeng@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-28 16:30:54 -03:00
Kui-Feng Lee 724c8fddd7 dwarf_loader: Receive per-thread data on worker threads
Add arguments to steal and thread_exit callbacks of conf_load to
receive per-thread data.
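
In pahole, these callbacks live in struct conf_load. The sketch below
uses a hypothetical, cut-down struct loader_conf instead, and its
signatures (including the int return of steal) are assumptions. It is
only meant to show the same per-thread pointer flowing into both
callbacks. The functions run_worker(), count_steal() and count_exit()
are illustrative:

```c
#include <stddef.h>

struct cu;	/* opaque compilation unit */

/*
 * Hypothetical, simplified callback table: both callbacks get the
 * calling worker thread's private data as an extra argument.
 */
struct loader_conf {
	int (*steal)(struct cu *cu, struct loader_conf *conf, void *thr_data);
	int (*thread_exit)(struct loader_conf *conf, void *thr_data);
};

/* Simulated worker loop: every callback sees the same thr_data pointer. */
int run_worker(struct loader_conf *conf, struct cu **cus, size_t nr_cus,
	       void *thr_data)
{
	for (size_t i = 0; i < nr_cus; i++) {
		int err = conf->steal(cus[i], conf, thr_data);

		if (err)
			return err;
	}
	return conf->thread_exit ? conf->thread_exit(conf, thr_data) : 0;
}

/* Example callbacks where the per-thread data is just an int counter. */
int count_steal(struct cu *cu, struct loader_conf *conf, void *thr_data)
{
	(void)cu;
	(void)conf;
	++*(int *)thr_data;	/* one more CU handled by this thread */
	return 0;
}

int count_exit(struct loader_conf *conf, void *thr_data)
{
	(void)conf;
	*(int *)thr_data += 100;	/* mark that thread_exit saw the data */
	return 0;
}
```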

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Link: https://lore.kernel.org/r/20220126192039.2840752-2-kuifeng@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-28 16:19:29 -03:00
Douglas RAILLARD 69fb1861de Revert "pahole: Add --inner_anon option"
This reverts commit 005236c3e4.

Dropped since it could not cope with recursive types. A new attempt will
be made in 1.24.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-08 08:52:10 -03:00
Douglas Raillard 005236c3e4 pahole: Add --inner_anon option
Allow making the inner struct/enum/union anonymous. This permits using
the header generated with -E to inspect pointer values, without having to
worry about duplicate type definitions such as:

    struct foo { ... };
    struct bar {
        struct foo {
            ....
        } a;
    };

With --inner_anon, the conflict between the two definitions of struct
foo is gone:

    struct foo { ... };
    struct bar {
        struct {
            ....
        } a;
    };
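
The struct foo/struct bar pair from the testing below shows why the first
form breaks a compilable header. In C, a struct tag declared inside
another struct has the scope of the enclosing declaration, which is file
scope here. Repeating struct foo { ... } inside struct bar is therefore a
redefinition. The anonymous form has the same layout and no conflict. A
minimal sketch:

```c
#include <stddef.h>

struct foo {
	int  a;
	char b;
};

/*
 * Writing "struct foo { int a; char b; } c;" here would redefine
 * struct foo at file scope. The anonymous struct keeps the same members,
 * and therefore the same layout, without a second definition of the tag.
 */
struct bar {
	struct /* foo */ {
		int  a;
		char b;
	} c;
	int d;
};
```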

Committer testing:

  $ cat inner_anon.c

  struct foo {
  	int  a;
  	char b;
  };

  struct bar {
          struct foo c;
  	int	   d;
  } bla;
  $ gcc -g -c inner_anon.c   -o inner_anon.o

No expansion:

  $ pahole inner_anon.o
  struct foo {
  	int                        a;                    /*     0     4 */
  	char                       b;                    /*     4     1 */

  	/* size: 8, cachelines: 1, members: 2 */
  	/* padding: 3 */
  	/* last cacheline: 8 bytes */
  };
  struct bar {
  	struct foo                 c;                    /*     0     8 */

  	/* XXX last struct has 3 bytes of padding */

  	int                        d;                    /*     8     4 */

  	/* size: 12, cachelines: 1, members: 2 */
  	/* paddings: 1, sum paddings: 3 */
  	/* last cacheline: 12 bytes */
  };

Expanding types:

  $ pahole -E inner_anon.o
  struct foo {
  	int                        a;                                                    /*     0     4 */
  	char                       b;                                                    /*     4     1 */

  	/* size: 8, cachelines: 1, members: 2 */
  	/* padding: 3 */
  	/* last cacheline: 8 bytes */
  };
  struct bar {
  	struct foo {
  		int                a;                                                    /*     0     4 */
  		char               b;                                                    /*     4     1 */
  	}c; /*     0     8 */

  	/* XXX last struct has 3 bytes of padding */

  	int                        d;                                                    /*     8     4 */

  	/* size: 12, cachelines: 1, members: 2 */
  	/* paddings: 1, sum paddings: 3 */
  	/* last cacheline: 12 bytes */
  };

Anonymising the inner struct:

  $ pahole -E --inner_anon inner_anon.o
  struct foo {
  	int                        a;                                                    /*     0     4 */
  	char                       b;                                                    /*     4     1 */

  	/* size: 8, cachelines: 1, members: 2 */
  	/* padding: 3 */
  	/* last cacheline: 8 bytes */
  };
  struct bar {
  	struct /* foo */ {
  		int                a;                                                    /*     0     4 */
  		char               b;                                                    /*     4     1 */
  	}c; /*     0     8 */

  	/* XXX last struct has 3 bytes of padding */

  	int                        d;                                                    /*     8     4 */

  	/* size: 12, cachelines: 1, members: 2 */
  	/* paddings: 1, sum paddings: 3 */
  	/* last cacheline: 12 bytes */
  };

Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
[ Added man page entry for --inner_anon, refreshed the patch to cope with the btf_tag series ]
Link: https://lore.kernel.org/all/20211019100724.325570-3-douglas.raillard@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07 14:55:16 -03:00
Yonghong Song b488c8d328 dwarf_loader: Support btf_type_tag attribute
LLVM patches ([1] for clang, [2] and [3] for BPF backend)
added support for btf_type_tag attributes. The following is
an example:

  [$ ~] cat t.c
  #define __tag1 __attribute__((btf_type_tag("tag1")))
  #define __tag2 __attribute__((btf_type_tag("tag2")))
  int __tag1 * __tag1 __tag2 *g __attribute__((section(".data..percpu")));
  [$ ~] clang -O2 -g -c t.c
  [$ ~] llvm-dwarfdump --debug-info t.o
  t.o:    file format elf64-x86-64
  ...
  0x0000001e:   DW_TAG_variable
                  DW_AT_name      ("g")
                  DW_AT_type      (0x00000033 "int **")
                  DW_AT_external  (true)
                  DW_AT_decl_file ("/home/yhs/t.c")
                  DW_AT_decl_line (3)
                  DW_AT_location  (DW_OP_addr 0x0)
  0x00000033:   DW_TAG_pointer_type
                  DW_AT_type      (0x0000004b "int *")
  0x00000038:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag1")
  0x00000041:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag2")
  0x0000004a:     NULL
  0x0000004b:   DW_TAG_pointer_type
                  DW_AT_type      (0x0000005a "int")
  0x00000050:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag1")
  0x00000059:     NULL
  0x0000005a:   DW_TAG_base_type
                  DW_AT_name      ("int")
                  DW_AT_encoding  (DW_ATE_signed)
                  DW_AT_byte_size (0x04)
  0x00000061:   NULL

From the above example, you can see that DW_TAG_pointer_type may contain
one or more DW_TAG_LLVM_annotation btf_type_tag tags.  If
DW_TAG_LLVM_annotation tags are present inside DW_TAG_pointer_type, for
BTF encoding, pahole will need to follow [3] to generate a type chain
like:

  var -> ptr -> tag2 -> tag1 -> ptr -> tag1 -> int

This patch implements dwarf_loader support. If a pointer type contains
DW_TAG_LLVM_annotation tags, a new type, btf_type_tag_ptr_type, is
created to store the pointer tag itself and all of its
DW_TAG_LLVM_annotation tags.  During the recoding stage, the type chain is
formed as in the above example.
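
The ordering can be sketched with a hypothetical, simplified model of the
DWARF above. The names struct dw_ptr and format_chain() are illustrative
and are not pahole's btf_type_tag_ptr_type. Each pointer level emits ptr
followed by its annotations in reverse DWARF order. This matches the
example, where DWARF lists tag1 before tag2 but the chain has tag2 first:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of a DWARF pointer type and its tags. */
struct dw_ptr {
	const char    *tags[4];	/* btf_type_tag values, in DWARF order */
	int            nr_tags;
	const char    *pointee;	/* base type name, for the last level */
	struct dw_ptr *next;	/* pointee when it is another pointer */
};

/* Format "var -> ptr -> tagN -> ... -> tag1 -> ..." into buf. */
void format_chain(const struct dw_ptr *p, char *buf, size_t len)
{
	size_t used;

	snprintf(buf, len, "var");
	for (; p != NULL; p = p->next) {
		used = strlen(buf);
		snprintf(buf + used, len - used, " -> ptr");
		/* Annotations are chained in reverse of their DWARF order. */
		for (int i = p->nr_tags - 1; i >= 0; i--) {
			used = strlen(buf);
			snprintf(buf + used, len - used, " -> %s", p->tags[i]);
		}
		if (p->next == NULL && p->pointee != NULL) {
			used = strlen(buf);
			snprintf(buf + used, len - used, " -> %s", p->pointee);
		}
	}
}
```

For the example variable g, the outer pointer carries tags tag1 and tag2
and points to an inner pointer tagged tag1 whose pointee is int. This
yields "var -> ptr -> tag2 -> tag1 -> ptr -> tag1 -> int".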

An option "--skip_encoding_btf_type_tag" is added to disable
this new functionality.

  [1] https://reviews.llvm.org/D111199
  [2] https://reviews.llvm.org/D113222
  [3] https://reviews.llvm.org/D113496

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-23 20:37:51 -03:00
Douglas Raillard 772725a77d dwarves_fprintf: Move cacheline_size into struct conf_fprintf
Remove the global variable and turn it into a member in struct
conf_fprintf, so that it can be used by other parts of the code.

Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
Cc: dwarves@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-28 10:17:59 -03:00
Yonghong Song c52f6421f2 btf: Rename btf_tag to btf_decl_tag
Kernel commit ([1]) renamed btf_tag to btf_decl_tag for the uapi btf.h and
libbpf APIs. The reason is that a new clang attribute, btf_type_tag, is
introduced ([2]).  Renaming btf_tag to btf_decl_tag makes it easier to
distinguish from btf_type_tag.

I also pulled in the latest libbpf repo since it contains the renamed
libbpf API function btf__add_decl_tag().

  [1] https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/
  [2] https://reviews.llvm.org/D111199

Signed-off-by: Yonghong Song <yhs@fb.com>
[ Minor fixups to cope with --skip_missing ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 09:35:23 -03:00
Douglas Raillard 5282feee6d pahole: Add --skip_missing option
Add a --skip_missing option that allows pahole to keep going in case one
of the types passed to -C (e.g. via a file) does not exist.

This is useful for introspection software, such as debugging kernel
modules, that can handle various kernel configurations and versions in
which some recently added types are missing. The consumer of the header
becomes responsible for gating the uses of the type with #ifdef
CONFIG_XXX, rather than pahole bailing out on the first unknown type.
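
The difference between stopping and skipping can be sketched as follows.
The function print_types() and its plain string tables are hypothetical.
Unlike pahole's output below, this sketch reports a missing name as soon
as it reaches it:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Print each requested type found in 'known'. Without skip_missing, stop
 * at the first unknown name and return -1; with it, report the unknown
 * name and keep going. Returns the number of types printed otherwise.
 */
int print_types(const char *const *requested, size_t nr_requested,
		const char *const *known, size_t nr_known, bool skip_missing)
{
	int printed = 0;

	for (size_t i = 0; i < nr_requested; i++) {
		bool found = false;

		for (size_t j = 0; j < nr_known && !found; j++)
			found = strcmp(requested[i], known[j]) == 0;

		if (!found) {
			fprintf(stderr, "type '%s' not found\n", requested[i]);
			if (!skip_missing)
				return -1;
			continue;
		}
		printf("struct %s { ... };\n", requested[i]);
		printed++;
	}
	return printed;
}
```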

Committer testing:

Before:

  $ pahole tcp_splice_state,xxfrm_policy_queue,list_head tcp.o
  struct tcp_splice_state {
  	struct pipe_inode_info *   pipe;                 /*     0     8 */
  	size_t                     len;                  /*     8     8 */
  	unsigned int               flags;                /*    16     4 */

  	/* size: 24, cachelines: 1, members: 3 */
  	/* padding: 4 */
  	/* last cacheline: 24 bytes */
  };
  pahole: type 'xxfrm_policy_queue' not found
  $

After:

  $ pahole --help |& grep skip
        --skip=COUNT           Skip COUNT input records
        --skip_encoding_btf_tag   Do not encode TAGs in BTF.
        --skip_encoding_btf_vars   Do not encode VARs in BTF.
        --skip_missing         skip missing types passed to -C rather than stop
  $ pahole --skip_missing tcp_splice_state,xxfrm_policy_queue,list_head tcp.o
  struct tcp_splice_state {
  	struct pipe_inode_info *   pipe;                 /*     0     8 */
  	size_t                     len;                  /*     8     8 */
  	unsigned int               flags;                /*    16     4 */

  	/* size: 24, cachelines: 1, members: 3 */
  	/* padding: 4 */
  	/* last cacheline: 24 bytes */
  };
  struct list_head {
  	struct list_head *         next;                 /*     0     8 */
  	struct list_head *         prev;                 /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* last cacheline: 16 bytes */
  };
  pahole: type 'xxfrm_policy_queue' not found
  $

Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: dwarves@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-26 11:29:55 -03:00
Yonghong Song aa8c494e65 dwarf_loader: Parse DWARF tag DW_TAG_LLVM_annotation
Parse the DWARF tag DW_TAG_LLVM_annotation. Only record annotations with
the btf_tag name, which correspond to btf_tag attributes in C code. Such
information will be used later by the btf_encoder for BTF conversion.

The LLVM implementation only supports btf_tag annotations on
struct/union, func, func parameter and variable ([1]).  So we only check
for the existence of corresponding DW tags in these places.
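
A minimal sketch of that filtering follows. It assumes the annotations
have already been read into (name, value) pairs taken from their
DW_AT_name and DW_AT_const_value attributes. The names struct annotation
and collect_btf_tags() are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened DW_TAG_LLVM_annotation. */
struct annotation {
	const char *name;	/* DW_AT_name, e.g. "btf_tag" */
	const char *value;	/* DW_AT_const_value, e.g. "user" */
};

/*
 * Keep only annotations named "btf_tag", copying their values into 'out'
 * (capacity 'max'). Returns how many values were recorded.
 */
size_t collect_btf_tags(const struct annotation *anns, size_t nr,
			const char **out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < nr && n < max; i++)
		if (anns[i].name != NULL && strcmp(anns[i].name, "btf_tag") == 0)
			out[n++] = anns[i].value;
	return n;
}
```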

A flag, "--skip_encoding_btf_tag", is introduced in case this feature
needs to be disabled for whatever reason.

 [1] https://reviews.llvm.org/D106614

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Link: https://lore.kernel.org/r/20210922021326.2287095-1-yhs@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-27 17:06:56 -03:00
Arnaldo Carvalho de Melo 9f0809e6a8 pahole: Introduce --ptr_table_stats
Useful while developing to help in tuning the ptr tables (types, tags,
functions, maybe some more in the future).

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo c59e996c97 pahole: Fix races in accessing type information in live CUs
When using multithreaded DWARF loading we can't really freely access
some tables, as they may grow and lead to stale data accesses generating
segfaults.

So use a type comparison that takes into account just the immutable
information for structs and unions.

This isn't enough to discern if two types with the same name are really
the same, as we need to look at the member types to figure that out.

So if there are types whose member types need to be checked, leave
them for when all CUs have been processed and are thus completely
immutable, then re-sort and fully compare such types.
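
A sketch of the first, thread-safe phase follows. It assumes a
hypothetical struct type_summary that holds only data that never changes
after a type is created. A TYPES_NEED_FULL_CMP result means the pair is
set aside for a full member comparison once all CUs are loaded. The name
type__quick_cmp() is illustrative:

```c
#include <string.h>

/* Hypothetical immutable summary of a struct/union. */
struct type_summary {
	const char *name;
	unsigned    size;
	unsigned    nr_members;
};

enum cmp_result {
	TYPES_DIFFER,
	TYPES_NEED_FULL_CMP,	/* immutable info matches; check members later */
};

/* Safe while other CUs are still loading: reads immutable data only. */
enum cmp_result type__quick_cmp(const struct type_summary *a,
				const struct type_summary *b)
{
	if (strcmp(a->name, b->name) != 0 || a->size != b->size ||
	    a->nr_members != b->nr_members)
		return TYPES_DIFFER;
	return TYPES_NEED_FULL_CMP;
}
```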

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo c34b6c6cc9 pahole: Add missing limits.h include to get ULLONG_MAX definition
Found while compiling on a musl libc system.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo eba3e874ad pahole: Consider type members's names when comparing unions, structs
The last remaining difference was:

  $ btfdiff vmlinux
  --- /tmp/btfdiff.dwarf.VUXlsB	2021-08-06 18:11:51.371012024 -0300
  +++ /tmp/btfdiff.btf.CeZ7hA	2021-08-06 18:11:51.604017029 -0300
  @@ -48226,8 +48226,8 @@ struct intel_ir_data {
   	/* last cacheline: 56 bytes */
   };
   struct intel_pad_context {
  -	u32                        padctrl0;             /*     0     4 */
  -	u32                        padctrl1;             /*     4     4 */
  +	u32                        conf0;                /*     0     4 */
  +	u32                        val;                  /*     4     4 */

   	/* size: 8, cachelines: 1, members: 2 */
   	/* last cacheline: 8 bytes */
  $

That is now covered as well. Please report if you see some other corner
case (some __attribute__((__aligned__(N))), perhaps? :)).

Now 'btfdiff vmlinux' is clean.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo f61d458c91 pahole: Consider type members's types when comparing unions, structs
But this still doesn't cover all types in the kernel, at least not for
btfdiff's needs, which is for pahole's output to be the same for BTF and
DWARF: if we have two types that are ABI-equal, it will still complain
if...

  $ btfdiff vmlinux
  --- /tmp/btfdiff.dwarf.VUXlsB	2021-08-06 18:11:51.371012024 -0300
  +++ /tmp/btfdiff.btf.CeZ7hA	2021-08-06 18:11:51.604017029 -0300
  @@ -48226,8 +48226,8 @@ struct intel_ir_data {
   	/* last cacheline: 56 bytes */
   };
   struct intel_pad_context {
  -	u32                        padctrl0;             /*     0     4 */
  -	u32                        padctrl1;             /*     4     4 */
  +	u32                        conf0;                /*     0     4 */
  +	u32                        val;                  /*     4     4 */

   	/* size: 8, cachelines: 1, members: 2 */
   	/* last cacheline: 8 bytes */
  $

The names of some members are different :-\ Consider that in the next
patch and possibly add a knob to consider both types equal, i.e. don't
compare member names, just the size, the number of members and the types
of each pair of members (at each offset in both types).
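
A sketch of such a member-by-member comparison with the suggested knob
follows. The names struct member, struct type_desc and types__equal() are
hypothetical. Comparing member types by name string is a simplification
of comparing the actual type entries:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened member: name, type name and byte offset. */
struct member {
	const char *name;
	const char *type;
	size_t      offset;
};

struct type_desc {
	size_t               size;
	size_t               nr_members;
	const struct member *members;
};

/*
 * Compare two struct types member by member. When cmp_names is false,
 * only size, member count and each member's offset and type are used.
 */
bool types__equal(const struct type_desc *a, const struct type_desc *b,
		  bool cmp_names)
{
	if (a->size != b->size || a->nr_members != b->nr_members)
		return false;

	for (size_t i = 0; i < a->nr_members; i++) {
		const struct member *ma = &a->members[i], *mb = &b->members[i];

		if (ma->offset != mb->offset || strcmp(ma->type, mb->type) != 0)
			return false;
		if (cmp_names && strcmp(ma->name, mb->name) != 0)
			return false;
	}
	return true;
}
```

With the intel_pad_context pair above, the two layouts compare equal when
names are ignored and differ when they are compared.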

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 815041d6dc pahole: Improve the type sorting routine to consider multiple types with same name
Out of these different Linux kernel types with the same name (in different object files):

  $ pahole --sizes | sed -r 's/([^\t]+)\t.*/\1/g' | sort | uniq -c | grep -v ' 1 '
        2 chksum_desc_ctx
        2 controller
        2 debug_buffer
        2 dir_entry
        2 disklabel
        2 dma_chan
        2 dma_heap_attachment
        2 d_partition
        2 elf_thread_core_info
        2 intel_community_context
        2 intel_pad_context
        3 irq_info
        2 irte
        2 map_info
        2 mm_slot
        2 netlbl_domhsh_walk_arg
        2 node
        2 pci_root_info
        2 perf_aux_event
        2 pmc_dev
        2 pmc_reg_map
        2 remap_data
        2 slot
        2 sw842_param
        2 syscall_tp_t
        3 urb_priv
        2 walk_control
        3 workspace
  $

Only this one needs a more involved type comparison:

  $ btfdiff vmlinux
  --- /tmp/btfdiff.dwarf.Pksrlr	2021-08-06 16:42:34.823259365 -0300
  +++ /tmp/btfdiff.btf.KOAuwd	2021-08-06 16:42:35.032264038 -0300
  @@ -31035,7 +31035,7 @@ struct elf_note_info {
   	struct memelfnote          auxv;                 /*    56    24 */
   	/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
   	struct memelfnote          files;                /*    80    24 */
  -	compat_siginfo_t           csigdata;             /*   104   128 */
  +	siginfo_t                  csigdata;             /*   104   128 */
   	/* --- cacheline 3 boundary (192 bytes) was 40 bytes ago --- */
   	size_t                     size;                 /*   232     8 */
   	int                        thread_notes;         /*   240     4 */
  $

Both have the same size and number of members.

And this doesn't always happen; it all depends on the order in which the
BTF encoder gets it from one of the DWARF loading threads:

  $ pahole -j12 --btf_encode vmlinux
  $ btfdiff vmlinux
  $

No changes, but then:

  $ btfdiff vmlinux
  $ perf stat pahole -j12 --btf_encode vmlinux

   Performance counter stats for 'pahole -j12 --btf_encode vmlinux':

           17,920.75 msec task-clock:u              #    2.995 CPUs utilized
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              78,004      page-faults:u             #    4.353 K/sec
      42,677,746,170      cycles:u                  #    2.381 GHz                      (83.37%)
         480,920,924      stalled-cycles-frontend:u #    1.13% frontend cycles idle     (83.33%)
       6,470,001,379      stalled-cycles-backend:u  #   15.16% backend cycles idle      (83.39%)
      96,468,468,147      instructions:u            #    2.26  insn per cycle
                                                    #    0.07  stalled cycles per insn  (83.33%)
      19,757,801,968      branches:u                #    1.103 G/sec                    (83.27%)
         143,118,731      branch-misses:u           #    0.72% of all branches          (83.32%)

         5.984348164 seconds time elapsed

        17.234929000 seconds user
         0.398715000 seconds sys

  $ btfdiff vmlinux
  --- /tmp/btfdiff.dwarf.b9FEZI	2021-08-06 16:46:08.810043718 -0300
  +++ /tmp/btfdiff.btf.IawvDY	2021-08-06 16:46:09.026048548 -0300
  @@ -31035,7 +31035,7 @@ struct elf_note_info {
   	struct memelfnote          auxv;                 /*    56    24 */
   	/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
   	struct memelfnote          files;                /*    80    24 */
  -	compat_siginfo_t           csigdata;             /*   104   128 */
  +	siginfo_t                  csigdata;             /*   104   128 */
   	/* --- cacheline 3 boundary (192 bytes) was 40 bytes ago --- */
   	size_t                     size;                 /*   232     8 */
   	int                        thread_notes;         /*   240     4 */
  $

Next cset will take that into account by traversing both types looking
for differences in the type for a field.
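
The sorting part can be sketched as a qsort() comparator that breaks ties
between same-named types by size and member count. The names struct
type_entry and type_entry__cmp() are hypothetical. Since qsort() is not
stable, entries that still compare equal end up in an unspecified order.
The two elf_note_info variants above are such a case, which is why the
deeper comparison is needed:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sort key for a type collected from some CU. */
struct type_entry {
	const char *name;
	size_t      size;
	size_t      nr_members;
};

/* Order by name, then by size, then by number of members. */
int type_entry__cmp(const void *pa, const void *pb)
{
	const struct type_entry *a = pa, *b = pb;
	int ret = strcmp(a->name, b->name);

	if (ret != 0)
		return ret;
	if (a->size != b->size)
		return a->size < b->size ? -1 : 1;
	if (a->nr_members != b->nr_members)
		return a->nr_members < b->nr_members ? -1 : 1;
	return 0;	/* still ambiguous: needs a member-type comparison */
}
```

Usage: qsort(entries, nr, sizeof(entries[0]), type_entry__cmp);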

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 23ea62817c pahole: Move case fallthru comment to after the statement
In this case we have:

  case foo: {
  }
  case bar:

The fallthru comment has to be _after_ the closing curly brace; fix it
to avoid this warning (from clang, but probably from gcc too):

  /var/home/acme/git/pahole/pahole.c:573:40: warning: this statement may fall through [-Wimplicit-fallthrough=]
    573 |                 case DW_TAG_base_type: {
        |                                        ^
  /var/home/acme/git/pahole/pahole.c:582:17: note: here
    582 |                 case DW_TAG_pointer_type:
        |                 ^~~~
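
The placement can be sketched with a hypothetical classify() function.
GCC's -Wimplicit-fallthrough accepts a fall through comment when it
directly precedes the next case label. In a braced case body, that means
after the closing brace:

```c
/* Hypothetical example of a braced case body that falls through. */
int classify(int tag)
{
	int score = 0;

	switch (tag) {
	case 1: {
		int bonus = 10;

		score += bonus;
	}
		/* fall through */
	case 2:
		score += 1;
		break;
	default:
		score = -1;
		break;
	}
	return score;
}
```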

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 21b2933f01 pahole: Fix signedness of ternary expression operator
To address this clang warning:

  /var/home/acme/git/pahole/pahole.c: In function ‘type__instance_read_once’:
  /var/home/acme/git/pahole/pahole.c:1933:78: warning: operand of ‘?:’ changes signedness from ‘int’ to ‘uint32_t’ {aka ‘unsigned int’} due to unsignedness of other operand [-Wsign-compare]
   1933 |         return fread(instance->instance, instance->type->size, 1, fp) != 1 ? -1 : instance->type->size;

Fixes: e3e5a4626c ("pahole: Make sure the header is read only once")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 54c1e93b8e pahole: Use the 'prototypes' parameter in prototypes__load()
It was using &class_names directly while it was also being passed as the
'prototypes' argument; use the argument instead.

Fixes: 823739b56f ("pahole: Convert class_names into a list of struct prototypes")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 3895127ce6 pahole: Clarify that currently --nr_methods doesn't work together witn -C
It should, as it's natural to do:

  $ pahole --nr_methods -C sock

And have it traverse all functions in all compilation units and show how
many of them have 'struct sock *' as one of their arguments, but more
changes are needed to have this in place, and meanwhile it is easy
enough to do:

  $ pahole --nr_methods  | grep -w sock
  sock	1005
  $

And with BTF, it's super fast too.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 2ea46285ac pahole: No need to store the class name in 'struct structure'
We by now already store the 'struct class' it comes from, and
class->name is now a string, so there is no point in storing a
duplicate name.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 4d8551396d pahole: Multithreaded DWARF loading requires elfutils >= 0.178
According to Mark Wielaard and as per testing, elfutils' libdw version
must be at least 0.178 for multithreaded DWARF loading.

Check that, and if it is older, emit a warning and continue using just a
single thread. This allows asking for multithreading in things like the
Linux kernel makefiles while still working on older systems, such as
centos:7, where the elfutils version is 0.176.

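A minimal sketch of the fallback logic, assuming the libdw version is available as a "major.minor" string such as "0.176". How pahole actually obtains and compares the version is not shown here, and the nr_threads_for_libdw() helper is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: number of DWARF loader threads to use.
 * Multithreaded loading needs libdw >= 0.178; older or unparsable
 * versions fall back to a single thread with a warning on stderr. */
static int nr_threads_for_libdw(const char *version, int requested)
{
	int major = 0, minor = 0;

	if (requested <= 1)
		return 1;

	if (sscanf(version, "%d.%d", &major, &minor) != 2 ||
	    major < 0 || (major == 0 && minor < 178)) {
		fprintf(stderr, "warning: libdw version %s can't do "
				"multithreaded loading (needs >= 0.178), "
				"using a single thread\n", version);
		return 1;
	}
	return requested;
}
```
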
Mark also provided this info for people using centos:7 (and
equivalents):

''Note that on centos7 if you install centos-release-scl you can get the
various devtoolset packages that do contain newer gcc and elfutils. The
latest are devtoolset-10-gcc (gcc-10.2.1) and devtoolset-10-elfutils-devel
(elfutils-0.182).

After installing you can use them with "scl enable devtoolset-10 bash"
which sets up the environment with the new devtools as default.''

A quick attempt at using a lock around all libdw functions turned out to
be too big a hammer, making the multithreaded DWARF loader slower than
using just a single thread.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo d2d83be1e2 pahole: Allow tweaking the size of the loader hash tables
This allows experimenting with different sizes as time goes by and the
number of symbols in the kernel grows.

The current default, 15, is suboptimal for the Fedora Rawhide kernel; 12
does better, cutting elapsed time from 2.950 s to 2.842 s (about 3.6%)
in the runs below.

Default: 15:

  $ sudo ~acme/bin/perf stat -d -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,107.73 msec task-clock                #    2.749 CPUs utilized            ( +-  0.05% )
               1,723      context-switches          #  212.562 /sec                     ( +-  1.86% )
                   5      cpu-migrations            #    0.641 /sec                     ( +- 46.07% )
              68,802      page-faults               #    8.486 K/sec                    ( +-  0.05% )
      29,221,590,880      cycles                    #    3.604 GHz                      ( +-  0.04% )
      63,438,138,612      instructions              #    2.17  insn per cycle           ( +-  0.00% )
      15,125,172,105      branches                  #    1.866 G/sec                    ( +-  0.00% )
         119,983,284      branch-misses             #    0.79% of all branches          ( +-  0.06% )
      13,964,248,638      L1-dcache-loads           #    1.722 G/sec                    ( +-  0.00% )
         375,110,346      L1-dcache-load-misses     #    2.69% of all L1-dcache accesses( +-  0.01% )
          91,712,402      LLC-loads                 #   11.312 M/sec                    ( +-  0.14% )
          28,025,289      LLC-load-misses           #   30.56% of all LL-cache accesses ( +-  0.23% )

             2.94980 +- 0.00193 seconds time elapsed  ( +-  0.07% )

  $

New default, to be set in an upcoming patch, 12:

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=12 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=12 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,687.31 msec task-clock                #    2.704 CPUs utilized            ( +-  0.02% )
               1,677      context-switches          #  218.126 /sec                     ( +-  0.70% )
                   4      cpu-migrations            #    0.468 /sec                     ( +- 18.84% )
              67,827      page-faults               #    8.823 K/sec                    ( +-  0.03% )
      27,711,744,058      cycles                    #    3.605 GHz                      ( +-  0.02% )
      63,032,539,630      instructions              #    2.27  insn per cycle           ( +-  0.00% )
      15,062,001,666      branches                  #    1.959 G/sec                    ( +-  0.00% )
         127,728,818      branch-misses             #    0.85% of all branches          ( +-  0.07% )
      13,972,184,314      L1-dcache-loads           #    1.818 G/sec                    ( +-  0.00% )
         364,962,883      L1-dcache-load-misses     #    2.61% of all L1-dcache accesses( +-  0.02% )
          83,969,109      LLC-loads                 #   10.923 M/sec                    ( +-  0.13% )
          19,141,055      LLC-load-misses           #   22.80% of all LL-cache accesses ( +-  0.25% )

            2.842440 +- 0.000952 seconds time elapsed  ( +-  0.03% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=11 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=11 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,704.29 msec task-clock                #    2.702 CPUs utilized            ( +-  0.05% )
               1,676      context-switches          #  217.515 /sec                     ( +-  1.04% )
                   2      cpu-migrations            #    0.286 /sec                     ( +- 17.01% )
              67,813      page-faults               #    8.802 K/sec                    ( +-  0.05% )
      27,786,710,102      cycles                    #    3.607 GHz                      ( +-  0.05% )
      63,027,795,038      instructions              #    2.27  insn per cycle           ( +-  0.00% )
      15,066,316,987      branches                  #    1.956 G/sec                    ( +-  0.00% )
         130,431,772      branch-misses             #    0.87% of all branches          ( +-  0.20% )
      13,981,516,517      L1-dcache-loads           #    1.815 G/sec                    ( +-  0.00% )
         369,525,466      L1-dcache-load-misses     #    2.64% of all L1-dcache accesses( +-  0.03% )
          83,328,524      LLC-loads                 #   10.816 M/sec                    ( +-  0.27% )
          18,704,020      LLC-load-misses           #   22.45% of all LL-cache accesses ( +-  0.18% )

             2.85109 +- 0.00281 seconds time elapsed  ( +-  0.10% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=8 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=8 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,190.55 msec task-clock                #    2.774 CPUs utilized            ( +-  0.03% )
               1,607      context-switches          #  196.226 /sec                     ( +-  0.67% )
                   3      cpu-migrations            #    0.317 /sec                     ( +- 15.38% )
              67,869      page-faults               #    8.286 K/sec                    ( +-  0.05% )
      29,511,213,192      cycles                    #    3.603 GHz                      ( +-  0.02% )
      63,347,196,598      instructions              #    2.15  insn per cycle           ( +-  0.00% )
      15,198,023,498      branches                  #    1.856 G/sec                    ( +-  0.00% )
         131,113,100      branch-misses             #    0.86% of all branches          ( +-  0.14% )
      14,118,162,884      L1-dcache-loads           #    1.724 G/sec                    ( +-  0.00% )
         422,048,384      L1-dcache-load-misses     #    2.99% of all L1-dcache accesses( +-  0.01% )
         105,878,910      LLC-loads                 #   12.927 M/sec                    ( +-  0.05% )
          21,022,664      LLC-load-misses           #   19.86% of all LL-cache accesses ( +-  0.20% )

            2.952678 +- 0.000858 seconds time elapsed  ( +-  0.03% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=13 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=13 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,728.71 msec task-clock                #    2.707 CPUs utilized            ( +-  0.07% )
               1,661      context-switches          #  214.887 /sec                     ( +-  0.70% )
                   2      cpu-migrations            #    0.259 /sec                     ( +- 22.36% )
              67,893      page-faults               #    8.785 K/sec                    ( +-  0.04% )
      27,874,322,843      cycles                    #    3.607 GHz                      ( +-  0.07% )
      63,079,425,815      instructions              #    2.26  insn per cycle           ( +-  0.00% )
      15,067,279,408      branches                  #    1.950 G/sec                    ( +-  0.00% )
         125,706,874      branch-misses             #    0.83% of all branches          ( +-  1.00% )
      13,967,177,801      L1-dcache-loads           #    1.807 G/sec                    ( +-  0.00% )
         363,566,754      L1-dcache-load-misses     #    2.60% of all L1-dcache accesses( +-  0.02% )
          86,583,482      LLC-loads                 #   11.203 M/sec                    ( +-  0.13% )
          20,629,871      LLC-load-misses           #   23.83% of all LL-cache accesses ( +-  0.21% )

             2.85551 +- 0.00124 seconds time elapsed  ( +-  0.04% )

  $

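Assuming --hashbits=N sizes each loader hash table at 2^N buckets, 15 means 32768 buckets and 12 means 4096. A minimal sketch of how a key maps to one of 2^bits buckets, using the multiplicative (golden ratio) hashing of the Linux kernel's hash_32(); whether pahole's loader tables use exactly this function is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Golden-ratio multiplier used by the Linux kernel's hash_32(). */
#define GOLDEN_RATIO_32 0x61C88647u

/* Map a 32-bit key to a bucket in a table of 2^bits entries by keeping
 * the top 'bits' bits of the product, which are the best mixed ones.
 * Valid for 1 <= bits <= 32 (bits == 0 would shift by 32, undefined). */
static inline uint32_t hash_32(uint32_t key, unsigned int bits)
{
	return (key * GOLDEN_RATIO_32) >> (32 - bits);
}

static inline size_t nr_buckets(unsigned int bits)
{
	return (size_t)1 << bits;
}
```

A smaller table fits better in cache, which is consistent with the drop in LLC-load-misses above (28.0M at 15 bits vs 19.1M at 12 bits), at the cost of longer chains; going down to 8 bits was slower again.
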
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo a2f1e69848 core: Use obstacks: take 2
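Allow asking for obstacks to be used. For use cases like the BTF
encoder, where everything is allocated sequentially and then freed all
at once in cu__delete(), obstacks are applicable and provide a good
speedup (elapsed time drops from 3.647 s to 2.954 s, about 19%, and page
faults from 761,926 to 68,792):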

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

           10,445.75 msec task-clock:u              #    2.864 CPUs utilized            ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             761,926      page-faults:u             #   72.941 K/sec                    ( +-  0.00% )
      31,946,591,661      cycles:u                  #    3.058 GHz                      ( +-  0.05% )
      69,103,520,880      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,353,763,143      branches:u                #    1.566 G/sec                    ( +-  0.00% )
         122,309,098      branch-misses:u           #    0.75% of all branches          ( +-  0.12% )

             3.64689 +- 0.00437 seconds time elapsed  ( +-  0.12% )

  $ perf record --call-graph lbr pahole -j --btf_encode_detached vmlinux-j.btf vmlinux
  [ perf record: Woken up 52 times to write data ]
  [ perf record: Captured and wrote 13.151 MB perf.data (43058 samples) ]
  $
  $ perf report --no-children
  Samples: 43K of event 'cycles:u', Event count (approx.): 31938442091
    Overhead  Command  Shared Object         Symbol
  +   22.98%  pahole   libdw-0.185.so        [.] __libdw_find_attr
  +    6.69%  pahole   libdwarves.so.1.0.0   [.] cu__hash.isra.0
  +    5.82%  pahole   libdwarves.so.1.0.0   [.] hashmap__insert
  +    5.16%  pahole   libc.so.6             [.] __libc_calloc
  +    5.01%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_is_equiv
  +    3.39%  pahole   libc.so.6             [.] _int_malloc
  +    2.82%  pahole   libc.so.6             [.] __strcmp_avx2
  +    2.22%  pahole   libdw-0.185.so        [.] __libdw_form_val_compute_len
  +    2.13%  pahole   libdw-0.185.so        [.] dwarf_attr
  +    2.08%  pahole   [unknown]             [k] 0xffffffffa0e010a7
  +    1.98%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__find_type_by_ref
  +    1.98%  pahole   libdwarves.so.1.0.0   [.] btf__dedup
  +    1.92%  pahole   libc.so.6             [.] pthread_rwlock_unlock@@GLIBC_2.34
  +    1.92%  pahole   libdwarves.so.1.0.0   [.] btf__add_field
  +    1.92%  pahole   libdwarves.so.1.0.0   [.] list__for_all_tags
  +    1.61%  pahole   libdwarves.so.1.0.0   [.] btf_encoder__encode_cu
  +    1.49%  pahole   libdwarves.so.1.0.0   [.] die__process_class
  +    1.44%  pahole   libc.so.6             [.] pthread_rwlock_tryrdlock@@GLIBC_2.34
  +    1.24%  pahole   libdw-0.185.so        [.] dwarf_siblingof
  +    1.18%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_ref_type
  +    1.12%  pahole   libdwarves.so.1.0.0   [.] strs_hash_fn
  +    1.11%  pahole   libdwarves.so.1.0.0   [.] attr_numeric
  +    1.01%  pahole   libdwarves.so.1.0.0   [.] tag__size

After:

  $ perf stat -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,114.11 msec task-clock:u              #    2.747 CPUs utilized            ( +-  0.09% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              68,792      page-faults:u             #    8.478 K/sec                    ( +-  0.05% )
      28,705,283,249      cycles:u                  #    3.538 GHz                      ( +-  0.09% )
      63,013,653,035      instructions:u            #    2.20  insn per cycle           ( +-  0.00% )
      15,039,319,384      branches:u                #    1.853 G/sec                    ( +-  0.00% )
         118,272,350      branch-misses:u           #    0.79% of all branches          ( +-  0.41% )

             2.95368 +- 0.00221 seconds time elapsed  ( +-  0.07% )

  $
  $ perf record --call-graph lbr pahole -j --btf_encode_detached vmlinux-j.btf vmlinux
  [ perf record: Woken up 40 times to write data ]
  [ perf record: Captured and wrote 10.426 MB perf.data (33733 samples) ]
  $
  $ perf report --no-children
  Samples: 33K of event 'cycles:u', Event count (approx.): 28860426071
    Overhead  Command  Shared Object         Symbol
  +   26.10%  pahole   libdw-0.185.so        [.] __libdw_find_attr
  +    6.13%  pahole   libdwarves.so.1.0.0   [.] cu__hash.isra.0
  +    5.83%  pahole   libdwarves.so.1.0.0   [.] hashmap__insert
  +    5.52%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_is_equiv
  +    3.04%  pahole   libc.so.6             [.] __strcmp_avx2
  +    2.45%  pahole   libdw-0.185.so        [.] __libdw_form_val_compute_len
  +    2.31%  pahole   libdwarves.so.1.0.0   [.] btf__dedup
  +    2.30%  pahole   libdw-0.185.so        [.] dwarf_attr
  +    2.19%  pahole   libc.so.6             [.] pthread_rwlock_unlock@@GLIBC_2.34
  +    2.08%  pahole   libdwarves.so.1.0.0   [.] list__for_all_tags
  +    2.07%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__find_type_by_ref
  +    1.96%  pahole   libdwarves.so.1.0.0   [.] btf__add_field
  +    1.67%  pahole   libc.so.6             [.] pthread_rwlock_tryrdlock@@GLIBC_2.34
  +    1.63%  pahole   libdwarves.so.1.0.0   [.] btf_encoder__encode_cu
  +    1.52%  pahole   libdwarves.so.1.0.0   [.] die__process_class
  +    1.51%  pahole   libdwarves.so.1.0.0   [.] attr_type
  +    1.36%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_ref_type
  +    1.32%  pahole   libdwarves.so.1.0.0   [.] strs_hash_fn
  +    1.25%  pahole   libdw-0.185.so        [.] dwarf_siblingof
  +    1.24%  pahole   libdwarves.so.1.0.0   [.] namespace__recode_dwarf_types
  +    1.17%  pahole   libdwarves.so.1.0.0   [.] attr_numeric
  +    1.16%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__init
  +    1.03%  pahole   libdwarves.so.1.0.0   [.] tag__init
  +    1.01%  pahole   libdwarves.so.1.0.0   [.] tag__size

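A minimal sketch of the allocation pattern described above, using the GNU obstack API from glibc's <obstack.h>: many small allocations are carved sequentially out of large chunks and released with a single obstack_free() call. The cu_arena wrapper is hypothetical and is not pahole's actual data structure.

```c
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

/* obstack needs to know how to get and release its large chunks. */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free  free

/* Hypothetical per-CU arena: everything allocated while processing a
 * CU comes from here and is released at once when the CU goes away. */
struct cu_arena {
	struct obstack ob;
};

static void cu_arena__init(struct cu_arena *arena)
{
	obstack_init(&arena->ob);
}

/* Zeroed allocation; on failure obstack calls
 * obstack_alloc_failed_handler instead of returning NULL. */
static void *cu_arena__zalloc(struct cu_arena *arena, size_t size)
{
	void *p = obstack_alloc(&arena->ob, size);

	memset(p, 0, size);
	return p;
}

static char *cu_arena__strdup(struct cu_arena *arena, const char *s)
{
	/* Copies strlen(s) bytes and appends the terminating NUL. */
	return obstack_copy0(&arena->ob, s, strlen(s));
}

/* Free every object allocated from the arena in one go. */
static void cu_arena__delete(struct cu_arena *arena)
{
	obstack_free(&arena->ob, NULL);
}
```
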
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 9d0a7ee0c3 pahole: Ignore DW_TAG_label when encoding BTF
Labels are not used in BTF encoding, so don't waste cycles/memory parsing them:

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,487.54 msec task-clock:u              #    2.855 CPUs utilized            ( +-  0.31% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             762,431      page-faults:u             #   72.699 K/sec                    ( +-  0.00% )
      31,994,949,358      cycles:u                  #    3.051 GHz                      ( +-  0.09% )
      69,129,157,311      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,359,974,001      branches:u                #    1.560 G/sec                    ( +-  0.00% )
         122,800,385      branch-misses:u           #    0.75% of all branches          ( +-  0.23% )

             3.67286 +- 0.00917 seconds time elapsed  ( +-  0.25% )

  $

After:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,431.47 msec task-clock:u              #    2.865 CPUs utilized            ( +-  0.04% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             761,982      page-faults:u             #   73.046 K/sec                    ( +-  0.00% )
      31,885,756,148      cycles:u                  #    3.057 GHz                      ( +-  0.04% )
      69,103,456,079      instructions:u            #    2.17  insn per cycle           ( +-  0.00% )
      16,353,867,606      branches:u                #    1.568 G/sec                    ( +-  0.00% )
         122,023,818      branch-misses:u           #    0.75% of all branches          ( +-  0.09% )

             3.64095 +- 0.00194 seconds time elapsed  ( +-  0.05% )

  $

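A minimal sketch of the kind of filter this implies: when the only consumer is the BTF encoder, DIEs that BTF has no use for can be skipped before their attributes are parsed. The tag value is the DWARF standard code for DW_TAG_label (0x0a), defined locally so the example does not need <dwarf.h>; the load_conf struct and die__should_process() are hypothetical.

```c
#include <stdbool.h>

/* DWARF standard tag code for DW_TAG_label. */
#define TAG_LABEL 0x0a

/* Hypothetical loader option: set when the only consumer is the BTF
 * encoder, which has no use for labels. */
struct load_conf {
	bool encoding_btf;
};

/* Decide whether a DIE with this tag needs to be parsed at all. */
static bool die__should_process(unsigned int tag, const struct load_conf *conf)
{
	switch (tag) {
	case TAG_LABEL:
		/* Labels never make it into BTF: skip them and save the
		 * cycles and memory spent parsing their attributes. */
		return !conf->encoding_btf;
	default:
		return true;
	}
}
```
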
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 51ba831929 pahole: Ignore DW_TAG_inline_expansion when encoding BTF
XXX: for now leave this commented out, see comments in the source code.

Inline expansions are not used in BTF encoding, so don't waste cycles/memory parsing them:

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,973.13 msec task-clock:u              #    2.906 CPUs utilized            ( +-  0.13% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,927      page-faults:u             #   72.352 K/sec                    ( +-  0.00% )
      33,585,562,298      cycles:u                  #    3.061 GHz                      ( +-  0.17% )
      72,687,766,428      instructions:u            #    2.16  insn per cycle           ( +-  0.15% )
      17,198,056,478      branches:u                #    1.567 G/sec                    ( +-  0.16% )
         129,011,360      branch-misses:u           #    0.75% of all branches          ( +-  0.53% )

              3.7760 +- 0.0158 seconds time elapsed  ( +-  0.42% )

  $

After:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,487.54 msec task-clock:u              #    2.855 CPUs utilized            ( +-  0.31% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             762,431      page-faults:u             #   72.699 K/sec                    ( +-  0.00% )
      31,994,949,358      cycles:u                  #    3.051 GHz                      ( +-  0.09% )
      69,129,157,311      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,359,974,001      branches:u                #    1.560 G/sec                    ( +-  0.00% )
         122,800,385      branch-misses:u           #    0.75% of all branches          ( +-  0.23% )

             3.67286 +- 0.00917 seconds time elapsed  ( +-  0.25% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:25 -03:00
Arnaldo Carvalho de Melo 20757745f0 pahole: Allow encoding BTF with parallel DWARF loading
By adding a lock to serialize access to btf_encoder__encode_cu().

This works and speeds up BTF encoding, but it's too brute force: the
right thing to do is to have per-thread BTF encoders and then merge
everything in a final pass.

But pick the low-hanging fruit now.

On a machine with 4 cores, no HT:

  $ grep "model name" -m1 /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Non-parallel:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf vmlinux' (5 runs):

            8,580.19 msec task-clock:u              #    1.000 CPUs utilized            ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             795,451      page-faults:u             #   92.708 K/sec                    ( +-  0.00% )
      29,151,924,821      cycles:u                  #    3.398 GHz                      ( +-  0.11% )
      70,947,245,709      instructions:u            #    2.43  insn per cycle           ( +-  0.00% )
      16,791,160,182      branches:u                #    1.957 G/sec                    ( +-  0.00% )
         120,793,994      branch-misses:u           #    0.72% of all branches          ( +-  1.04% )

             8.58192 +- 0.00686 seconds time elapsed  ( +-  0.08% )
  $

Parallel:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux-j.btf -j vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux-j.btf -j vmlinux' (5 runs):

           10,962.45 msec task-clock:u              #    2.914 CPUs utilized            ( +-  0.15% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,915      page-faults:u             #   72.421 K/sec                    ( +-  0.00% )
      33,552,130,646      cycles:u                  #    3.061 GHz                      ( +-  0.16% )
      72,778,320,572      instructions:u            #    2.17  insn per cycle           ( +-  0.12% )
      17,220,541,136      branches:u                #    1.571 G/sec                    ( +-  0.13% )
         129,353,767      branch-misses:u           #    0.75% of all branches          ( +-  0.48% )

              3.7614 +- 0.0141 seconds time elapsed  ( +-  0.38% )

  $

Elapsed time drops from 8.58 s to 3.76 s, but 'CPUs utilized' is only
2.914; it should go all the way to 4 when we parallelize the BTF
encoding.

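A minimal sketch of the lock-based approach described above, with POSIX threads: the per-CU loading work runs in parallel, but every call into the shared encoder is serialized by one mutex. The encoder struct and functions are hypothetical stand-ins for the BTF encoder and btf_encoder__encode_cu().

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical shared encoder state, standing in for the BTF encoder. */
struct encoder {
	long nr_cus_encoded;
};

static struct encoder shared_encoder;
static pthread_mutex_t encoder_lock = PTHREAD_MUTEX_INITIALIZER;

/* Only one thread at a time may touch the shared encoder. */
static void encoder__encode_cu_locked(struct encoder *enc)
{
	pthread_mutex_lock(&encoder_lock);
	enc->nr_cus_encoded++;	/* stands in for encoding one CU */
	pthread_mutex_unlock(&encoder_lock);
}

static void *worker(void *arg)
{
	int nr_cus = *(const int *)arg;

	for (int i = 0; i < nr_cus; i++) {
		/* Loading this CU's DWARF would happen here, in parallel. */
		encoder__encode_cu_locked(&shared_encoder);
	}
	return NULL;
}

/* Runs nr_threads workers and returns how many CUs were encoded,
 * or -1 if not all threads could be created. */
static long run_workers(int nr_threads, int cus_per_thread)
{
	pthread_t tids[MAX_THREADS];
	int started = 0;

	if (nr_threads > MAX_THREADS)
		nr_threads = MAX_THREADS;

	shared_encoder.nr_cus_encoded = 0;
	for (; started < nr_threads; started++) {
		if (pthread_create(&tids[started], NULL, worker,
				   &cus_per_thread) != 0)
			break;
	}
	/* Joining before returning keeps &cus_per_thread valid. */
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	return started == nr_threads ? shared_encoder.nr_cus_encoded : -1;
}
```
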
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:39:02 -03:00
Arnaldo Carvalho de Melo d133569bd0 pahole: No need to read DW_AT_alignment when encoding BTF
No need to read DW_AT_alignment, it is not used in BTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:38:58 -03:00
Arnaldo Carvalho de Melo 3e1c7a2077 pahole: Introduce --sort
To ask for the output to be sorted, initially by name.

This is needed in 'btfdiff' to diff the output of 'pahole -F dwarf
--jobs N', where N threads consume DWARF compile units and pretty print
them, producing non-deterministic output.

So we need to sort the output for both BTF and DWARF, and then diff
them.

This is still not enough for some cases where different types have the
same name, such as "usb_priv", which exists in multiple DWARF compile
units: the first one processed "wins", i.e. it is the only one
considered.

I have to look at how BTF handles this, to adopt a similar algorithm and
keep btfdiff usable as a regression test for the BTF and DWARF loader
and the BTF encoder.

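A minimal sketch of the idea behind --sort, assuming the printed output for each type can be collected together with its name before anything is written: sorting those entries by name makes the result independent of the order in which worker threads finished. The struct and function names are hypothetical. Ties between equal names are broken by comparing the printed text, purely so that the sketch's output order is fully deterministic.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of one pretty-printed type. */
struct printed_type {
	const char *name;
	const char *text;
};

static int printed_type__cmp(const void *a, const void *b)
{
	const struct printed_type *pa = a, *pb = b;
	int ret = strcmp(pa->name, pb->name);

	/* Same name (e.g. defined in several CUs): compare the text so the
	 * order does not depend on which thread finished first. */
	return ret != 0 ? ret : strcmp(pa->text, pb->text);
}

static void sort_printed_types(struct printed_type *types, size_t nr)
{
	qsort(types, nr, sizeof(*types), printed_type__cmp);
}
```
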
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 967290bc71 pahole: Store the class id in 'struct structure' as well
Needed to defer printing classes until they have all been sorted by
name with the upcoming 'pahole --sort' option, which makes it possible
to compare 'pahole -F btf' with 'pahole -F dwarf -j', as the
multithreaded DWARF loader will not produce the classes in a
deterministic order. This is needed for 'btfdiff'.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 5365c45177 pahole: Keep class + cu in tree of structures
We'll use it for ordering by name.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 75d4748861 pahole: Disable parallel BTF encoding for now
First introduce parallel DWARF loading, test it, then move on to using
it together with BTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 1c60f71daa pahole: Add locking for the structures list and rbtree
Prep work for multithreaded DWARF loading, when there will be concurrent
access to these data structures.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo caa219dffc core: base_type__name() doesn't need a 'cu' arg
Another simplification made possible by using a plain char string
instead of string_t, which was only needed in the core as prep work
for CTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 0f54ca9c82 core: class__clone() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks, we don't need it. If we ever
want to use it, we can do per-thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 2b2014187b core: class__delete() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks, we don't need it. If we ever
want to use it, we can do per-thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 33e0d5f874 pahole: Introduce --prettify option
Using isatty(0) to decide whether to switch into pretty printing is
problematic, as reported by Bernd Buschinski, who ran into problems with
his scripts:

========================================================================
  I am using pahole 1.21 and I recently noticed that I no longer have
  any pahole output in several scripts.

  Using (on the command line):

    $ pahole -V -E -C my_struct /path/to/my/debug.o

  works fine and gives the expected output.

  But:

    $ parallel -j 1 pahole -V -E -C my_struct ::: /path/to/my/debug.o

  gives nothing, no stderr, no stdout and ret code 0.

  After testing some versions, it works fine in 1.17 and no longer works in 1.18.
========================================================================

Since the pretty printer broke existing scripts, and it's a relatively
new feature, let's switch to using an explicit command line option to
activate the pretty printer, i.e. where we used:

  $ pahole --header elf64_hdr < /bin/bash

We now use one of:

  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify=/bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify /bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify - < /bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify=- < /bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$

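The report is consistent with the isatty(0) heuristic: under GNU parallel stdin is presumably not a terminal, so pahole would switch into pretty-printing mode and try to read raw data from it. A minimal sketch of the explicit alternative shown above, where the --prettify argument names the input and "-" means stdin; the open_prettify_input() helper is hypothetical and only mirrors the command lines in this message.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: resolve a (non-NULL) --prettify argument to the
 * FILE the raw data will be read from. "-" selects stdin explicitly,
 * so nothing depends on whether stdin happens to be a terminal.
 * Returns NULL (with errno set by fopen) if the file can't be opened. */
static FILE *open_prettify_input(const char *arg)
{
	if (strcmp(arg, "-") == 0)
		return stdin;
	return fopen(arg, "rb");
}
```
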
Reported-by: Bernd Buschinski <b.buschinski@googlemail.com>
Report-Link: https://lore.kernel.org/dwarves/CACN-hLVoz2tWrtgDLabOv6S1-H_8RD2fh8SV6EnADF1ikMxrmw@mail.gmail.com/
Tested-by: Bernd Buschinski <b.buschinski@googlemail.com>
Test-Link: https://lore.kernel.org/dwarves/CACN-hLXgHWdBkyMz+w58qX8DaV+WJ1mj1qheGBHbPv4fqozi5w@mail.gmail.com/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo bc36e94f32 pahole: Try harder to resolve the --header type when pretty printing
Keep processing CUs until everything is sorted out, which includes the
--header type.

On a file with DWARF info where the header type was the last one to be
found, it wasn't being resolved, so the tool failed to resolve header
variable references and emitted this misleading error message:

  ⬢[acme@toolbox pahole]$ pahole ~/bin/perf --header=perf_file_header --seek_bytes '$header.data.offset' --size_bytes='$header.data.size' -C 'perf_event_header(sizeof,type,type_enum=perf_event_type)' < perf.data
  pahole: --seek_bytes ($header.data.offset) makes reference to --header but it wasn't specified
  ⬢[acme@toolbox pahole]$

And that 'struct perf_file_header' _is_ in one of the CUs in ~/bin/perf:

  ⬢[acme@toolbox pahole]$ pahole ~/bin/perf -C perf_file_header
  struct perf_file_header {
  	u64                        magic;                /*     0     8 */
  	u64                        size;                 /*     8     8 */
  	u64                        attr_size;            /*    16     8 */
  	struct perf_file_section   attrs;                /*    24    16 */
  	struct perf_file_section   data;                 /*    40    16 */
  	struct perf_file_section   event_types;          /*    56    16 */
  	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
  	long unsigned int          adds_features[4];     /*    72    32 */

  	/* size: 104, cachelines: 2, members: 7 */
  	/* last cacheline: 40 bytes */
  };
  ⬢[acme@toolbox pahole]$

With this fix all the records are printed.

This probably wasn't noticed before because most tests were made with a
~/bin/perf file with BTF information, i.e. just one "CU", so the logic
of deferring the pretty printing till everything gets resolved wasn't
being exercised properly.
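
The "keep processing CUs until the --header type is resolved" idea can be
sketched as below. This is a minimal illustration, not pahole's code: it
assumes a hypothetical, simplified CU model (struct fake_cu, holding only
type names) and a hypothetical find_header_cu() helper. The point is that
a lookup failure is only reported after every CU was scanned, so a header
type that lives in the last CU is still found.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a compilation unit: just type names. */
struct fake_cu {
	const char **type_names;
	size_t nr_types;
};

/*
 * Scan all CUs looking for the --header type instead of giving up when
 * the CU currently being processed doesn't define it.
 * Returns the index of the CU defining 'header', or -1 if no CU does.
 */
static long find_header_cu(const struct fake_cu *cus, size_t nr_cus,
			   const char *header)
{
	for (size_t i = 0; i < nr_cus; i++)
		for (size_t j = 0; j < cus[i].nr_types; j++)
			if (strcmp(cus[i].type_names[j], header) == 0)
				return (long)i;

	return -1; /* only now is it correct to complain */
}
```

With BTF there is just one "CU", so the loop runs once and the deferral
logic is never really exercised, which matches the explanation above.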

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo fcfa2141c3 pahole: Make prototype__stdio_fprintf_value() receive a FILE to read raw data from
So far it's just from stdin, but it shouldn't be.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 2d35630fa5 pahole: Make pipe_seek() honour the 'fp' arg instead of hardcoding stdin
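
A stream such as a pipe can't be fseek()ed, so "seeking" on it means
reading and discarding bytes from whatever FILE the caller passed in. A
minimal sketch of that idea, using a hypothetical skip_bytes() helper
(not pahole's actual pipe_seek() implementation):

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical sketch: advance 'fp' by 'offset' bytes by reading and
 * discarding data, which also works on non-seekable streams (pipes).
 * Uses the caller's 'fp', never a hardcoded stdin.
 * Returns 0 on success, -1 if EOF or a read error came first.
 */
static int skip_bytes(FILE *fp, off_t offset)
{
	char buf[4096];

	while (offset > 0) {
		size_t chunk = sizeof(buf);

		if ((off_t)chunk > offset)
			chunk = (size_t)offset;

		size_t nread = fread(buf, 1, chunk, fp);
		if (nread == 0)
			return -1; /* stream ended before reaching 'offset' */
		offset -= (off_t)nread;
	}
	return 0;
}
```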
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 9aa01472d9 pahole: Rename 'fp' to 'output' in prototype__stdio_fprintf_value()
As we'll also have another FILE pointer for input.
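
With two streams in play, naming them 'input' and 'output' keeps it
obvious which one raw data is read from and which one the pretty printed
text goes to. A minimal sketch of that shape, using a hypothetical
print_records() helper and an illustrative output format (neither is
pahole's real code):

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch: read fixed-size raw records from 'input' and
 * print one line per record to 'output'.
 * Returns the number of complete records processed; a trailing partial
 * record is ignored.
 */
static size_t print_records(FILE *input, FILE *output, size_t record_size)
{
	unsigned char record[256];
	size_t nr_records = 0;

	if (record_size == 0 || record_size > sizeof(record))
		return 0;

	while (fread(record, record_size, 1, input) == 1) {
		fprintf(output, "record %zu: first byte 0x%02x\n",
			nr_records, record[0]);
		nr_records++;
	}
	return nr_records;
}
```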

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 472b940180 pahole: Use the supplied 'fp' argument in type__instance_read_once()
It was unconditionally reading from 'stdin', even when an 'fp' is supplied.

Fix this as now we'll stop unconditionally reading from stdin for the
pretty printer.
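
The fix boils down to reading one instance from the supplied stream. A
minimal sketch, assuming a hypothetical read_instance_once() helper
(pahole's type__instance_read_once() also takes type information, which
is left out here):

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: read exactly 'size' bytes for one instance from
 * the caller-supplied stream 'fp' (not from a hardcoded stdin).
 * Returns a malloc'ed buffer the caller must free, or NULL on a short
 * read or allocation failure.
 */
static void *read_instance_once(FILE *fp, size_t size)
{
	void *instance = malloc(size);

	if (instance == NULL)
		return NULL;

	if (fread(instance, size, 1, fp) != 1) {
		free(instance);
		return NULL;
	}
	return instance;
}
```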

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo ced4c34c37 core: Remove strings.c, unused
We were using this just for the ctf_encoder, which never really got
completed, so ditch it.

For BTF the strings table is done by libbpf, so we don't need it there
either.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:11 -03:00
Arnaldo Carvalho de Melo f8d571934b pahole: Add missing bpf/btf.h include
We got it by accident, via pahole_strings.h, and that is going away, so
fix it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo f4a77d0390 pahole: Use conf_load.kabi_prefix
Should work just as before, i.e. we hook at where we read strings from
DWARF.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo e974d1b240 pahole: class_member_filter__new() doesn't need a 'struct cu *' argument
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 0275e8d249 pahole: class_member_filter__parse() doesn't need a 'struct cu *' argument
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 90183e8e4d pahole: tag__real_sizeof() doesn't need a 'struct cu *' argument
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 5cb9192738 pahole: Rename tag__fprintf_hexdump_value() to instance__fprintf_hexdump_value()
As it acts only on an instance, it needs neither a 'struct tag' nor a
'struct cu'.
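
Hexdumping an instance only needs its bytes and their size, which is why
the type and CU arguments can go. A minimal sketch, assuming a
hypothetical hexdump_instance() helper and an illustrative
"{ 0x01, 0xff }" format that may differ from pahole's actual output:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch: hexdump one instance to 'fp' using only its raw
 * bytes and size (no 'struct tag', no 'struct cu').
 * Returns the number of characters printed.
 */
static int hexdump_instance(const void *instance, size_t size, FILE *fp)
{
	const unsigned char *bytes = instance;
	int printed = fprintf(fp, "{ ");

	for (size_t i = 0; i < size; i++)
		printed += fprintf(fp, "%s0x%02x", i ? ", " : "", bytes[i]);

	printed += fprintf(fp, " }");
	return printed;
}
```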

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00